What is term frequency?
Term frequency refers to the number of times a term or word occurs in a text or document. In information retrieval, it is one of the earliest methods used for finding relevant pieces of information in a larger collection of documents. For example, if you’re looking for a document about search engine optimization, it is reasonable to assume that pages that contain these words, and especially pages that contain them most often, are more relevant to your search than documents that do not contain them.
Since its first use, many variations of term frequency have been developed. At first, term frequency was simply the number of times a word appeared in a document, without taking the length of the document into account. Later variations adjust the count to the document length (dividing it by the total number of words) or compare the word’s count to that of the most common word in the document.
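The variations described above can be sketched in a few lines of Python. This is only an illustration: the tokenizer and the function names (tokenize, raw_tf, length_normalized_tf, max_normalized_tf) are assumptions made for this example and do not come from any particular library.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def raw_tf(term, tokens):
    """Raw count: how many times the term occurs in the document."""
    return Counter(tokens)[term]


def length_normalized_tf(term, tokens):
    """Count divided by the total number of words in the document."""
    return raw_tf(term, tokens) / len(tokens) if tokens else 0.0


def max_normalized_tf(term, tokens):
    """Count divided by the count of the document's most common word."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    return counts[term] / max(counts.values())


doc = "SEO tips: good SEO starts with good content and good structure"
tokens = tokenize(doc)

print(raw_tf("seo", tokens))                # occurrences of "seo"
print(length_normalized_tf("seo", tokens))  # share of all words
print(max_normalized_tf("seo", tokens))     # relative to the most common word
```

The raw count favors long documents, which is why the two normalized variants divide it by a measure of document size or by the count of the dominant word.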
How does term frequency work?
Although many variations of term frequency are in use today, they all revolve around the number of times a word occurs in a document. Term frequency can be an effective way of filtering out documents or pages that aren’t relevant because they do not contain the term(s) at all. It can also provide an initial ordering of the remaining pages by relevance, based on how often each page mentions the word. In practice, however, more complex variations of term frequency are usually combined with other factors in the algorithms that sort documents/pages.
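As a rough sketch of this filter-then-order idea, the following Python example drops pages that lack the query terms and sorts the rest by raw count. The function name rank_by_term_frequency, the sample pages, and the rule that a page must contain every query term are assumptions made for this illustration, not a description of how any real search engine works.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_term_frequency(query, documents):
    """Return (name, score) pairs for matching documents, highest score first."""
    terms = tokenize(query)
    ranked = []
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        # Filter step: skip documents that are missing any query term
        # (an assumption of this sketch).
        if not all(counts[term] > 0 for term in terms):
            continue
        # Score step: total number of occurrences of the query terms.
        score = sum(counts[term] for term in terms)
        ranked.append((name, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical sample pages.
pages = {
    "page_a": "Search engine basics: how a search engine crawls pages",
    "page_b": "A search engine ranks pages",
    "page_c": "Recipes for a quick dinner",
}

results = rank_by_term_frequency("search engine", pages)
print(results)
```

Here page_c is filtered out because it contains neither query term, and page_a is placed ahead of page_b because it mentions the terms more often.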
What can term frequency be used for?
One use of term frequency is to help computer programs gauge relevance. These programs, often called ranking algorithms, work to remove all irrelevant documents/pages and order the rest based on relevance. In the early days of web search, search engines relied on relatively simple algorithms in which term frequency played a large role when ordering search results. This often led to people hiding extra keywords on their pages to make them seem more relevant, a practice known as keyword stuffing. Nowadays, ranking algorithms are more complex, with hundreds of different factors and more intricate ways of determining relevance.
Term frequency is also an important part of TF-IDF, a method used to determine the relevance of certain words in a document.
Term frequency and TF-IDF
Term Frequency – Inverse Document Frequency, also called TF-IDF, is a method for determining how relevant a word is to a document compared to all the other documents in a collection. It does so by combining term frequency with inverse document frequency. Each word is assigned a value based on inverse document frequency, which looks at how many documents in the collection contain the word. This indicates how unique a term is, which can be used to determine how much information the term provides.
Terms like “the” and “a” appear many times in almost all documents, whereas more meaningful words such as “SEO” or “search engine” appear in fewer documents and are therefore given a higher value. This value then increases with the term frequency in the document at hand. TF-IDF has proven to be very effective at filtering out stop words as well as words that are used very often within a specific industry.
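One common textbook way of writing this combination is shown below. The exact weighting differs between implementations, so the formula should be read as one possible variant. The symbols are assumptions of this sketch: tf(t, d) is the term frequency of term t in document d, N is the number of documents in the collection, and df(t) is the number of documents that contain t.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log\frac{N}{\mathrm{df}(t)}
```

A term that occurs in every document has df(t) = N, so the logarithm is 0 and the term receives a score of 0 regardless of how often it is used.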
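A minimal Python sketch can show why very common words end up with a low value. It assumes a length-normalized term frequency and an unsmoothed logarithmic inverse document frequency, which is only one of several variants in use. The function name tf_idf and the sample documents are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            # Length-normalized TF times log-scaled IDF (assumed variant).
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical sample collection.
docs = [
    "the seo guide explains the basics of seo",
    "the weather is nice and the sun is out",
    "the cat sat on the mat",
]

scores = tf_idf(docs)
print(scores[0]["the"])  # appears in every document
print(scores[0]["seo"])  # appears only in the first document
```

Because “the” occurs in all three documents, its inverse document frequency is log(3/3) = 0 and its score is zero, while “seo” occurs in only one document and is used twice there, so it receives the highest score in the first document.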
Screenshot of Seobility’s TF*IDF tool that uses TF-IDF to optimize content for SEO.
Relevance to SEO
Term frequency continues to be an important part of SEO. Although Google and other search engines have long moved past using term frequency alone to gauge relevance, ensuring your content contains enough relevant words still appears to be an important part of optimizing content for search engines. Term frequency can also be used to identify important terms on competing pages. This helps you find out which topics you need to cover and which keywords may offer SEO opportunities.
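The idea of looking for important terms on competing pages can be sketched as follows. This is a simplified illustration, not a description of any SEO tool: the function name missing_terms, the small stop word list, and the sample texts are all assumptions made for this example.

```python
import re
from collections import Counter

# Small illustrative stop word list (an assumption, far from complete).
STOP_WORDS = {"the", "a", "and", "of", "to", "for", "in", "is"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def missing_terms(own_page, competitor_pages, top_n=5):
    """Return frequent competitor terms that do not occur on your own page."""
    own_terms = set(tokenize(own_page))
    counts = Counter()
    for page in competitor_pages:
        counts.update(t for t in tokenize(page) if t not in STOP_WORDS)
    # Keep the most frequent terms first and drop terms already covered.
    candidates = [(term, n) for term, n in counts.most_common()
                  if term not in own_terms]
    return candidates[:top_n]


# Hypothetical sample texts.
own = "Our guide to keyword research"
competitors = [
    "Keyword research and backlinks for beginners",
    "Backlinks and keyword research tools",
]

gaps = missing_terms(own, competitors)
print(gaps)
```

In this example, “backlinks” is used on both competing pages but not on your own page, so it appears at the top of the list as a topic that may be worth covering.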