Latent Semantic Indexing

Latent semantic analysis (LSA), or latent semantic indexing (LSI) as it is now more popularly known, was first patented in 1988 by a group of scientists including Susan Dumais, a Principal Researcher in the Adaptive Systems & Interaction Group of Microsoft Research. It is a method of analysing the relationships between a set of documents and the terms they contain by producing a set of concepts related to both the terms and the documents. The technique can be applied:

  • To find relationships between terms (synonymy¹ and polysemy²)
  • To compare the documents contained within a conceptual space (document classification³)
  • To translate a given query of terms into a conceptual space and find matching documents (information retrieval⁴)
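For the curious, here is a minimal pure-Python sketch of the classic LSA recipe behind those three uses: build a term-document count matrix, take a rank-k truncated singular value decomposition (SVD) to get the "concepts", compare terms by their concept vectors, and "fold" a query into the concept space to rank documents. Everything in it is an assumption for illustration: the tiny corpus, the stop list and all the helper names are hypothetical, it uses raw counts where real systems usually apply a weighting such as tf-idf or log-entropy, and it finds the SVD by power iteration only to stay dependency-free (in practice you would reach for something like `numpy.linalg.svd`). It illustrates the technique itself, not any search engine's implementation.

```python
import math

# Hypothetical toy corpus: three "baking" pages and two "car" pages.
# All helper names below are illustrative, not a library API.
DOCS = [
    "baking bread with flour and yeast",
    "baking cakes with flour eggs and sugar",
    "muffins and cookies need flour sugar and eggs",
    "car engines need oil and fuel",
    "fuel prices and car insurance",
]
STOP_WORDS = {"with", "and", "need"}  # tiny illustrative stop list


def tokenize(text):
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw term counts."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    row = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in tokenize(d):
            A[row[w]][j] += 1.0
    return vocab, A


def top_eigenpairs(M, k, iters=500):
    """Top-k eigenpairs of a symmetric positive semi-definite matrix via power
    iteration with deflation. Toy sizes only; assumes k <= rank(M)."""
    n = len(M)
    M = [r[:] for r in M]
    pairs = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-zero start vector
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Deflate: remove the found component so the next pass finds the next one.
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsa(A, k):
    """Rank-k truncated SVD A ~ U_k S_k V_k^T from the eigenvectors of A^T A.
    Returned concept-major: U[c][term], sigmas[c], V[c][doc]."""
    n_terms, n_docs = len(A), len(A[0])
    gram = [[sum(A[t][i] * A[t][j] for t in range(n_terms)) for j in range(n_docs)]
            for i in range(n_docs)]  # A^T A
    U, sigmas, V = [], [], []
    for lam, v in top_eigenpairs(gram, k):
        s = math.sqrt(lam)
        sigmas.append(s)
        V.append(v)  # right singular vector: one weight per document
        U.append([sum(A[t][j] * v[j] for j in range(n_docs)) / s
                  for t in range(n_terms)])  # left singular vector: u = A v / s
    return U, sigmas, V


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx, ny = math.sqrt(sum(a * a for a in x)), math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def term_similarity(a, b, vocab, U):
    """Relationship between two terms: cosine of their rows of U_k."""
    ia, ib = vocab.index(a), vocab.index(b)
    return cosine([c[ia] for c in U], [c[ib] for c in U])


def search(query, vocab, U, sigmas, V):
    """Fold the query into concept space (q_hat = S_k^-1 U_k^T q), then rank
    documents by cosine similarity against their rows of V_k."""
    tokens = tokenize(query)
    q = [float(tokens.count(w)) for w in vocab]
    q_hat = [sum(u[t] * q[t] for t in range(len(vocab))) / s
             for u, s in zip(U, sigmas)]
    scores = [(cosine(q_hat, [v[d] for v in V]), d) for d in range(len(V[0]))]
    return sorted(scores, reverse=True)


vocab, A = term_document_matrix(DOCS)
U, sigmas, V = lsa(A, k=2)
print(term_similarity("bread", "cookies", vocab, U))  # never share a page
print(term_similarity("bread", "fuel", vocab, U))
for score, d in search("cookies", vocab, U, sigmas, V):
    print(f"{score:6.3f}  {DOCS[d]}")
```

In this toy run “bread” and “cookies” never appear on the same page, yet their concept-space similarity comes out close to 1, and the query “cookies” ranks the bread page alongside the other baking pages while both car pages score close to 0. The choice of k (here 2, one concept per topic in the toy corpus) is the main tuning knob; real collections typically need a much larger k.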

It is the use of LSA/I in information retrieval that really interests us here at SEO central, or wherever you may be. Google, and quite probably the other major search engines to a lesser extent, take LSA/I into account when their algorithms perform a search, and that can be the difference between your site ranking 1st or 101st for your targeted keyword. The algorithms “know” that a page written on “baking” should naturally contain information on flour, eggs, sugar, cakes, bread, muffins, cookies and so on. If you have merely blasted your page full of “baking” references without touching on the other subjects it should cover, your page will be classed as over-optimised and won’t rank at all.

What this means for bloggers like you and me is that the LSA/I segments, or even subroutines, of the search algorithms favour naturally written articles that read well and contain other relevant references, and seriously frown upon dense, over-optimised, keyword-stuffed paragraphs that don’t read well at all.

¹ Synonymy: the relationship between words that share the same or a similar meaning. Synonymy is not always transitive: a word can be a synonym of several other words whose meanings differ from one another. For example, “hot” is a synonym of both “spicy” and “warm”, but “spicy” and “warm” are not synonyms of each other.
² Polysemy: a polyseme is a word that has more than one meaning. For example, a mole is a small burrowing animal, a skin marking, a spy or informant, the SI base unit for amount of substance, or even a large structure, normally built of stone, used as a pier or causeway across water.
³ Document classification: the method or technique by which an electronic document is assigned to one or more categories based upon its content.
⁴ Information retrieval: the method by which documents are obtained, or retrieved, depending on how their content relates to a query.

This post was originally published on this blog in July 2007, lost during the great blog explosion of 2008, and retrieved from Archive.Org. Don’t ask why, but I’ve always missed this post and I’m glad to have it back! :)