More LSI amusement

I wanted to get your opinion on how the Google algorithm actually works according to Leslie Rohde from “StomperNet”.  He posted 2 videos, one about LSI and one about “Referential integrity”.

“Warning – “Advanced” SEO Technique DOES NOT WORK” video:

I can see a whole lot of confusion in this.  The main point is that LSI has more than moved on since the 80s: PLSI has been favoured for some time, as well as a number of variants (Latent Dirichlet Allocation, for example).  LSI in its Wikipedia-explained form is…well…very basic.  I have always said that LSI wasn’t used in that way in the Google search engine.  LSI (and its variants) is a really useful method used in many systems, not just search engines.  It is also usually just one cog in the machine.

I wrote about the SEO-LSI phenomenon and also about how Google clearly do value methods such as PLSI.

My analogy for it has been that NASA use glue in their rockets, but it’s not the same glue as you find in your child’s pencil case.

The other big problem is that the entire concept in this video is flawed because it assumes that Google is built on LSI alone.  The tests done to prove that Google doesn’t use LSI are completely flawed.  This is not how the technique works, as you all well know.  It’s a cog in the machine, and isolating it like this achieves nothing but sensational banter.

The statements about “years of AI” not solving the problems in word disambiguation and context extraction are unfair because, for a start, that’s the domain of natural language processing, cognitive linguistics, computational linguistics and so on.  It’s also extremely difficult.  I’m not at all sure that Mr Rohde truly understands that.

He also says that LSI “claims” to handle synonyms and word forms with math – Doesn’t it?  It’s a statistical method after all so it would hardly come as a shock.
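To make the “with math” point concrete, here is a minimal sketch in Python with NumPy. The four-document corpus, the term list and the rank-2 truncation are all my own assumptions for illustration; this is the basic textbook form of LSI and says nothing about what Google runs. “car” and “automobile” never appear in the same document, so their raw similarity is zero, but both co-occur with “engine”, and the truncated SVD places them close together in the latent space.

```python
import numpy as np

# Toy corpus (made up for illustration): "car" and "automobile"
# never share a document, but both co-occur with "engine".
terms = ["car", "automobile", "engine", "flower", "petal"]
docs = [
    "car engine",
    "automobile engine",
    "flower petal",
    "flower petal",
]

# Term-document count matrix: one row per term, one column per document.
A = np.array([[doc.split().count(t) for doc in docs] for t in terms], dtype=float)


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


# Truncated SVD: keep only the k largest singular values (k is an assumption).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vectors = U[:, :k] * s[:k]  # each row is a term in the k-dim latent space


def raw_similarity(a, b):
    """Similarity of two terms using their raw document counts."""
    return cosine(A[terms.index(a)], A[terms.index(b)])


def latent_similarity(a, b):
    """Similarity of two terms after projection into the latent space."""
    return cosine(term_vectors[terms.index(a)], term_vectors[terms.index(b)])


print("raw    car/automobile:", raw_similarity("car", "automobile"))
print("latent car/automobile:", latent_similarity("car", "automobile"))
print("latent car/flower:    ", latent_similarity("car", "flower"))
```

On a corpus this small the effect is almost too clean, which is rather the point: it is a statistical consequence of co-occurrence, and on real data it needs a lot of supporting machinery around it.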

Saying “LSI is lame” is a rather silly statement imo.  There’s really nothing else to say about that.

I will leave it to you to watch the video if you like and make your own mind up about it.

“Referential Integrity Rocks!” video:

This is a database concept, as many of you will know.  For those who don’t, it’s a method that makes sure that relationships between tables in a database remain consistent.  It makes sure that when an update is made to one entry in the database, this change is reflected in any other entries that are affected by that update.  Database methods, whilst useful, are not the bread and butter of search engines.
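For anyone who hasn’t met the database meaning of the term, here is a minimal sketch using Python’s built-in sqlite3 module. The two tables and their contents are hypothetical, chosen only to show the idea: a change to a referenced key is carried through to the rows that point at it, and a row pointing at nothing is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless asked to.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: every post must reference an existing author.
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER NOT NULL
            REFERENCES authors(id) ON UPDATE CASCADE
    )
""")
conn.execute("INSERT INTO authors VALUES (1, 'CJ')")
conn.execute("INSERT INTO posts VALUES (10, 'More LSI amusement', 1)")

# Changing the author's key is reflected in the post that references it.
conn.execute("UPDATE authors SET id = 2 WHERE id = 1")
cascaded_author_id = conn.execute(
    "SELECT author_id FROM posts WHERE id = 10"
).fetchone()[0]

# A post that references a non-existent author is rejected outright.
try:
    conn.execute("INSERT INTO posts VALUES (11, 'Orphan', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print("author_id after cascade:", cascaded_author_id)
print("orphan insert rejected:", orphan_rejected)
```

That is all the term means in its home territory: a consistency guarantee between tables, not a ranking method.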

This video is the follow-up to the last, and it sets out to demonstrate to us that “referential integrity” is in fact the bread and butter of the Google search algorithm, if you will.

In the last video I was amused and happy to pick out some bits and pieces.  In this one, I am completely confused.  Let me know how you get on.

I gather that apparently Google uses “social LSI”, which is a totally new concept for me.  I couldn’t find any background on it either (not even in Google), so please volunteer some if you can.

Anchor text is used to find which docs relate to each other.  There is a word index and a link index.  The query is matched against both tables and the results from both indices are combined.  This seems to be the main method.
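As far as I can reconstruct it, the scheme described amounts to something like the toy sketch below. To be clear, this is my reading of the video’s description and not Google’s implementation: the pages, the links, the `build_index` and `search` helpers and the `link_weight` value are all hypothetical.

```python
from collections import defaultdict

# Hypothetical pages and their on-page text.
pages = {
    "a.example": "latent semantic indexing tutorial",
    "b.example": "gardening tips for roses",
    "c.example": "search engine basics",
}
# Hypothetical links: (source page, target page, anchor text).
links = [
    ("c.example", "a.example", "semantic indexing guide"),
    ("b.example", "c.example", "search basics"),
    ("a.example", "c.example", "search engine primer"),
]


def build_index(entries):
    """Map each word to the set of pages it is associated with."""
    index = defaultdict(set)
    for page, text in entries:
        for word in text.lower().split():
            index[word].add(page)
    return index


# Word index: words found on the page itself.
word_index = build_index(pages.items())
# Link index: words found in the anchor text pointing at the page.
link_index = build_index((target, anchor) for _, target, anchor in links)


def search(query, link_weight=2.0):
    """Match the query against both indices and add up the scores."""
    scores = defaultdict(float)
    for word in query.lower().split():
        for page in word_index.get(word, ()):
            scores[page] += 1.0
        for page in link_index.get(word, ()):
            scores[page] += link_weight  # arbitrary weight, an assumption
    # Highest score first; ties broken by page name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(search("search engine"))
print(search("guide"))
```

The second query finds a.example purely through anchor text, since the word “guide” never appears on the page itself. It is a useful signal, but you can see how little is going on here compared with a real retrieval system.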

Again a bold statement saying that Google succeeds where LSI has failed – one is a search engine and the other a commonly used method in information retrieval. In fact it’s not a lot of code either.

It goes on to talk about topic detection using on-page factors. Now I’ve done a bit of work in the area of topic detection and I can tell you that it is enormously complicated and requires knowledge from several different areas of computing to even attempt. This explanation appears overly simplistic.

Reputation is assessed using link text and links, which is nothing new.  Again this is pretty simplistic imo.

There’s a confirmation stage as well, which serves as a kind of check on the other 2 stages.  Maybe it applies constraints or something; I don’t know.

The rest of the video seems to reiterate SEO methods used currently by many practitioners.  I will leave it to you to check it out if you like and learn about all the detail.

So…

I would be interested to hear what you have to say about this information. You are all used to peer reviewing, no doubt, and also used to pulling apart algorithms.  Feel free to share thoughts, material, reactions…

The original papers for LSI by Susan Dumais et al. can be found at the Telcordia site.  I also wrote a presentation all about the search engine index which you might find handy.

Here is a lecture by professor Andrew Ng from Stanford University on Machine Learning (he also introduces LSI):


10 Comments

  1. 1

    GREAT POST. I’ve never even been on here before. I agree that Stompernet’s latest videos are using the strawman technique to stir controversy; their intent is certainly not to enlighten or clarify the debate, as they delete people’s comments. The internet is about dialogue not monologue; you’d think Stompernet would practice what they preach.

  2. 2

    Hi CJ,

    Watched those videos about LSI and Google. In the first video the SEO guy explains how LSI works and tries to prove that Google is not using simple LSI as he described. In the second video he introduces a new term, “Social LSI”, that I haven’t heard before. I’m sure Google is not using the classical LSI approach. Probably they have conceptual and meaningful approaches instead of words or links; otherwise it’d be easy to implement a search engine.

    Let’s see what will happen… If he is right or an ex google employee, there would be a discussion about it.

  3. CJ #
    3

    Thank you Rex – I agree that not posting people’s comments isn’t very useful for anyone and doesn’t do much for the credibility of the piece. Mehmet, yes me neither. “Social LSI” is not something anyone I know has come across, I’d be glad to hear more if you find anything. (The information is not put together by an ex-googler.)

  4. Christopher Rines #
    4

    From the content of the videos I have to say Leslie Rohde really doesn’t have any idea what he is talking about. Sorry Mr. Rohde but there it is…

    Whatever form of Semantic Analysis (SA) Google is using, and you can be sure they are using some type of SA algorithm, it’s way beyond LSI as described by Mr. Rohde. I actually find it humorous (at best) that Mr. Rohde fixates on LSI, which is the beginning of the SA story. Since LSI was developed many other forms of SA have become prominent. One must remember that, like ANY other underlying algorithm, whatever form of SA they are using is simply a tool in a very large box.

    I’d love to hear him explain in detail how LSI “claims” to handle synonyms and word forms with math and then explain how it does not. ;)

    I think one of his issues is he is focusing on 1 algorithm and not taking into account all the other pieces that surround it to maximize its effectiveness. Nothing in Information Retrieval & Information Extraction lives in a bubble, there is no “magic” bullet that one plugs in and bang you are replicating Google.

    Finally I do not have a crystal ball into Google’s complete stack & unless I’m really mistaken neither does Mr. Rohde, so talking as an authority on the how & what algorithmically they are doing is silly. This is highlighted by his use of the “Social LSI” concept, while it sounds neat it’s something I have never heard of or can find any reference to. As a practitioner of Statistical NLP which LSI and like technologies are a part of I am horrified by these 2 videos.

  5. 5

    Well you can’t get your serious info from a site that has “stomper” in its name. However they are clearly using their brand of “social LSI” to get the links and attention they think their reputation deserves. Sometimes the real lesson is in between the LSI and the BSI. Of course in the end it’s not rankings that matter, but conversions, and this one worked again; that is the real lesson.

  6. CJ #
    6

    Thanks Scott – I haven’t actually seen any proof that their idea works or even how anyone has implemented it. I’d be happy to hear about how it’s worked and what the results have been.

  7. 7

    I am not saying that their method gets results, even though it seems to be just good practices that everyone recommends in a general sense with no real road map or clear pathway to execution. I was commenting on the fact that it has worked for them to get conversions in the form of publicity and email addresses to market to.

  8. CJ #
    8

    Ah right! I was hoping for data! You are absolutely right.

  9. Ron Chmara #
    9

    I’m pretty sure that he is using phrases like “LSI” (and its neologism relative, “Social LSI”) in the same (often inappropriate) manner as SEO folks who talk about a “site’s PageRank” or use any of the many vogue, if vague, hyphenates that involve “-juice” or “-love”.

    Rather than attempting to go into more complex details, entire concepts involving a great deal of learning and study are simply accompanied by some hand waving, a few magic words, and hopes (promises?) that this will produce a rabbit out of a hat.

    I would consider it a matter of marketing (explaining?) complex knowledge to people who don’t even understand the fundamental terms being used (and sometimes, those doing the marketing/explaining may have gaps in their knowledge as well).

    In his defense, he does build in some caveats, explaining that giving an accurate presentation would take a great deal more time and effort. That being said, some of the errors are nails-on-a-chalkboard, throw-something-through-my-monitor (to use the technical terms) bad… He doesn’t seem to know what stemming is, for one jaw dropper, and blatantly confuses stemming and lexemes (as if they were interchangeable concepts), before declaring that his lack of understanding cancels out a third, unrelated, concept.

  10. 10

    Wow! Very impressive!



© 2009-2010 Science for SEO All Rights Reserved
