I wanted to get your opinion on how the Google algorithm actually works according to Leslie Rohde from “StomperNet”. He posted two videos, one about LSI and one about “referential integrity”.
“Warning – “Advanced” SEO Technique DOES NOT WORK” video:
I can see a whole lot of confusion in this. The main point is that LSI has moved on considerably since the 1980s: pLSI has been favoured for some time, as have a number of other variants (Latent Dirichlet Allocation, for example). LSI in its Wikipedia-explained form is…well…very basic. I have always said that LSI wasn’t used in that way in the Google search engine. LSI (and its variants) is a really useful method used in many systems, not just search engines. It is also usually just one cog in the machine.
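To make it concrete, here is a minimal sketch of what classic, textbook LSI actually is: a truncated SVD of a term-document matrix. The tiny matrix and term labels below are entirely made up for illustration, and this says nothing about what Google runs in production.

```python
import numpy as np

# Tiny term-document matrix: rows = terms, columns = documents.
# Row order: car, auto, engine, flower, petal (toy data).
# "car" and "auto" never co-occur in the same document, but both
# co-occur with "engine".
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # auto
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)

# Classic LSI: keep only the top-k singular triplets ("concepts").
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]  # each term as a vector in concept space

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# LSI places "car" and "auto" close together purely via their shared
# co-occurrence with "engine" -- this is the "synonyms with math" claim.
print(cos(term_vecs[0], term_vecs[1]))  # car vs auto: high
print(cos(term_vecs[0], term_vecs[3]))  # car vs flower: near zero
```

This is exactly the statistical handling of synonymy discussed below: no dictionary, just linear algebra over co-occurrence counts.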
My analogy for it has been that NASA use glue in their rockets, but it’s not the same glue as you find in your child’s pencil case.
The other big problem is that the entire argument in this video rests on the assumption that Google is built on LSI alone, so the tests done to “prove” that Google doesn’t use LSI are meaningless. That is not how the technique works, as you all well know: it’s one cog in the machine, and isolating it like this achieves nothing but sensational banter.
The statements about “years of AI” not solving the problems of word disambiguation and context extraction are unfair because, for a start, those problems sit in the domain of natural language processing, cognitive linguistics, computational linguistics and so on. They are also extremely difficult. I’m not at all sure that Mr Rohde truly understands that.
He also says that LSI “claims” to handle synonyms and word forms with math. Doesn’t it? It’s a statistical method, after all, so that would hardly come as a shock.
Saying “LSI is lame” is a rather silly statement imo. There’s really nothing else to say about that.
I will leave it to you to watch the video if you like and make your own mind up about it.
“Referential Integrity Rocks!” video:
This is a database concept, as many of you will know. For those who don’t, it’s a method that ensures relationships between tables in a database remain consistent: when an update is made to one entry, the change is reflected in any other entries affected by that update. Database methods, whilst useful, are not the bread and butter of search engines.
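For anyone unfamiliar with the database meaning of the term, here is a minimal sketch using SQLite's foreign key support. The tables and data are invented for illustration; the point is only that the database itself rejects an entry that would break the relationship between the tables.

```python
import sqlite3

# Referential integrity in the database sense: a foreign key keeps
# the relationship between two tables consistent. (Nothing to do
# with search ranking.)
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite needs this enabled
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE orders (
                 id INTEGER PRIMARY KEY,
                 customer_id INTEGER REFERENCES customers(id))""")
con.execute("INSERT INTO customers VALUES (1, 'Alice')")
con.execute("INSERT INTO orders VALUES (10, 1)")  # ok: customer 1 exists

try:
    # Violates referential integrity: there is no customer 99,
    # so the database refuses the insert.
    con.execute("INSERT INTO orders VALUES (11, 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The constraint, not the application code, is what guarantees consistency, which is precisely why this is a storage concept rather than a ranking concept.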
This video is the follow-up to the last, and it sets out to demonstrate that “referential integrity” is, if you will, the bread and butter of the Google search algorithm.
In the last video I was amused and happy to pick out some bits and pieces. In this one, I am completely confused. Let me know how you get on.
I gather that apparently Google uses “social LSI”, which is a totally new concept to me. I couldn’t find any background on it either (not even in Google), so please volunteer some if you can.
Anchor text is used to find which documents relate to each other. There is a word index and a link index; the query is matched against both tables and the results from both indices are combined. This seems to be the main method.
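The two-index idea as described could be sketched something like the following. Everything here is hypothetical: the pages, the anchor texts, and the naive count-sum scoring are all made up to illustrate "match against both indices and merge", and bear no relation to Google's actual code.

```python
from collections import defaultdict

# Toy data: page text, plus inbound links as (source, target, anchor).
pages = {
    "a.html": "cheap flights to paris",
    "b.html": "paris travel guide",
}
links = [
    ("x.html", "b.html", "paris flights"),
    ("y.html", "b.html", "book flights"),
]

word_index = defaultdict(set)    # term -> pages whose text contains it
anchor_index = defaultdict(set)  # term -> pages linked to with it
for page, text in pages.items():
    for term in text.split():
        word_index[term].add(page)
for _src, target, anchor in links:
    for term in anchor.split():
        anchor_index[term].add(target)

def search(query):
    # Consult both indices and merge: one point per index hit per term.
    scores = defaultdict(int)
    for term in query.split():
        for page in word_index[term]:
            scores[page] += 1
        for page in anchor_index[term]:
            scores[page] += 1
    return sorted(scores, key=scores.get, reverse=True)

print(search("paris flights"))  # b.html outranks a.html via anchor hits
```

Note how b.html wins on "paris flights" mostly through what other pages say about it, which is the sense in which anchor text relates documents to each other.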
Again, a bold statement: that Google succeeds where LSI has failed. Comparing the two makes little sense, since one is a search engine and the other a commonly used method in information retrieval. In fact it’s not a lot of code either.
It goes on to talk about topic detection using on-page factors. Now I’ve done a bit of work in the area of topic detection and I can tell you that it is so enormously complicated and requires knowledge from several different areas of computing to even attempt. This explanation appears overly simplistic.
Reputation is assessed using link text and links, which is nothing new. Again this is pretty simplistic imo.
There’s a confirmation stage as well, which serves as a kind of check on the other two stages; maybe it applies constraints or something, I don’t know.
The rest of the video seems to reiterate SEO methods currently used by many practitioners. I will leave it to you to check it out if you like and learn about all the detail.
I would be interested to hear what you have to say about this information. You are all used to peer reviewing, no doubt, and also used to pulling apart algorithms. Feel free to share thoughts, material, reactions…
Here is a lecture by professor Andrew Ng from Stanford University on Machine Learning (he also introduces LSI):