Google research beyond LSI
Google picked up Amit Gruber, who is doing an internship with them. He's pretty valuable because of his PhD research in statistical text analysis (the field LSI belongs to). His method uses Hidden Topic Markov Models (HTMM), and a working version was released in 2007.
In the post, Google mentions PLSI (Probabilistic Latent Semantic Indexing) and Latent Dirichlet Allocation (LDA) as examples of variants of LSI.
HTMM is different because instead of treating the document as a bag of words, it models the sequence of topics in a document with a temporal Markov structure, so each word's topic depends on the topic of the words before it.
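To make the contrast concrete, here is a toy sketch (not the actual HTMM, and the topic names and transition probabilities are invented for illustration): in a bag-of-words model like LSI/PLSI/LDA, each word's topic assignment ignores word order, while in a Markov-structured model the topic at each position depends on the topic at the previous position.

```python
import random

random.seed(0)

# Two hypothetical topics whose assignments follow a first-order
# Markov chain: each word's topic depends on the previous word's
# topic. A bag-of-words model has no such dependence.
TOPICS = ["sports", "finance"]

# P(next topic | current topic) -- topics tend to persist,
# which is what gives documents coherent topical "runs".
TRANSITION = {
    "sports":  {"sports": 0.9, "finance": 0.1},
    "finance": {"sports": 0.1, "finance": 0.9},
}

def sample_topic_sequence(n_words, start="sports"):
    """Sample one topic per word position from the Markov chain."""
    topics = [start]
    for _ in range(n_words - 1):
        probs = TRANSITION[topics[-1]]
        nxt = random.choices(TOPICS, weights=[probs[t] for t in TOPICS])[0]
        topics.append(nxt)
    return topics

seq = sample_topic_sequence(10)
print(seq)  # long runs of the same topic, rather than random flips
```

Because transitions favour staying in the same topic, sampled sequences show long runs of one topic, which is the kind of within-document topical coherence a plain bag-of-words model cannot capture.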
Read the Google post here, and OpenHTMM is available here. Good old Google, thanks for sharing.
This supports my post about how LSI in its very basic form, as summarized in various places including the excellent Wikipedia article, is not the variety used at Google, whatever Matt Cutts says. Yes, it is used, but he doesn't give away the important information; what he presents is a very, very basic version. It's like saying "Yes, we use glue in our computer chips" or "Yes, here at NASA we use glue as an adhesive for our rockets". It's unlikely to be the glue your child uses at playschool.