LSA/LSI source code & tools
I’m often asked by students, researchers in other areas and sometimes SEO people where they can find LSI/LSA source code and tools. My favourite beginner’s tutorial on LSI is by Genevieve Gorrell from Sheffield University. The term LSA is mostly used in computer science these days, but it doesn’t matter what you call it.
There are a number of packages which will allow you to use LSA/LSI and which also offer many other useful things for semantic analysis, information extraction (IE) and information retrieval (IR), for example.
To code your own you’ll need, in short, to:
- Have a stopword file
- Process each file
- Compute the weights
- Print your data
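To make those steps a bit more concrete, here is a minimal Python sketch of the preprocessing and weighting part only. It is not taken from any of the packages mentioned in this post. The stopword set, the sample documents, the function names (`tokenize`, `tfidf_matrix`, `print_matrix`) and the choice of plain tf-idf weighting are all my own assumptions for illustration; other weighting schemes (log-entropy, for instance) are common too. The SVD step that turns this weighted matrix into an LSA space is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; in practice you would load this from your stopword file.
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to", "on"}


def tokenize(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into alphabetic words and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]


def tfidf_matrix(docs, stopwords=STOPWORDS):
    """Return (vocab, matrix): one row per term, one column per document.

    Each cell is raw term frequency times log(N / document frequency).
    This is one assumed weighting scheme among several possible ones.
    """
    counts = [Counter(tokenize(d, stopwords)) for d in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    matrix = []
    for term in vocab:
        df = sum(1 for c in counts if term in c)  # documents containing the term
        idf = math.log(n_docs / df)
        matrix.append([c[term] * idf for c in counts])
    return vocab, matrix


def print_matrix(vocab, matrix):
    """Print one line per term, followed by its weight in each document."""
    for term, row in zip(vocab, matrix):
        print(f"{term:<10}" + " ".join(f"{w:6.3f}" for w in row))


if __name__ == "__main__":
    # Made-up sample documents standing in for the files you would process.
    docs = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs",
    ]
    vocab, matrix = tfidf_matrix(docs)
    print_matrix(vocab, matrix)
```

From here, the usual next step would be to hand the term-by-document matrix to a linear algebra library for a truncated SVD, which is what the packages below do for you.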
There’s a MATLAB toolbox called TMG (most unis will have licences allowing you to get a free copy of MATLAB) which will allow for clustering, retrieval, indexing, dimensionality reduction and classification – a powerful package indeed! MATLAB also does a whole load of other things because there are plenty of extensions freely available, such as the SVM Toolbox.
is a Java implementation freely available.
There’s a working online tool at the University of Colorado LSA group. It also does other types of classification.
with a nice interface for you – it gives you a graphical representation of clusters.
There’s a demo here
There’s also a PLSI parser here, if you want to try the other variant and compare.
I think that will do for now. I hope you have fun with these.