I’m often asked by students, researchers in other areas and sometimes SEO people where they can find LSI/LSA source code/tools. My favourite beginner’s tutorial on LSI is by Genevieve Gorrell from Sheffield University. The term LSA is mostly used in computer science these days, but it doesn’t matter what you call it.
There are a number of packages which will let you use LSA/I and which also offer many other useful things, for example for semantic analysis, information extraction (IE) and information retrieval (IR).
To code your own you’ll need to, in short:
- Have a stopword file
- Process each file
- Compute the weights
- Normalize
- Print your data
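
To make those steps a bit more concrete, here is a minimal Python sketch of them, not anybody's official implementation. The tiny inline `STOPWORDS` set stands in for a real stopword file, the function names `tokenize` and `tfidf_vectors` are made up for illustration, and tf-idf with L2 normalization is just one common weighting choice that I'm assuming here.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a stopword file; a real one would be loaded from disk.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "on", "to"}


def tokenize(text):
    """Process one document: lowercase, keep runs of letters, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one L2-normalized tf-idf vector per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    vectors = []
    for c in counts:
        # Weight = raw term count * log(N / df); assumed weighting, many others exist.
        vec = [c[t] * math.log(n_docs / df[t]) for t in vocab]
        # Normalize to unit length; an all-zero vector is left as zeros.
        norm = math.sqrt(sum(w * w for w in vec))
        vectors.append([w / norm if norm else 0.0 for w in vec])
    return vocab, vectors


if __name__ == "__main__":
    docs = [
        "The cat sat on the mat",
        "The dog sat on the log",
        "Cats and dogs",
    ]
    vocab, vectors = tfidf_vectors(docs)
    # Print the data: one row per document, one column per vocabulary term.
    print(vocab)
    for row in vectors:
        print([round(w, 3) for w in row])
```

This only builds the weighted, normalized term-document matrix. For LSI proper you would then run a truncated SVD over that matrix, which is the part the packages listed here do for you.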
There’s a MATLAB (most unis will have licences allowing you to get a free copy) toolbox called TMG which will allow for clustering, retrieval, indexing, dimensionality reduction and classification. A powerful package indeed! MATLAB also does a whole load of other things because there are plenty of extensions freely available, such as the SVM Toolbox.
JLSI is a freely available Java implementation.
The semantic-engine also uses LSI/A, in C++ (Google Code).
The semantic vectors package is also available, in Java + Lucene.
There’s a working online tool at the Uni Colorado LSA group. It also does other types of classification.
There’s gCLUTO with a nice interface for you; it gives you a graphical representation of clusters.
There’s a demo here from Telcordia.
There’s also a PLSI parser here, if you want to try the other variant and compare.
I think that will do for now. I hope that you have fun with these.


Please send me the LSA/LSI source code.
There is no silver spoon! It’s already linked to here.