Lemur toolkit

The Lemur toolkit is a natural language processing and information retrieval toolkit. Having a go on it is a nice way of seeing some IR technologies working first hand, rather than trying to infer them from the behaviour of a major search engine.

It supports all major languages, performs stemming using Porter and Krovetz, indexes loads of file formats, uses part-of-speech tagging and named entity recognition, and has an API of course (C++, C# and Java).

For retrieval:

  • Supports major language modeling approaches such as Indri and KL-divergence, as well as vector space, tf.idf, Okapi and InQuery
  • Relevance- and pseudo-relevance feedback
  • Wildcard term expansion (using Indri)
  • Passage and XML element retrieval
  • Cross-lingual retrieval
  • Smoothing via Dirichlet priors and Markov chains
  • Supports arbitrary document priors (e.g., PageRank, URL depth)

Best of all, it’s free! There’s a set of tutorials to get you started.

There is a new engine from the Lemur project called Indri, which uses inference networks.

© 2009-2013 Science for SEO. All Rights Reserved.
