Java and/or Ruby

Currently I am working on a small prototypical application involving lots of webscraping, a database and some sophisticated term indexing and search query expansion. Since some parts of the app are quite independent from the others, sharing data only via the database, I decided to code as much as I could in Ruby. There is nothing better for getting into a new language than working on a real project.

For web scraping (which I always did with Python, my absolutely favourite scripting language to date, and the fabulous BeautifulSoup library) I decided to switch to Hpricot (silly name), which does the job of scraping untidy HTML pages good enough for me.

So, what do I use as a cost-effective, well-maintained search and indexing framework? Of course Lucene, because I have grown up with Java and the bold faith that there is a library for everything under the sun - written in Java. But lately I found that there is in fact a Lucene-lookalike implementation for Ruby named Ferret. And it is even faster than Lucene. So off to new pastures.

All that is left in Java will be the RDF handling, because there is nothing as sophisticated for RDF/OWL ontology handling as the Jena library in Ruby.

Tune in next time to hear me talk about the pitfalls of weakly-typed scripting languages …

Links:

No comments yet. Be the first.

Leave a reply

Warning: comments containing links will be eaten by our spam protection.