Search Technologies… Lucene, Solr and the importance of ‘Search’

Reflecting the nature of trends in web user experience, my work as a developer has led me to be quite involved in the field of ‘search’. It’s fair to say I never fully appreciated how important information retrieval / search theory would become in my career, and what ‘search’ is really all about.

A few months ago I changed jobs and began working for ProQuest. Both here and at my previous employer, Open Objects, search technologies are a key part of the underlying software infrastructure on which the web-based products are built. This is generally the case for web applications that let users sift through large amounts of semi-structured full-text data (documents) and retrieve the specific information they are after with ease.

I think it's no trade secret that a lot of [forward-thinking] organisations to whom search is important are using, or moving towards, open source search solutions, of which there are currently two major projects: Apache Lucene and Apache Solr. The former is a Java library which provides a wide range of indexing and searching functionality, and the latter is a self-contained web application which provides further functionality and opens up Lucene’s features over an HTTP-based interface.
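To make the "HTTP-based interface" part a bit more concrete, here is a minimal sketch of building a query URL for Solr's standard `/select` request handler using only the Java standard library. The host, port and the core name `articles` are assumptions made up for illustration, as is the `SolrQueryUrl` class itself; the `q`, `rows` and `wt` parameters are standard Solr query parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SolrQueryUrl {

    // Builds a URL for Solr's /select handler.
    // baseUrl and core are hypothetical; adjust them to your own installation.
    static String build(String baseUrl, String core, String query, int rows) {
        // URL-encode the query so characters such as ':' and spaces are safe in a URL.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/" + core + "/select?q=" + q + "&rows=" + rows + "&wt=json";
    }

    public static void main(String[] args) {
        // Assumed local Solr instance with a core called "articles".
        System.out.println(build("http://localhost:8983/solr", "articles", "title:lucene", 10));
    }
}
```

Pasting a URL like this into a browser (against a running Solr instance) is enough to run a search, which is a big part of why Solr is so approachable from any language.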

It’s important to make the distinction between searching and fulfilling an information need. ‘Search’ is just a mechanism through which users find the information they need, and particular technologies and features provide a variety of ways they can do this. Some people speculate that the future of information retrieval may not look like ‘search’ at all. In fact, there has for some time been growing interest in algorithms that use data continually collected from a number of sources to ‘learn’ and provide the user with the information they need.

You can use Solr without much understanding of Lucene’s composition, and you can also use Lucene without much knowledge of information retrieval methods. This is testament to the quality of both projects’ APIs. However, the more you find yourself wanting to do in the search space, the more useful it is to develop an understanding of the concepts underpinning these technologies.
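One of those underpinning concepts is the inverted index: a map from each term to the documents that contain it. The following is a toy sketch of the idea in plain Java, not Lucene's actual implementation or API; the class name `ToyInvertedIndex` and the naive tokenisation (lowercase, split on non-word characters) are my own simplifications.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class ToyInvertedIndex {

    // term -> sorted ids of the documents containing that term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Tokenises the text very naively and records docId against each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Returns the ids of documents containing the term (empty if none do).
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "Lucene is a Java library");
        index.add(2, "Solr builds on Lucene");
        System.out.println(index.search("lucene"));
        System.out.println(index.search("solr"));
    }
}
```

Real search libraries layer a great deal on top of this (analysis chains, relevance scoring, compressed on-disk structures), but looking terms up rather than scanning documents is the basic reason full-text search can be fast.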

I think that’ll suffice for an intro to search circa 2013. Hope it was useful, more to come!
