Working with the new tech
I’ve been working with Elasticsearch on a few different projects lately, and I thought I’d summarise some of my thoughts and experiences now that I’ve come to know it reasonably well. I say ‘reasonably’ because, given the potential complexity of the queries and infrastructure you can build with Elasticsearch, calling yourself an expert is never going to end well!
So what were my first impressions?
- Elasticsearch is based on Lucene and designed to provide a thin, JSON-based RESTful web service wrapper around it
- It doesn’t provide as many custom extensions as Solr and therefore feels a lot closer to using Lucene more directly (IMO – a good thing!)
- It can be blisteringly fast, but many of the clients suffer from bloat and are hard to configure
- It has a great foundation and approach designed for scalability and powerful query building
- There are some really nice ways of working with nested / relational data
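To make the nested-data point a little more concrete, here is a minimal sketch in Python. It only builds the JSON bodies as plain dicts and prints one of them; nothing is sent to a cluster. The field names (comments, author, stars) and the helper nested_query are invented for illustration, and the "string" field type assumes the older mapping syntax (newer releases use text/keyword instead).

```python
import json

# Hypothetical mapping: each post holds a list of comment objects.
# Declaring the field as "nested" keeps each comment's fields together,
# instead of flattening them into unrelated arrays of values.
mapping = {
    "properties": {
        "title": {"type": "string"},  # assumption: older 'string' type
        "comments": {
            "type": "nested",
            "properties": {
                "author": {"type": "string"},
                "stars": {"type": "integer"},
            },
        },
    }
}


def nested_query(path, inner_query):
    """Wrap a query so it is evaluated per nested object under `path`."""
    return {"nested": {"path": path, "query": inner_query}}


# Match posts that have a single comment which is both by "alice"
# and has at least 4 stars.
query = {
    "query": nested_query(
        "comments",
        {
            "bool": {
                "must": [
                    {"match": {"comments.author": "alice"}},
                    {"range": {"comments.stars": {"gte": 4}}},
                ]
            }
        },
    )
}

print(json.dumps(query, indent=2))
```

Without the nested type, a post with one comment by alice (2 stars) and another by bob (5 stars) could match the same pair of conditions, because the values are no longer tied to the comment they came from.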
And my experiences:
- Building up the queries in object form is not straightforward – the ‘killer app’ for Elasticsearch would be a good query builder with a well-designed API
- A common pattern I’d recommend is indexing most data as ‘multi’ fields, each with an ‘untouched’ sub-field which is not analyzed (and can be accessed via field_name.untouched), so that exact matching and sorting remain possible alongside full-text search
- None of the Java clients are hugely impressive
- Working with JSON queries takes time to get used to – it doesn’t seem as intuitive as modifying Solr query params to me. I would recommend getting a good RESTful service GUI to help debug your queries
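As a rough illustration of two of the points above (the ‘untouched’ multi-field pattern and what a small query builder might look like), here is a sketch in Python that just assembles the JSON as dicts. The helpers multi_field, match, term and the BoolQuery class are hypothetical and not part of any real client; the mapping uses the older "multi_field" / "not_analyzed" syntax, which I am assuming matches the version in use (newer releases express the same idea with a text field and a keyword sub-field).

```python
import json


def multi_field(name):
    """Legacy-style mapping: an analyzed default field plus a not_analyzed
    'untouched' sub-field, reachable in queries as <name>.untouched."""
    return {
        name: {
            "type": "multi_field",
            "fields": {
                name: {"type": "string", "index": "analyzed"},
                "untouched": {"type": "string", "index": "not_analyzed"},
            },
        }
    }


def match(field, text):
    """Full-text clause: the text is analyzed before matching."""
    return {"match": {field: text}}


def term(field, value):
    """Exact-value clause: intended for not_analyzed fields."""
    return {"term": {field: value}}


class BoolQuery:
    """Tiny hypothetical builder for a bool query with chained calls."""

    def __init__(self):
        self._clauses = {"must": [], "must_not": []}

    def must(self, clause):
        self._clauses["must"].append(clause)
        return self

    def must_not(self, clause):
        self._clauses["must_not"].append(clause)
        return self

    def build(self):
        # Leave out empty clause lists to keep the JSON body small.
        non_empty = {k: v for k, v in self._clauses.items() if v}
        return {"query": {"bool": non_empty}}


mapping = {"properties": multi_field("title")}

# Full-text search on the analyzed field, excluding one exact title
# via the untouched sub-field.
body = (
    BoolQuery()
    .must(match("title", "elasticsearch"))
    .must_not(term("title.untouched", "Working with the new tech"))
    .build()
)

print(json.dumps(mapping, indent=2))
print(json.dumps(body, indent=2))
```

Printing the body and pasting it into a REST GUI is a reasonable way to debug it before wiring it into a client.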
A note on Elasticity
One of the most notable things about Elasticsearch is the intelligent support for clustered instances and mitigating node failure. While I haven’t really worked with large enough datasets to need clustering, nor do I think that huge numbers of organisations really do, it is a big leap forward. The ease of setting up clusters of nodes and communicating effectively with them is impressive.
The only thing is, the cleverness of the distributed aspects of Elasticsearch borders on magic. Controlling and debugging what these seemingly autonomous nodes are doing could be fraught with difficulty. This notion of magic seems to have found its way into much of the Elasticsearch codebase, e.g.:
- There is no way to programmatically control the number of threads a Java client spawns when connecting to Elasticsearch instances.
- Multicast discovery is enabled by default – this means nodes announce themselves on your network, and any that share a cluster name will find each other and automatically replicate data(!)
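If the multicast behaviour worries you, it can be switched off in favour of an explicit host list. The sketch below, in Python, just renders the relevant lines for elasticsearch.yml. The setting names are the zen discovery ones from the older releases where multicast was the default, so treat them as an assumption to check against the version you run; the cluster name, host names and both helper functions are made up for the example.

```python
import json


def discovery_settings(cluster_name, unicast_hosts):
    """Settings that turn off multicast discovery and list peers explicitly."""
    return {
        # A non-default cluster name stops stray nodes from joining by accident.
        "cluster.name": cluster_name,
        "discovery.zen.ping.multicast.enabled": False,
        "discovery.zen.ping.unicast.hosts": list(unicast_hosts),
    }


def to_yaml_lines(settings):
    """Render flat key/value settings; JSON scalars and lists are valid YAML."""
    return [f"{key}: {json.dumps(value)}" for key, value in settings.items()]


lines = to_yaml_lines(
    discovery_settings("my-search-cluster", ["es-node-1", "es-node-2"])
)
print("\n".join(lines))
```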
It’s hard not to be impressed by the cleverness involved in Elasticsearch, even if it is also quite scary. Regardless of anyone’s opinion of this still-maturing search platform, it seems the landscape of search is certainly changing and offering increasing possibilities.