Fulltext search your CouchDB in Ruby

Background

CouchDB is an awesome schema-free, document-oriented database that is now an official Apache project. Rubyists were quick to sit back and relax with CouchDB, seeing it as a different way to solve the storage problem.

My buddy Nathan and I quickly decided to use CouchDB as the secondary tier of data storage in the application we are building. There are many CouchDB libraries for Ruby: CouchObject, ActiveCouch, CouchRest, RelaxDB and CouchPotato. That is a lot of libraries to research, and it took me quite a while to settle on CouchRest.

Things were good… until one day I realized there is no obvious way to perform fulltext indexing and search in CouchDB.

Picking a Search Framework

There are many open source and commercial fulltext search engines. Sphinx is a fast and reliable indexer that supports two popular open source databases, and thinking-sphinx is the best Ruby plugin at the moment for adding fulltext search to your Rails application. However, Sphinx strictly requires a 32-bit or 64-bit integer primary key for each indexed document, so it doesn’t work for CouchDB, whose document ids are strings.
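To make the key mismatch concrete, here is a minimal sketch in plain Ruby. It assumes a CouchDB id in the default server-generated form, a 32-character hex string; the id value and the helper name fits_in_bits? are made up for illustration. CouchDB ids can also be arbitrary strings, which fit an integer key even less.

```ruby
# A made-up CouchDB document id in the default style: 32 hex characters (128 bits).
COUCH_ID = '6e1295ed6c29495e54cc05947f18c8af'

# Hypothetical helper: does the id, read as a hex number, fit in an
# unsigned integer of the given width?
def fits_in_bits?(hex_id, bits)
  hex_id.to_i(16) < 2**bits
end

puts fits_in_bits?(COUCH_ID, 32) # false
puts fits_in_bits?(COUCH_ID, 64) # false
```

Xapian has no such requirement on the application's own ids, which is part of why it is a better fit here.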

CouchDB-lucene is the de facto fulltext search plugin for CouchDB. Building CouchDB-lucene is quite straightforward if you know how to handle a Java stack on your server: compile, configure and restart. There are two reasons I decided to keep looking for an alternative (I am not anti-Java):

  • It depends on a Java stack, and building couchdb-lucene itself requires a dozen dependent libraries.
  • Extending and customizing couchdb-lucene has to be done in Java.

Interaction with Ruby
Libraries like Ferret and Xapian have native Ruby bindings and can be customized in Ruby. This is extremely important for my project, which needs to interact with almost every part of the search toolkit from Ruby, such as the Xapian Database, Indexer and QueryParser, in order to fine-tune the index and perform additional analysis.

Xapian trying to Xap-it

Xapian
Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators. It includes bindings for many programming languages (Ruby, Python, PHP, Perl). Documentation on the Ruby bindings is definitely lacking; luckily, they wrap pretty much all of the core C++ classes and methods in Ruby.

Xapit
Xapit (pronounced “zap it”) is a new high-level Ruby library developed by RyanB for interacting with Xapian, a fulltext search engine. Xapit itself is written in Ruby and, most importantly, is ORM agnostic.

It includes an AbstractAdapter class ready to be extended for any other kind of ORM, and that is exactly what I did: I forked the project, and 30 minutes later it was indexing and searching my CouchDB objects.
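The adapter idea can be sketched in plain Ruby. Everything in this sketch is hypothetical (SketchAdapter, HashStoreAdapter, for_class?, find_single, find_each and the in-memory HashStore); it is not Xapit's actual AbstractAdapter API, only an illustration of how a search library can stay ORM agnostic by asking an adapter to fetch records for it.

```ruby
# Hypothetical base class: the search library talks only to this interface.
class SketchAdapter
  ADAPTERS = []

  # Remember every subclass so the library can pick one per model class.
  def self.inherited(subclass)
    super
    ADAPTERS << subclass
  end

  # Build an adapter from the first subclass that claims this model class.
  def self.for(model_class)
    adapter = ADAPTERS.find { |a| a.for_class?(model_class) }
    raise ArgumentError, "no adapter for #{model_class}" unless adapter
    adapter.new(model_class)
  end

  def self.for_class?(_model_class)
    raise NotImplementedError
  end

  def initialize(model_class)
    @model_class = model_class
  end

  def find_single(_id)
    raise NotImplementedError
  end

  def find_each
    raise NotImplementedError
  end
end

# Hypothetical in-memory store keyed by string ids, standing in for a CouchDB model.
class HashStore
  attr_reader :id, :title

  def self.docs
    @docs ||= {}
  end

  def self.create(id, title)
    docs[id] = new(id, title)
  end

  def initialize(id, title)
    @id = id
    @title = title
  end
end

# The only store-specific code: how to recognise the model and fetch records.
class HashStoreAdapter < SketchAdapter
  def self.for_class?(model_class)
    model_class <= HashStore
  end

  def find_single(id)
    @model_class.docs[id]
  end

  def find_each(&block)
    @model_class.docs.each_value(&block)
  end
end

HashStore.create('a1b2', 'Nintendo Wii')
adapter = SketchAdapter.for(HashStore)
puts adapter.find_single('a1b2').title
```

Supporting another storage layer then means writing one more small subclass, without touching the indexing or query code.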

Introducing Xapit with CouchRest support

Please refer to this documentation on setting up Xapian and installing Xapit. To enable Xapit search in your model, include Xapit::Membership. Xapit’s indexing is done in Ruby, so it supports virtual attributes: you just add a Ruby method, and this is exactly how I index CouchDB’s nested attributes.

item.rb

class Item < CouchRest::ExtendedDocument
  use_database COUCHDB_SERVER

  # Enable xapit for this model
  include Xapit::Membership

  xapit do |index|
     index.text :title, :weight => 2
     index.text :description, :weight => 2

     # Index nested document property by a method
     index.text :feature_names
  end

  property :owner_id
  property :title
  property :description
  property :features

  view_by :owner_id

  def feature_names
    # Tolerate a missing features list and nil entries inside it
    (features || []).map { |m| m['name'] if m }.compact
  end
end

Play with it in your script/console:

Xapit.index_all
Item.search('Nintendo Wii')
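To see why a plain Ruby method is enough to expose nested data, here is a minimal sketch that uses neither Xapit nor CouchRest. PlainItem and indexable_text are hypothetical names for illustration, and the toy indexer is not how Xapit is implemented; the point is only that an indexer that calls attributes by name cannot tell a stored property from a computed method.

```ruby
# Hypothetical stand-in for the Item model; no CouchRest or Xapit needed.
class PlainItem
  attr_reader :title, :description, :features

  def initialize(title, description, features)
    @title = title
    @description = description
    @features = features
  end

  # Same idea as Item#feature_names: flatten nested hashes into a list of strings.
  def feature_names
    (features || []).map { |f| f['name'] if f }.compact
  end
end

# Toy "indexer": calls each attribute by name and collects the text it returns.
def indexable_text(record, attributes)
  attributes.flat_map { |attr| Array(record.public_send(attr)) }
end

item = PlainItem.new(
  'Nintendo Wii',
  'Game console',
  [{ 'name' => 'WiFi' }, nil, { 'name' => 'Motion controller' }]
)

puts indexable_text(item, [:title, :description, :feature_names]).inspect
# ["Nintendo Wii", "Game console", "WiFi", "Motion controller"]
```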

Conclusion

Xapian is feature rich and has a lot more to offer. It looks like I will be settling on Xapian and a Ruby-oriented way of fulltext indexing for my CouchDB needs.

Limitations

  • Xapit is still under active development
  • You need to trigger index updates manually
  • It doesn’t support incremental index updates at the moment