Search features
In Breakout, users can search for spaces and for any content attached to spaces. The search system is unusual in that each user sees only the results they have permission to see.
The search query consists of a text string and two parameters: the object types to search (space names, wiki pages, flow articles, etc.) and the scope of the search. The scope can be all spaces, the user's related spaces, or the last space (usually the space the user was viewing when submitting the search).
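A query object along these lines could carry the text and the two parameters. This is a minimal sketch, not the Breakout source: the class name SearchQuery, the scope symbols, the last_space_id field and the type strings are all hypothetical.

```ruby
# Hypothetical scope symbols:
#   :all     - every space
#   :related - the user's related spaces
#   :last    - the space the user was viewing when the search was submitted
SEARCH_SCOPES = [:all, :related, :last].freeze

# Hypothetical value object: a text string plus the object types to
# search and the scope of the search.
class SearchQuery
  attr_reader :text, :types, :scope, :last_space_id

  def initialize(text, types: ["Space"], scope: :all, last_space_id: nil)
    unless SEARCH_SCOPES.include?(scope)
      raise ArgumentError, "unknown scope #{scope.inspect}"
    end
    # The :last scope is meaningless without knowing which space that was.
    if scope == :last && last_space_id.nil?
      raise ArgumentError, "scope :last needs the id of the last space viewed"
    end
    @text = text
    @types = types
    @scope = scope
    @last_space_id = last_space_id
  end
end

query = SearchQuery.new("release notes", types: ["Space", "WikiPage"], scope: :related)
```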
Users can make themselves searchable by creating personal spaces.
Search plugins and the searchable object interface
We made text indexing and search a pluggable feature because different installations of Breakout have very different needs. For instance, a developer who runs Breakout to work on the UI wants the simplest possible search, with no extra indexing processes, whereas the Assembla Breakout application service will require a sophisticated, high-volume search engine. Search needs also change over time: we currently use MySQL full-text search, which does not efficiently support search on arbitrary fields, so we will want to plug in something like Lucene or Ferret when we implement fielded search on directories.
A plug-in indexer and searcher is a subclass of Searcher that implements three methods:
- search_unp - takes a search query and returns an unpermissioned list of results
- index_item - takes the hash returned from item.to_indexable, and adds it to the index
- index_clear - clears the index to prepare for a rebuild
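The contract can be illustrated with a toy in-memory plug-in. This is a sketch under assumptions: only search_unp, index_item and index_clear come from this page; the base-class search method, the idea of filtering hits by the :space_id values the user may see, and MemorySearcher are hypothetical.

```ruby
require "set"

# Sketch of a base class; plug-ins override the three contract methods.
class Searcher
  def search_unp(text)
    raise NotImplementedError, "#{self.class} must implement search_unp"
  end

  def index_item(indexable)
    raise NotImplementedError, "#{self.class} must implement index_item"
  end

  def index_clear
    raise NotImplementedError, "#{self.class} must implement index_clear"
  end

  # Assumed permission step: keep only hits whose :space_id the user can see.
  def search(text, visible_space_ids)
    search_unp(text).select { |hit| visible_space_ids.include?(hit[:space_id]) }
  end
end

# Hypothetical in-memory plug-in: case-insensitive substring match.
class MemorySearcher < Searcher
  def initialize
    @index = {}
  end

  # Key on model + id so re-indexing an object replaces its old entry.
  def index_item(indexable)
    @index[[indexable[:model], indexable[:id]]] = indexable
  end

  def index_clear
    @index.clear
  end

  # Unpermissioned: returns every matching entry regardless of user.
  def search_unp(text)
    needle = text.downcase
    @index.values.select do |item|
      [item[:name], item[:content]].compact.any? { |s| s.downcase.include?(needle) }
    end
  end
end
```

Keeping the permission step in the base class means each plug-in only has to find text; it never needs to know about users.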
Configuration
Search is configured by setting the global variable $SEARCHCLASSNAME to the name of the searcher class to use:
$SEARCHCLASSNAME = 'MysqlSearcher'
The default searcher class, Searcher, is set in environment.rb ($SEARCHERCLASSNAME='Searcher'). It provides a simple database search on space names and maintains no index. It ignores all query parameters, but it is very convenient for development.
In production, we currently use MySQL full-text indexes, which are built and searched by the MysqlSearcher subclass.
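Turning the configured name into a searcher instance might look like the sketch below. The helper current_searcher and the stub classes are hypothetical; Object.const_get is standard Ruby (a Rails app could use String#constantize instead).

```ruby
# Stub classes standing in for the real ones.
class Searcher; end
class MysqlSearcher < Searcher; end

$SEARCHCLASSNAME = 'MysqlSearcher'

# Hypothetical helper: resolve the configured class name, falling back
# to the default Searcher, and refuse anything that is not a searcher.
def current_searcher
  klass = Object.const_get($SEARCHCLASSNAME || 'Searcher')
  unless klass.is_a?(Class) && klass <= Searcher
    raise ArgumentError, "#{klass} is not a Searcher subclass"
  end
  klass.new
end
```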
Indexing
To search text, you first have to index it. The indexing process is callable as Searcher.index_all. It finds all objects in the database that have changed since it last ran, extracts their text, and sends that text to the indexer, which adds it to the index.
To index changes continuously, call Searcher.index_continously and leave it running, for example:
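One incremental pass can be sketched as follows. The assumptions are ours: each record responds to updated_at and to_indexable, the time of the previous pass is held in memory, and IncrementalIndexer is a hypothetical name.

```ruby
# Hypothetical incremental indexer wrapping any plug-in searcher.
class IncrementalIndexer
  attr_reader :last_run

  def initialize(searcher)
    @searcher = searcher
    @last_run = nil
  end

  # records: every searchable object (spaces, flows, wiki pages, ...).
  # rebuild: clear the index and re-index everything.
  # now defaults to the time the pass starts, so anything changed while
  # the pass runs is picked up again by the next pass.
  def index_all(records, rebuild = false, now = Time.now)
    if rebuild
      @searcher.index_clear
      @last_run = nil
    end
    changed = records.select { |r| @last_run.nil? || r.updated_at > @last_run }
    changed.each { |r| @searcher.index_item(r.to_indexable) }
    @last_run = now
    changed.size
  end
end
```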
runner Searcher.index_continously
To rebuild the index from scratch, pass true as the rebuild parameter to index_all or index_continously:
runner Searcher.index_all(true)
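A continuous indexer is essentially a loop around one indexing pass. In this sketch the name run_indexer_loop, the polling interval, and the max_passes guard (there only so the loop can be tested) are hypothetical. It also assumes the rebuild flag should apply only to the first pass, since clearing the index on every pass would defeat incremental indexing.

```ruby
# Hypothetical loop: yields the rebuild flag to a block that performs
# one index_all pass, then sleeps. Runs forever when max_passes is nil.
def run_indexer_loop(rebuild = false, interval: 60, max_passes: nil)
  passes = 0
  loop do
    yield(rebuild && passes.zero?)   # rebuild only on the first pass
    passes += 1
    break if max_passes && passes >= max_passes
    sleep(interval)
  end
  passes
end
```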
Making an object searchable
A searchable object must implement the following methods so that the indexer can index it and the searcher can display it:
- to_indexable - returns a hash that contains all of the information needed to index and retrieve the object:
- :id => object ID
- :model => model class name (for example, "Space" or "Flow"). Together with :id, this is enough to retrieve the object
- :space_id => related space_id, used for rapid permissioning. For model Space, id = space_id
- :name => the name of the object. This is put into the index with a high relevance score
- :content => additional text to be indexed
- [optional :more => more fields to be indexed, for directories (not implemented)]
- get_permission - used to filter the result list by the searching user's permissions
- name - the name displayed in a result list
- link_to_args - allows the search interface to construct a link to a found object with "link_to object.name, object.link_to_args"
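A plain-Ruby class satisfying this contract could look like the sketch below. The WikiPage attributes, the get_permission(user) signature, the permission rule, and the controller/action names in link_to_args are all assumptions for illustration.

```ruby
# Hypothetical searchable object (not an ActiveRecord model here).
class WikiPage
  attr_reader :id, :space_id, :name, :body

  def initialize(id, space_id, name, body)
    @id, @space_id, @name, @body = id, space_id, name, body
  end

  def to_indexable
    {
      :id       => id,
      :model    => self.class.name,   # "WikiPage"; with :id, enough to reload it
      :space_id => space_id,          # used for rapid permissioning
      :name     => name,              # indexed with high relevance
      :content  => body               # additional indexed text
    }
  end

  # Assumed rule: a user may see the page if they can see its space.
  def get_permission(user)
    user.space_ids.include?(space_id)
  end

  # Arguments for link_to; the route names are made up.
  def link_to_args
    { :controller => "wiki", :action => "show", :space_id => space_id, :id => id }
  end
end
```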
MySQL Search and Indexing
Our production system uses MySQL full-text indexing for search. Indexing and search are implemented in the MysqlSearcher model object.
MySQL can build a full-text index over one or more columns of a table and then run text searches against that index. You create the index with the FULLTEXT keyword and query it with MATCH(...) AGAINST(...). See the MySQL documentation: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
When indexing with MysqlSearcher, we put all of the data we want to search into the searchables table. This table combines data from spaces, flows, and wiki pages; its "model" field tells the system the type of each object. The name and content fields are indexed and searchable.
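A natural-language full-text query can be built as SQL plus bind values, as in this sketch. The helper name fulltext_query, the score alias and the default limit are hypothetical. In natural-language mode the MATCH column list has to correspond to the columns of an existing FULLTEXT index. The search text is bound, not interpolated, so it cannot inject SQL.

```ruby
# Hypothetical helper: returns [sql, binds] for a relevance-ranked search.
# table and columns are trusted identifiers; text comes from the user.
def fulltext_query(table, columns, text, limit = 50)
  cols = columns.join(", ")
  sql = "SELECT *, MATCH(#{cols}) AGAINST (?) AS score " \
        "FROM #{table} WHERE MATCH(#{cols}) AGAINST (?) " \
        "ORDER BY score DESC LIMIT #{Integer(limit)}"
  [sql, [text, text]]
end
```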
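A possible shape for the searchables table, and for the row that index_item might write, is sketched below. Only the model, name and content columns come from this page; item_id, space_id, the unique key and the upsert are assumptions. MyISAM is used because MySQL 5.0 supports FULLTEXT indexes only on MyISAM tables.

```ruby
# Hypothetical DDL for the combined searchables table.
SEARCHABLES_DDL = <<~SQL
  CREATE TABLE searchables (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    model    VARCHAR(64)  NOT NULL,
    item_id  INT          NOT NULL,
    space_id INT,
    name     VARCHAR(255) NOT NULL,
    content  TEXT,
    UNIQUE KEY (model, item_id),
    FULLTEXT KEY (name, content)
  ) ENGINE=MyISAM
SQL

# Hypothetical upsert: the unique (model, item_id) key makes re-indexing
# an object overwrite its previous row instead of duplicating it.
def searchable_upsert(indexable)
  sql = "INSERT INTO searchables (model, item_id, space_id, name, content) " \
        "VALUES (?, ?, ?, ?, ?) " \
        "ON DUPLICATE KEY UPDATE space_id = VALUES(space_id), " \
        "name = VALUES(name), content = VALUES(content)"
  binds = indexable.values_at(:model, :id, :space_id, :name, :content)
  [sql, binds]
end
```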
To Do
We want to index and search users and user skills. Users are not related to a space, so we will need to extend the to_indexable model and the searchables table, or we will need to make a new data structure for users with a fulltext index and search object.
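The first option (extending to_indexable) might look like this sketch. Everything in it is hypothetical, in particular the policy that an entry with :space_id => nil is visible to every user.

```ruby
# Hypothetical user record exposing the searchable contract.
class UserProfile
  attr_reader :id, :login, :skills

  def initialize(id, login, skills)
    @id, @login, @skills = id, login, skills
  end

  def to_indexable
    {
      :id       => id,
      :model    => "User",
      :space_id => nil,               # users are not related to a space
      :name     => login,
      :content  => skills.join(" ")   # skills become searchable text
    }
  end
end

# Permission check that tolerates space-less entries under the assumed policy.
def visible?(hit, visible_space_ids)
  hit[:space_id].nil? || visible_space_ids.include?(hit[:space_id])
end
```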