solr

Solr is a standalone web service application that can be deployed in any servlet container, such as Tomcat or Jetty.  It uses the popular Lucene Java library to provide enterprise-level search over data from databases, filesystems, web services etc.  Because Solr runs as a web service, it is in effect a cross-platform search engine: the results can be accessed from PHP, Java, RoR or .NET by invoking its web service.  The only requirement on your server is that it allows you to run a Java application, or to deploy Solr as a webapp in an existing servlet container such as Tomcat or Jetty.

This article shows how to get Solr to index a simple MySQL table and provide search results for it.  The search results are returned as XML, so your web application can read the results, parse them and display them in any form desired.

For this article, we will use the standalone Solr nightly build.  We use the nightly build because one of the features we need, deltaImportQuery, is not available in the current stable 1.3.0 release.  deltaImportQuery lets you do delta (incremental) indexing of data from your DB.  You can download a nightly build from http://people.apache.org/builds/lucene/solr/nightly/

You also need to download the latest MySQL JDBC driver from http://dev.mysql.com/downloads/connector/j/3.1.html

Install Solr

1. Unzip or untar the downloaded solr nightly build package. Assume the unzipped directory is solr.

2. cd into the solr/example directory, which contains a standalone Solr server that runs on Jetty.

3. Start the server with "java -jar start.jar"

4. Test the server by accessing http://<servername_or_ip>:8983/solr/admin/ .  If you get an admin page with a search box, then Solr is running and ready.
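The check in step 4 can also be scripted. The following is a minimal Python sketch, not part of Solr itself: the helper names solr_admin_url and solr_is_up are made up for this illustration, and the default port 8983 is the one used by the example server above.

```python
from urllib.error import URLError
from urllib.request import urlopen


def solr_admin_url(host, port=8983):
    """Build the admin URL from step 4 (8983 is the example server's port)."""
    return f"http://{host}:{port}/solr/admin/"


def solr_is_up(host, port=8983, timeout=5):
    """Return True if the admin page answers with HTTP 200, False otherwise."""
    try:
        with urlopen(solr_admin_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False
```

Calling solr_is_up("localhost") on the machine running the example server should return True once Jetty has finished starting.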

Configure MySQL Database

1. Copy the downloaded mysql jdbc driver file into solr/lib directory.

2. Create a new XML file called data-config.xml (the file name must match the config entry of the request handler in step 3) and change the obvious values to suit your DB.  In this example, I am indexing a Joomla DB table called jos_content.

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
    <dataSource type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/database"
                  user="user"
                  password="password"/>

    <document name="doc">
        <entity name="jos_content"
                  query="select * from jos_content WHERE state=1"
                  deltaImportQuery="SELECT * FROM `jos_content` WHERE id='${dataimporter.delta.id}'"
                  deltaQuery="SELECT id FROM `jos_content` WHERE modified > '${dataimporter.last_index_time}'">

            <field column="id" name="id" />
            <field column="title" name="title" />
            <field column="introtext" name="introtext" />
            <field column="fulltext" name="fulltext" />
        </entity>
    </document>
</dataConfig>

3. Edit the file solrconfig.xml, which is located in the solr/example/solr/conf directory. Add the following requestHandler entry if it does not already exist.

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">/solr/data-config.xml</str>
    </lst>
  </requestHandler>

4. Now configure Solr's schema by editing schema.xml in the solr/example/solr/conf directory. Add or edit the following fields as required; each field name must match the name attribute of a field in the data config from step 2.

 <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="text" indexed="true" stored="true" required="true"/>
    <field name="introtext" type="text" indexed="true" stored="true" required="true"/>
    <field name="fulltext" type="text" indexed="true" stored="true" required="true"/>
    <dynamicField name="*" type="ignored" />
 </fields>

 <uniqueKey>id</uniqueKey>
 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>fulltext</defaultSearchField>

5. Stop and restart the Solr instance. Check the output for JDBC errors, which can occur if the JDBC driver is not properly installed.
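If the restart reports configuration problems, it can help to confirm that the data config from step 2 is well-formed XML and maps the columns you expect. Below is a minimal Python sketch using only the standard library. The function name summarize_config and the trimmed SAMPLE_CONFIG are illustrative assumptions, and the check covers XML syntax only, not whether Solr accepts the configuration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the data config from step 2, used here only as sample input.
SAMPLE_CONFIG = """
<dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/database" user="user" password="password"/>
    <document name="doc">
        <entity name="jos_content" query="select * from jos_content WHERE state=1">
            <field column="id" name="id" />
            <field column="title" name="title" />
        </entity>
    </document>
</dataConfig>
"""


def summarize_config(xml_text):
    """Map each entity name to its {column: field name} pairs.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(xml_text.strip())
    summary = {}
    for entity in root.iter("entity"):
        summary[entity.get("name")] = {
            field.get("column"): field.get("name")
            for field in entity.findall("field")
        }
    return summary
```

To check a file on disk instead of a string, ET.parse(path).getroot() can take the place of ET.fromstring.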

Performing full or delta indexing

If everything works correctly, you can get Solr to fully index the configured table by opening the following URL in your browser:  http://<your_solr_server>:8983/solr/dataimport?command=full-import

You can check the status of the command by accessing http://<your_solr_server>:8983/solr/dataimport
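The import and status URLs above can also be driven from a script. The sketch below builds the URLs and reads the import state out of a status response. The helper names are hypothetical, and SAMPLE_STATUS is a hand-written illustration that assumes the status response carries a top-level str element named "status" (for example idle or busy); compare it with what your Solr build actually returns.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def dataimport_url(server, command=None, port=8983):
    """Build the dataimport URL; without a command it is the status URL."""
    base = f"http://{server}:{port}/solr/dataimport"
    return f"{base}?{urlencode({'command': command})}" if command else base


def import_status(status_xml):
    """Return the text of the top-level <str name="status"> element, or None."""
    root = ET.fromstring(status_xml)
    node = root.find("./str[@name='status']")
    return node.text if node is not None else None


# Illustrative response only (assumed shape, not captured from a real server).
SAMPLE_STATUS = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <str name="status">idle</str>
</response>"""
```

A script could fetch dataimport_url(server) with urllib.request.urlopen and pass the response body to import_status to wait until the import is no longer busy.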

Once the import has finished, you can search the indexed data from http://<your_solr_server>:8983/solr/admin/ and the results will be returned in XML format.

To do an incremental (delta) indexing of the data changed since the last full or delta import, issue the command http://<your_solr_server>:8983/solr/dataimport?command=delta-import
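To make the two delta queries from the data config concrete, here is a small simulation of the same two-step pattern. It is only a sketch: it uses an in-memory SQLite table instead of MySQL, the rows and timestamps are invented, and the function names are hypothetical. The first query plays the role of deltaQuery (find the ids changed since the last import), the second the role of deltaImportQuery (re-read each changed row by id).

```python
import sqlite3


def demo_connection():
    """Create an in-memory stand-in for jos_content with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jos_content (id INTEGER PRIMARY KEY, title TEXT, modified TEXT)"
    )
    conn.executemany(
        "INSERT INTO jos_content VALUES (?, ?, ?)",
        [
            (1, "Old article", "2009-01-01 10:00:00"),
            (2, "Edited article", "2009-03-15 09:30:00"),
            (3, "New article", "2009-03-16 18:45:00"),
        ],
    )
    return conn


def delta_import(conn, last_index_time):
    """Mimic a delta import: find changed ids, then fetch each changed row."""
    # Step 1 (role of deltaQuery): ids of rows modified since the last import.
    changed_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM jos_content WHERE modified > ? ORDER BY id",
            (last_index_time,),
        )
    ]
    # Step 2 (role of deltaImportQuery): re-read every changed row by its id.
    rows = []
    for row_id in changed_ids:
        rows.extend(
            conn.execute(
                "SELECT id, title FROM jos_content WHERE id = ?", (row_id,)
            ).fetchall()
        )
    return rows
```

With a last import time of "2009-02-01 00:00:00", only the rows with ids 2 and 3 are picked up, which is why a delta import is much cheaper than a full import on a large table.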

You can now access these XML results from your web application.  There are client APIs available for RoR, PHP, Java etc.
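As a rough illustration of consuming the XML without a client library, the Python sketch below builds a query URL for Solr's /select handler and turns a response into a list of dictionaries. The helper names are hypothetical, and SAMPLE_RESPONSE is a hand-written example of the standard XML response layout, not output captured from the index built above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def select_url(server, query, port=8983):
    """Build a search URL for the /select handler with a URL-encoded query."""
    return f"http://{server}:{port}/solr/select?{urlencode({'q': query})}"


def parse_results(response_xml):
    """Return (numFound, docs), each doc being a {field name: text} dict."""
    root = ET.fromstring(response_xml)
    result = root.find("./result[@name='response']")
    if result is None:
        return 0, []
    docs = [
        {child.get("name"): child.text for child in doc}
        for doc in result.findall("doc")
    ]
    return int(result.get("numFound", 0)), docs


# Hand-written sample response for illustration only.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">42</str>
      <str name="title">Hello Solr</str>
    </doc>
  </result>
</response>"""
```

Multi-valued fields arrive as nested arr elements, which this sketch does not unpack; a real client library handles those cases for you.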

References

http://www.ipros.nl/2008/12/15/using-solr-with-wordpress/

http://wiki.apache.org/solr/DataImportHandler#head-df246a3aed0bb38297f3449bc35a0bdf38a272b5

http://lucene.apache.org/solr/tutorial.html
