Using Solr / Lucene for full text search with MySQL DB

Posted on 14/05/09 by webadmin 21 Comments

Solr is a standalone webservice application that can be installed on any servlet container like tomcat, jetty etc.  It uses the popular Lucene java library to provide enterprise level search results from databases, filesystem, web services etc.    Solr runs as a web service, so in effect it provides a cross platform search engine.  The results can be accessed from php, java, RoR or .NET by invoking its web service.  The only requirement on your server is that it should allow you to run java application or have deploy solr as a webapp in an existing servlet container such as tomcat or jetty.

This article shows how to get Solr index and provide search results for a simple MySQL table.  The search results will be provided in XML, so you can get your web application to read the results, parse it and display it any form desired.

For this article, we will use the standalone solr nightly build.  We use the nightly build as one of the features called deltaImportQuery is not available in the current stable 1.3.0 release.  The deltaImportQuery allows you to make delta indexing of data from your DB.  You can download a nightly build from http://people.apache.org/builds/lucene/solr/nightly/

You also need to download the latest MySQL JDBC driver from http://dev.mysql.com/downloads/connector/j/3.1.html

Install Solr

1. Unzip or untar the downloaded solr nightly build package. Assume the unzipped directory is solr.

2. cd into solr/example directory which has a standalone solr server running on jetty.

3. Execute the server by “java -jar startup.jar”

4. Test the server by accessing http://<servername_or_ip>:8983/solr/admin/ .  If you get an admin page with a search box, then your solr is running well, and ready.

Configure MySQL Database

1. Copy the downloaded mysql jdbc driver file into solr/lib directory.

2. Create a new xml file called data-import.xml , change the obvious variables to suit your DB.  In this example, I am indexing a Joomla DB table called jos_content.

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
    <dataSource type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/database"
                  user="user"
                  password="password"/>

    <document name="doc">
        <entity name="jos_content"
                  query="select * from jos_content WHERE state=1"
                  deltaImportQuery="SELECT * FROM `jos_content` WHERE id='${dataimporter.delta.job_jobs_id}'"
		  deltaQuery="SELECT id FROM `jos_content` WHERE modified > '${dataimporter.last_index_time}'">

            <field column="id" name="id" />
            <field column="title" name="title" />
            <field column="introtext" name="introtext" />
            <field column="fulltext" name="fulltext" />
        </entity>
    </document>
</dataConfig>

3. Edit file solrconfig.xml which is located in solr/example/solr/conf directory. Add the following requestHandler entry if not already existing.

  <requestHandler name="/dataimport">
    <lst name="defaults">
      <str name="config">/solr/data-config.xml</str>
    </lst>
  </requestHandler>

4. Now we will configure solr’s schema by editing schema.xml in solr/example/solr/conf directory. Add or edit the following fields as required. The xml format is self explanatory.

 <fields>

	<field name="id" type="string" indexed="true" stored="true" required="true"/>
	<field name="title" type="text" indexed="true" stored="true" required="true"/>
	<field name="introtext" type="text" indexed="true" stored="true" required="true"/>

	<field name="fulltext" type="text" indexed="true" stored="true" required="true"/>
	<dynamicField name="*" type="ignored" />
 </fields>

 <uniqueKey>id</uniqueKey>
 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>fulltext</defaultSearchField>

5. Stop and restart solr instance. Check if there are any jdbc errors, this could happen if the jdbc drivers are not properly installed.

Performing full or delta indexing

If everything works correctly, you can get solr to fully index the configured tables by accessing the following command via your browser.  http://<your_solr_server>:8983/solr/dataimport?command=full-import

You can check the status of the command by accessing http://<your_solr_server>:8983/solr/dataimport

If everything works correctly, you can now search for data from http://<your_solr_server>:8983/solr/admin/ and you should now have data results in XML format.

To do an incremental or delta indexing of data since the last full or delta, increment, issue the command http://<your_solr_server>:8983/solr/dataimport?command=delta-import

You can now access these xml results from your web application.  There are client api’s available for RoR, php, java etc.

References

http://www.ipros.nl/2008/12/15/using-solr-with-wordpress/

http://wiki.apache.org/solr/DataImportHandler#head-df246a3aed0bb38297f3449bc35a0bdf38a272b5

http://lucene.apache.org/solr/tutorial.html

Cabot Technology Solutions is an Enterprise Mobile apps development company.

21 Comments

  1. Sai says:
    Tuesday, June 16, 2009 at 10:11am

    Shibu,

    I saw your article and would appreciate if you could share any code that formats the search results. I am looking to implement solr for filesystem.

    Cheers,
    Sai.

    Reply

  2. Mahesh Murali says:
    Monday, June 22, 2009 at 6:30pm

    There is an open source code available for php called solrPhpClient to use solr search with php.
    First thing we do is to create an object:
    $solr = new Apache_Solr_Service( ‘localhost’, ’8983′, ‘/solr’ );
    For searching we call the function:
    $response = $solr->search( $searchkey, $offset, $limit , $params);

    The result of this function is basically an xml doc, converted into an object array. Inorder to read each value from the array, we can use a for each array.
    function test(){
    if ( $response->getHttpStatus() == 200 ) {
    if ( $response->response->numFound > 0 ) {

    foreach ( $response->response->docs as $doc ) {

    $result[] = $doc;

    }

    }
    }
    else {
    echo $response->getHttpStatusMessage();
    }
    return $result;

    }

    The variable $result; is an array which contains the entire results.

    Reply

  3. Mike says:
    Monday, June 29, 2009 at 6:52am

    hey Mahesh Murali, I downloaded solrPHPClient and I’m trying to use the code you posted, but I keep getting the same error

    Fatal error: Call to a member function getHttpStatus() on a non-object
    on the line /* if ( $response->getHttpStatus() == 200 ) { */

    here are the parameters I’m putting in the search function: $solr = solr->search(‘title:digg’, $offset, $limit,$params );

    can you help me please???

    thanks Shibu Basheer for the great tutorial

    Reply

  4. Mahesh Murali says:
    Tuesday, June 30, 2009 at 9:46am

    Hi Mike,
    You used $solr = solr->search(’title:digg’, $offset, $limit,$params ); ?
    Please use $response = $solr->search(’title:digg’, $offset, $limit,$params ); and then if ( $response->getHttpStatus() == 200 ) will not throw any exception.

    Would love to hear how you get on with soir/lucene…
    Good Luck.

    Reply

  5. Mike says:
    Wednesday, July 8, 2009 at 12:01am

    hey Mahesh Murali thanks a lot,

    I have one problem, I hope you can help me…

    I need to add more tables and do Joins, but I don’t know how to configure the data-config.xml

    Reply

  6. Mahesh Murali says:
    Wednesday, July 8, 2009 at 9:58am

    Hi Mike,
    I think you have to use multiple to achieve table joins.

    <field column= ….
    ….

    <field column= ….

    Just try it.

    Reply

  7. Mahesh Murali says:
    Wednesday, July 8, 2009 at 10:03am

    Hi Mike,

    I think you have to use multiple to achieve table joins.

    <field column= ….
    ….

    <field column= ….

    Just try it.

    Reply

  8. shaili says:
    Monday, September 7, 2009 at 5:05pm

    Hi there

    i am new to this but i found ur article of great help.Although i m able to connect to SOLR and it is working successfully but on querying , SOLR doesn’t come up with any related records.On debugging i found its not able to fetch any records.

    If u could tell what could be the issue , that would be of great help

    thanks in advance

    Reply

  9. Joel says:
    Friday, October 23, 2009 at 3:28pm

    This is good, seems like there is an error you say to save the file as data-import.xml, but then refer to it as : data-config.xml.

    Also you may want to give your vm some more memory when indexing, mine barfed with out of memory error, I used -mx512m for mine and it ran after that.

    Reply

  10. Alex Dunae says:
    Monday, November 23, 2009 at 12:31am

    Thanks for this write up, Shibu. It’s one of the clearest walkthroughs I’ve come across.

    Reply

  11. Fred says:
    Friday, December 18, 2009 at 1:16pm

    Thanks for your tutorial.

    I just got a problem when trying to import, on Windows : the MySQL driver was not found, even when i put it in /lib .
    In fact, it seems that the driver needs to be copied in solr/example/lib

    Reply

  12. Swapnil Agarwal says:
    Saturday, January 23, 2010 at 11:23pm

    This is of great help!
    But, unfortunately I keep getting this error.

    “HTTP ERROR: 404
    missing core name in path
    RequestURI=/solr/admin/index.jsp
    Powered by Jetty://”

    When I run the solr first time, it runs fine, but as soon as I make the changes proposed by you, I get this error.
    Please help me out. I am stuck.

    Reply

  13. Swapnil Agarwal says:
    Monday, February 8, 2010 at 11:44am

    I am creating .xml files which are then posted to solr for creating documents.
    The xml files have tags in which there are several tags.
    Now consider an example where we data for a car. We would like to add one doc related to some features of the car like mileage, horsepower e.t.c. We will add another doc containing information about the price, service about the same car.
    Now both the above docs will contain an ID representing the same car.
    Now when search is performed, how do I get the result combined from both the above docs.
    For instance, if I search for feature:13KM/L price:60K, I want the search results having combined score from the docs containing the same car ID.

    Merging these docs is not favourable as I want to add,delete and upadate the features for a particular car.

    Reply

  14. abhax says:
    Friday, March 19, 2010 at 4:10pm

    Hio,

    hey i did everything and even the table in the data base was processed fully with no error or nothing…
    im using the solrphpclient…

    What im unable to do is that if i fire a query : NO RESULT IS RETURNED!
    the index is done but dont know why this is happening…

    the coding is same as the example code for solrphpclient

    search($query, 0, $limit);
    }
    catch (Exception $e)
    {
    // in production you’d probably log or email this error to an admin
    // and then show a special message to the user but for this example
    // we’re going to show the full exception
    die(“SEARCH EXCEPTION{$e->__toString()}”);
    }
    }

    ?>

    PHP Solr Client Example

    Search:
    <input id="q" name="q" type="text" value="”/>

    response->numFound;
    $start = min(1, $total);
    $end = min($limit, $total);
    ?>
    Results – of :

    response->docs as $doc)
    {
    ?>

    $value)
    {
    ?>

    Reply

  15. abhax says:
    Monday, March 22, 2010 at 5:34pm

    works perfectly

    but with a few minor changes on the localhost…

    schema.xml

    introtext

    Reply

  16. azrain says:
    Wednesday, March 24, 2010 at 11:26am

    hi,

    great article here…i have problem indexing with a million rows of mysql data…somehow i cannot get it indexed with large data…it keeps showing “Indexing failed. Rolled back all changes”…however, when i put let’s say LIMIT 10000, it indexed perfectly…anyone has solution to index large mysql data? thanks

    Reply

  17. dave says:
    Tuesday, April 13, 2010 at 11:17am

    Hi Shibu

    My installed test fine after I start start.jar file. Afterward I have problem.

    I’m confused. Can you help out. From you notes, is the file data-import.xml and data-config.xml the same. I followed the exact steps and couldn’t get to work. I do have the table jos_content on my mysql database.

    I tried to put the data-import.xml and solrconfig.xml in the directory /apache-solr-1.4.0/example/solr/conf. For the schema.xml I added the lines to the original schema.xml file

    fulltext

    The part that I confused is the solconfig.xml. The line
    /solr/data-config.xml

    Can you give example of the data-config.xml and the full path I should put the file.

    Thanks

    Reply

  18. prabu says:
    Thursday, December 16, 2010 at 7:59pm

    Hi, its working well on windows xp using jetty server.
    Question is,
    I do not know where to place mysql connector in tomcat5 using apace solr on centos5.
    My solr home directory look like.
    /var/solr/
    bin
    data
    conf
    solr.war
    i cannot find lib directory. any suggestion?

    Reply

  19. Kristofer Pettersson says:
    Wednesday, July 27, 2011 at 2:24pm

    The data import handler seems like a very good piece of software. However, doing Change-Data-Capture through the client interface isn’t optimal for MySQL. A better scale-out pattern would be to hook on to the binary log. I’m trying to realize this through the binlog API in the MySQL Labs release. Is this something you have considered?

    Reply

  20. Chary says:
    Tuesday, January 31, 2012 at 5:21pm

    Hi,
    Its a great tutorial. It works and of great use. I would like to know what changes are to be made to index the data from multiple tables.

    Reply

  21. Arpitha says:
    Friday, November 16, 2012 at 11:58am

    Hi,

    Its very nice tutorial to understand how to use the SOLR.

    I have some doubts could you please help me.

    1. How we can utilize solr for search suggestion
    2. How we can use SOLR db for our search application(for suggestion)

    Please help,

    Thanks in advance.

    Arpitha

    Reply

Post a Comment

Your email is never published or shared. Required fields are marked *