Using Solr / Lucene for full text search with MySQL DB

Posted on May 14th, 2009 by Shibu Basheer

Solr is a standalone web service application that can be installed in any servlet container, such as Tomcat or Jetty.  It uses the popular Lucene Java library to provide enterprise-level search over data from databases, filesystems, web services etc.  Because Solr runs as a web service, it is in effect a cross-platform search engine: the results can be accessed from PHP, Java, RoR or .NET by invoking its web service.  The only requirement on your server is that it allows you to run a Java application, or to deploy Solr as a webapp in an existing servlet container such as Tomcat or Jetty.

This article shows how to get Solr to index a simple MySQL table and provide search results for it.  The search results are returned as XML, so your web application can read the results, parse them and display them in any form desired.
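
As a rough illustration of the parsing step, the sketch below reads a Solr-style XML response with Python's standard library.  The sample response is hand-written for this example (it follows the result/doc layout of Solr's XML output, with field names borrowed from the table indexed later in this article), and parse_solr_response is a hypothetical helper name, not part of Solr.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of a Solr XML response (illustrative data only).
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader">
    <int name="status">0</int>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="id">1</str>
      <str name="title">First article</str>
    </doc>
    <doc>
      <str name="id">2</str>
      <str name="title">Second article</str>
    </doc>
  </result>
</response>"""


def parse_solr_response(xml_text):
    """Return (numFound, list of {field name: text}) from a Solr XML response."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    if result is None:
        return 0, []
    docs = []
    for doc in result.findall("doc"):
        # Each child element (<str>, <int>, ...) carries the field name
        # in its "name" attribute and the value as its text.
        docs.append({child.get("name"): child.text for child in doc})
    return int(result.get("numFound", "0")), docs


if __name__ == "__main__":
    total, docs = parse_solr_response(SAMPLE_RESPONSE)
    print(total, [d["title"] for d in docs])
```

The helper only handles single-valued fields; multi-valued fields would need extra handling.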

For this article, we will use the standalone Solr nightly build.  We use the nightly build because a feature called deltaImportQuery is not available in the current stable 1.3.0 release.  deltaImportQuery lets you index only the rows that have changed in your DB (delta indexing).  You can download a nightly build from http://people.apache.org/builds/lucene/solr/nightly/

You also need to download the latest MySQL JDBC driver from http://dev.mysql.com/downloads/connector/j/3.1.html

Install Solr

1. Unzip or untar the downloaded solr nightly build package. Assume the unzipped directory is solr.

2. cd into solr/example directory which has a standalone solr server running on jetty.

3. Start the server with "java -jar start.jar"

4. Test the server by accessing http://<servername_or_ip>:8983/solr/admin/ .  If you get an admin page with a search box, then your solr is running well, and ready.
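
The check in step 4 can also be scripted.  The following is a minimal sketch, assuming the default port 8983 used above; admin_url and solr_is_up are hypothetical helper names introduced here, not part of Solr.

```python
import urllib.error
import urllib.request


def admin_url(host, port=8983):
    """Build the admin page URL used in step 4."""
    return "http://%s:%d/solr/admin/" % (host, port)


def solr_is_up(host, port=8983, timeout=5):
    """Return True if the admin page answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(admin_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False


if __name__ == "__main__":
    print(admin_url("localhost"), solr_is_up("localhost"))
```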

Configure MySQL Database

1. Copy the downloaded mysql jdbc driver file into solr/lib directory.

2. Create a new xml file called data-import.xml , change the obvious variables to suit your DB.  In this example, I am indexing a Joomla DB table called jos_content.

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
    <dataSource type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/database"
                  user="user"
                  password="password"/>

    <document name="doc">
        <entity name="jos_content"
                  query="select * from jos_content WHERE state=1"
                  deltaImportQuery="SELECT * FROM `jos_content` WHERE id='${dataimporter.delta.id}'"
                  deltaQuery="SELECT id FROM `jos_content` WHERE modified > '${dataimporter.last_index_time}'">

            <field column="id" name="id" />
            <field column="title" name="title" />
            <field column="introtext" name="introtext" />
            <field column="fulltext" name="fulltext" />
        </entity>
    </document>
</dataConfig>

3. Edit the file solrconfig.xml, which is located in the solr/example/solr/conf directory. Add the following requestHandler entry if it does not already exist.

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">/solr/data-config.xml</str>
    </lst>
  </requestHandler>

4. Now we will configure solr’s schema by editing schema.xml in solr/example/solr/conf directory. Add or edit the following fields as required. The xml format is self explanatory.

 <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="text" indexed="true" stored="true" required="true"/>
    <field name="introtext" type="text" indexed="true" stored="true" required="true"/>
    <field name="fulltext" type="text" indexed="true" stored="true" required="true"/>
    <dynamicField name="*" type="ignored" />
 </fields>

 <uniqueKey>id</uniqueKey>
 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>fulltext</defaultSearchField>

5. Stop and restart the Solr instance. Check the output for JDBC errors; these can occur if the JDBC driver is not properly installed.
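
Before restarting, it may help to confirm that every Solr field named in the data import file is also declared in schema.xml.  The sketch below is one way to do that with Python's standard library; the two XML strings are trimmed-down copies of the snippets above (the schema deliberately omits fulltext to show a mismatch), and unmapped_fields is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the data import configuration from step 2.
DATA_CONFIG = """<dataConfig>
  <document name="doc">
    <entity name="jos_content" query="select * from jos_content WHERE state=1">
      <field column="id" name="id" />
      <field column="title" name="title" />
      <field column="introtext" name="introtext" />
      <field column="fulltext" name="fulltext" />
    </entity>
  </document>
</dataConfig>"""

# Trimmed schema that is missing the "fulltext" field on purpose.
SCHEMA = """<schema name="example" version="1.1">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="text" indexed="true" stored="true" required="true"/>
    <field name="introtext" type="text" indexed="true" stored="true" required="true"/>
  </fields>
</schema>"""


def unmapped_fields(data_config_xml, schema_xml):
    """Return Solr field names used in the data config but not declared in the schema."""
    config_names = [f.get("name") for f in ET.fromstring(data_config_xml).iter("field")]
    schema_names = {f.get("name") for f in ET.fromstring(schema_xml).iter("field")}
    return [name for name in config_names if name not in schema_names]


if __name__ == "__main__":
    print(unmapped_fields(DATA_CONFIG, SCHEMA))
```

Only explicit field declarations are compared here; dynamicField entries are not taken into account, so a catch-all such as the "ignored" dynamic field above could hide a misspelled field name.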

Performing full or delta indexing

If everything works correctly, you can get Solr to fully index the configured table by opening the following URL in your browser:  http://<your_solr_server>:8983/solr/dataimport?command=full-import

You can check the status of the command by accessing http://<your_solr_server>:8983/solr/dataimport
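
The import and status URLs can be built and read from a script as well.  This is a sketch under assumptions: dataimport_url and import_status are hypothetical helper names, and the sample status document is hand-written on the assumption that the handler reports its state in a top-level str element named "status" (for example idle or busy).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def dataimport_url(host, command=None, port=8983):
    """Build the DataImportHandler URL; with no command it is the status URL."""
    base = "http://%s:%d/solr/dataimport" % (host, port)
    if command is None:
        return base
    return base + "?" + urlencode({"command": command})


# Hand-written sample of a status reply (assumed layout, illustrative only).
SAMPLE_STATUS = """<response>
  <str name="command">status</str>
  <str name="status">busy</str>
</response>"""


def import_status(xml_text):
    """Return the text of the top-level <str name="status"> element, or None."""
    root = ET.fromstring(xml_text)
    for node in root.findall("str"):
        if node.get("name") == "status":
            return node.text
    return None


if __name__ == "__main__":
    print(dataimport_url("localhost", "full-import"))
    print(import_status(SAMPLE_STATUS))
```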

If the import succeeded, you can now search the data from http://<your_solr_server>:8983/solr/admin/ and should get results in XML format.

To index only the data changed since the last full or delta import (incremental or delta indexing), issue the command http://<your_solr_server>:8983/solr/dataimport?command=delta-import
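
To make the idea of a delta import concrete, the sketch below imitates the two delta queries from the data import file against an in-memory SQLite table.  SQLite is only a stand-in for MySQL here, the rows are made up, and the function names are hypothetical; Solr runs the real queries itself, so this only illustrates the logic.

```python
import sqlite3


def changed_ids(conn, last_index_time):
    """Counterpart of deltaQuery: ids of rows modified since the last import."""
    rows = conn.execute(
        "SELECT id FROM jos_content WHERE modified > ? ORDER BY id",
        (last_index_time,),
    )
    return [row[0] for row in rows]


def fetch_row(conn, row_id):
    """Counterpart of deltaImportQuery: reload one changed row by its id."""
    return conn.execute(
        "SELECT id, title FROM jos_content WHERE id = ?", (row_id,)
    ).fetchone()


def delta_import(conn, last_index_time):
    """Collect the rows that a delta import would re-index."""
    return [fetch_row(conn, row_id) for row_id in changed_ids(conn, last_index_time)]


def demo_connection():
    """Create a tiny stand-in table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jos_content (id INTEGER PRIMARY KEY, title TEXT, modified TEXT)"
    )
    conn.executemany(
        "INSERT INTO jos_content VALUES (?, ?, ?)",
        [
            (1, "Old article", "2009-05-01 10:00:00"),
            (2, "Edited article", "2009-05-14 09:30:00"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(delta_import(demo_connection(), "2009-05-10 00:00:00"))
```

Only the row modified after the given timestamp is picked up, which is why the id returned by the first query has to be the value that the second query filters on.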

You can now access these XML results from your web application.  Client APIs are available for RoR, PHP, Java etc.
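
Without a client library, a search is just an HTTP GET.  The sketch below builds such a URL, assuming Solr's standard /solr/select handler and its q, start, rows and fq parameters (none of which are spelled out in this article); select_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def select_url(host, query, start=0, rows=10, filter_query=None, port=8983):
    """Build a search URL; filter_query is passed as the optional fq parameter."""
    params = [("q", query), ("start", start), ("rows", rows)]
    if filter_query:
        params.append(("fq", filter_query))
    # urlencode escapes characters such as ":" so the query survives the URL.
    return "http://%s:%d/solr/select?%s" % (host, port, urlencode(params))


if __name__ == "__main__":
    print(select_url("localhost", "title:digg"))
    print(select_url("localhost", "solr", filter_query="id:91"))
```

The XML returned from such a URL can then be parsed by the web application in whatever language it is written in.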

References

http://www.ipros.nl/2008/12/15/using-solr-with-wordpress/

http://wiki.apache.org/solr/DataImportHandler#head-df246a3aed0bb38297f3449bc35a0bdf38a272b5

http://lucene.apache.org/solr/tutorial.html


22 Responses to “Using Solr / Lucene for full text search with MySQL DB”

  1. Sai Says:

    Shibu,

    I saw your article and would appreciate if you could share any code that formats the search results. I am looking to implement solr for filesystem.

    Cheers,
    Sai.

  2. Mahesh Murali Says:

    There is an open source code available for php called solrPhpClient to use solr search with php.
    First thing we do is to create an object:
    $solr = new Apache_Solr_Service( 'localhost', '8983', '/solr' );
    For searching we call the function:
    $response = $solr->search( $searchkey, $offset, $limit , $params);

    The result of this function is basically an XML doc, converted into an object array. In order to read each value from the array, we can use a foreach loop.
    function test( $response ) {
        // start with an empty list so the function always returns an array
        $result = array();
        if ( $response->getHttpStatus() == 200 ) {
            if ( $response->response->numFound > 0 ) {
                // collect every matching document
                foreach ( $response->response->docs as $doc ) {
                    $result[] = $doc;
                }
            }
        }
        else {
            echo $response->getHttpStatusMessage();
        }
        return $result;
    }

    The variable $result is an array which contains the entire results.

  3. Mike Says:

    hey Mahesh Murali, I downloaded solrPHPClient and I’m trying to use the code you posted, but I keep getting the same error

    Fatal error: Call to a member function getHttpStatus() on a non-object
    on the line /* if ( $response->getHttpStatus() == 200 ) { */

    here are the parameters I’m putting in the search function: $solr = solr->search('title:digg', $offset, $limit,$params );

    can you help me please???

    thanks Shibu Basheer for the great tutorial

  4. Mahesh Murali Says:

    Hi Mike,
    You used $solr = solr->search('title:digg', $offset, $limit,$params ); ?
    Please use $response = $solr->search('title:digg', $offset, $limit,$params ); and then if ( $response->getHttpStatus() == 200 ) will not throw any exception.

    Would love to hear how you get on with Solr/Lucene…
    Good Luck.

  5. Mike Says:

    hey Mahesh Murali thanks a lot,

    I have one problem, I hope you can help me…

    I need to add more tables and do Joins, but I don’t know how to configure the data-config.xml

  6. Mahesh Murali Says:

    Hi Mike,
    I think you have to use multiple entity elements to achieve table joins.

    <field column= ….
    ….

    <field column= ….

    Just try it.

  8. shaili Says:

    Hi there

    I am new to this but I found your article of great help. Although I am able to connect to Solr and it is working successfully, on querying, Solr doesn’t come up with any related records. On debugging I found it’s not able to fetch any records.

    If you could tell what could be the issue, that would be of great help

    thanks in advance

  9. Joel Says:

    This is good. It seems there is an error: you say to save the file as data-import.xml, but then refer to it as data-config.xml.

    Also you may want to give your VM some more memory when indexing; mine barfed with an out of memory error. I used -mx512m for mine and it ran after that.

  10. Alex Dunae Says:

    Thanks for this write up, Shibu. It’s one of the clearest walkthroughs I’ve come across.

  11. Fred Says:

    Thanks for your tutorial.

    I just got a problem when trying to import, on Windows : the MySQL driver was not found, even when i put it in /lib .
    In fact, it seems that the driver needs to be copied in solr/example/lib

  12. Swapnil Agarwal Says:

    This is of great help!
    But, unfortunately I keep getting this error.

    “HTTP ERROR: 404
    missing core name in path
    RequestURI=/solr/admin/index.jsp
    Powered by Jetty://”

    When I run the solr first time, it runs fine, but as soon as I make the changes proposed by you, I get this error.
    Please help me out. I am stuck.

  13. Swapnil Agarwal Says:

    I am creating .xml files which are then posted to solr for creating documents.
    The xml files have tags in which there are several tags.
    Now consider an example where we have data for a car. We would like to add one doc related to some features of the car like mileage, horsepower etc. We will add another doc containing information about the price and service of the same car.
    Now both the above docs will contain an ID representing the same car.
    Now when search is performed, how do I get the result combined from both the above docs.
    For instance, if I search for feature:13KM/L price:60K, I want the search results having combined score from the docs containing the same car ID.

    Merging these docs is not favourable as I want to add, delete and update the features for a particular car.

  14. abhax Says:

    Hio,

    hey i did everything and even the table in the data base was processed fully with no error or nothing…
    im using the solrphpclient…

    What im unable to do is that if i fire a query : NO RESULT IS RETURNED!
    the index is done but dont know why this is happening…

    the coding is same as the example code for solrphpclient

    search($query, 0, $limit);
    }
    catch (Exception $e)
    {
    // in production you’d probably log or email this error to an admin
    // and then show a special message to the user but for this example
    // we’re going to show the full exception
    die("SEARCH EXCEPTION{$e->__toString()}");
    }
    }

    ?>

    PHP Solr Client Example

    Search:
    <input id="q" name="q" type="text" value="”/>

    response->numFound;
    $start = min(1, $total);
    $end = min($limit, $total);
    ?>
    Results – of :

    response->docs as $doc)
    {
    ?>

    $value)
    {
    ?>

  15. abhax Says:

    works perfectly

    but with a few minor changes on the localhost…

    schema.xml

    introtext

  16. azrain Says:

    hi,

    great article here…i have problem indexing with a million rows of mysql data…somehow i cannot get it indexed with large data…it keeps showing “Indexing failed. Rolled back all changes”…however, when i put let’s say LIMIT 10000, it indexed perfectly…anyone has solution to index large mysql data? thanks

  17. dave Says:

    Hi Shibu

    My install tested fine after I started the start.jar file. Afterwards I have a problem.

    I’m confused. Can you help out? From your notes, are the files data-import.xml and data-config.xml the same? I followed the exact steps and couldn’t get it to work. I do have the table jos_content in my MySQL database.

    I tried to put the data-import.xml and solrconfig.xml in the directory /apache-solr-1.4.0/example/solr/conf. For the schema.xml I added the lines to the original schema.xml file

    fulltext

    The part that confuses me is solrconfig.xml. The line
    <str name="config">/solr/data-config.xml</str>

    Can you give an example of data-config.xml and the full path where I should put the file?

    Thanks

  18. Mahesh Murali Says:

    Hi azrain,

    Can you please let me know the details of your Solr process?
    I have indexed lakhs of records without any issue.

  19. Martin Says:

    Hi Shibu

    I just wanted to clarify. Under point 2 of “Configure MySQL Database” you say to create a file called data-import.xml but then under the next point:
    3. Edit file solrconfig.xml you refer to that file as data-config.xml

    Is that a typo or are you creating two files?

    Can I also ask – does the mysql-connector-[version].jar file need to be renamed to anything else?

    Cheers
    Martin

  20. Mahesh Murali Says:

    Hi Dave,

    There is only one file called data-config.xml (not data-import.xml).
    Both solrconfig.xml and data-config.xml should be in the same folder.
    The entry in the solrconfig.xml is like:

    <str name="config">data-config.xml</str>

    Thanks

  21. Dave Says:

    Hi Mahesh,

    Thanks for your inputs. I finally get my Solr to work. Now, my question is fq (filter query).
    This is what I want.

    q=book_description&fq=bookid:91

    $addParameters = array(
    'fq' => 'dataid',
    'dataid' => 91
    'facet' => 'true'
    );

    my $response = $solr->search($query, $start, $rows, $addParameters);

    It doesn’t work.

  22. Shibu Basheer Says:

    Hi Martin,

    No, data-import.xml and data-config.xml are different files for different purposes. And no, you do not need to rename the jar file, as the server should pick it up automatically.

    Thanks,
    Shibu
