Using Solr / Lucene for full text search with MySQL DB
Posted on May 14th, 2009 by Shibu Basheer
Solr is a standalone web service application that can be installed in any servlet container, such as Tomcat or Jetty. It uses the popular Lucene Java library to provide enterprise-level search over data from databases, file systems, web services and so on. Because Solr runs as a web service, it is in effect a cross-platform search engine: results can be consumed from PHP, Java, RoR or .NET simply by calling it over HTTP. The only requirement is that your server lets you run a Java application, or that you can deploy Solr as a webapp in an existing servlet container such as Tomcat or Jetty.
This article shows how to get Solr to index a simple MySQL table and serve search results from it. The results are returned as XML, so your web application can read them, parse them and display them in any form you like.
For this article, we will use the standalone Solr nightly build, because the deltaImportQuery feature is not available in the current stable 1.3.0 release. deltaImportQuery lets you do delta (incremental) indexing of data from your DB. You can download a nightly build from http://people.apache.org/builds/lucene/solr/nightly/
You also need to download the latest MySQL JDBC driver (Connector/J) from http://dev.mysql.com/downloads/connector/j/3.1.html
Install Solr
1. Unzip or untar the downloaded Solr nightly build package. We assume the unpacked directory is called solr.
2. cd into the solr/example directory, which contains a standalone Solr server running on Jetty.
3. Start the server with “java -jar start.jar”.
4. Test the server by opening http://<servername_or_ip>:8983/solr/admin/ . If you get an admin page with a search box, Solr is running and ready.
Configure MySQL Database
1. Copy the downloaded MySQL JDBC driver jar into the solr/lib directory.
2. Create a new XML file called data-config.xml in the solr/example/solr/conf directory (next to solrconfig.xml), and change the obvious variables to suit your DB. In this example, I am indexing a Joomla DB table called jos_content.
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/database"
              user="user"
              password="password"/>
  <document name="doc">
    <!-- deltaQuery returns the ids of changed rows; deltaImportQuery then
         fetches each of those rows, referring to the id as ${dataimporter.delta.id} -->
    <entity name="jos_content"
            query="select * from jos_content WHERE state=1"
            deltaImportQuery="SELECT * FROM `jos_content` WHERE id='${dataimporter.delta.id}'"
            deltaQuery="SELECT id FROM `jos_content` WHERE modified > '${dataimporter.last_index_time}'">
      <field column="id" name="id" />
      <field column="title" name="title" />
      <field column="introtext" name="introtext" />
      <field column="fulltext" name="fulltext" />
    </entity>
  </document>
</dataConfig>
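To make the delta mechanism concrete, here is a rough simulation in Python that uses the standard-library sqlite3 module in place of MySQL. It mimics the two steps of a delta import: run the deltaQuery to collect the ids of rows modified after last_index_time, then run the deltaImportQuery once per id (the value DataImportHandler exposes as ${dataimporter.delta.id}). The table is a cut-down, hypothetical stand-in for jos_content, and make_demo_db/delta_import are illustrative names, not part of Solr.

```python
import sqlite3


def make_demo_db():
    """In-memory stand-in for jos_content (hypothetical rows, SQLite instead of MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jos_content (id INTEGER PRIMARY KEY, title TEXT, modified TEXT)"
    )
    conn.executemany(
        "INSERT INTO jos_content VALUES (?, ?, ?)",
        [
            (1, "Old article", "2009-05-01 10:00:00"),
            (2, "Edited article", "2009-05-20 09:30:00"),
            (3, "New article", "2009-05-21 14:00:00"),
        ],
    )
    return conn


def delta_import(conn, last_index_time):
    """Mimic a delta import: collect changed ids, then fetch each row by id."""
    # Step 1, like deltaQuery: ids of rows modified after the last import.
    changed_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM jos_content WHERE modified > ? ORDER BY id",
            (last_index_time,),
        )
    ]
    # Step 2, like deltaImportQuery: one lookup per changed id.
    docs = []
    for doc_id in changed_ids:
        row = conn.execute(
            "SELECT id, title FROM jos_content WHERE id = ?", (doc_id,)
        ).fetchone()
        docs.append({"id": row[0], "title": row[1]})
    return docs


# Rows 2 and 3 were modified after the (hypothetical) last index time.
print(delta_import(make_demo_db(), "2009-05-15 00:00:00"))
# [{'id': 2, 'title': 'Edited article'}, {'id': 3, 'title': 'New article'}]
```

Only changed rows are re-fetched, which is why a delta import is much cheaper than a full import on a large table.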
3. Edit solrconfig.xml, which is located in the solr/example/solr/conf directory, and add the following requestHandler entry if it does not already exist.
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
4. Now configure Solr’s schema by editing schema.xml in the solr/example/solr/conf directory. Add or edit the following fields as required; the XML format is self-explanatory.
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text" indexed="true" stored="true" required="true"/>
  <field name="introtext" type="text" indexed="true" stored="true" required="true"/>
  <field name="fulltext" type="text" indexed="true" stored="true" required="true"/>
  <dynamicField name="*" type="ignored" />
</fields>
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>fulltext</defaultSearchField>
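Because of the catch-all `*` dynamic field of type ignored, any field that DataImportHandler sends but schema.xml does not declare is silently dropped rather than reported, which is an easy way to end up with an index that returns no results. The sketch below, using Python's xml.etree.ElementTree, lists DIH target field names with no matching `<field>` in the schema. The XML strings are trimmed copies of the snippets above, and undeclared_fields is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the data-config.xml and schema.xml snippets above.
DATA_CONFIG = """<dataConfig><document name="doc"><entity name="jos_content">
  <field column="id" name="id"/>
  <field column="title" name="title"/>
  <field column="introtext" name="introtext"/>
  <field column="fulltext" name="fulltext"/>
</entity></document></dataConfig>"""

SCHEMA = """<schema><fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text" indexed="true" stored="true" required="true"/>
  <field name="introtext" type="text" indexed="true" stored="true" required="true"/>
  <field name="fulltext" type="text" indexed="true" stored="true" required="true"/>
  <dynamicField name="*" type="ignored"/>
</fields></schema>"""


def undeclared_fields(data_config_xml, schema_xml):
    """Return DIH target field names with no explicit <field> in the schema."""
    # In data-config.xml the Solr field name defaults to the column when name is omitted.
    dih_names = {
        f.get("name") or f.get("column")
        for f in ET.fromstring(data_config_xml).iter("field")
    }
    # iter("field") matches <field> only, so the catch-all <dynamicField> is not counted.
    schema_names = {f.get("name") for f in ET.fromstring(schema_xml).iter("field")}
    return sorted(dih_names - schema_names)


print(undeclared_fields(DATA_CONFIG, SCHEMA))  # []
```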
5. Stop and restart the Solr instance, and check its console output for JDBC errors; these usually mean the JDBC driver jar is not installed in the right place.
Performing full or delta indexing
If everything works correctly, you can get Solr to fully index the configured tables by opening the following URL in your browser: http://<your_solr_server>:8983/solr/dataimport?command=full-import
You can check the status of the command by accessing http://<your_solr_server>:8983/solr/dataimport
If everything works correctly, you can now search your data from http://<your_solr_server>:8983/solr/admin/ and get the results back in XML format.
To index incrementally, i.e. only the rows changed since the last full or delta import, issue the command http://<your_solr_server>:8983/solr/dataimport?command=delta-import
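The import commands are plain HTTP GET requests, so they are easy to script, for example from cron for periodic delta imports. Below is a minimal Python sketch that builds the command URLs used above and reads the import state out of a DataImportHandler response. The sample response is a hand-written, hypothetical example of the assumed shape (a top-level `<str name="status">` element), not captured output, so check your own /solr/dataimport response before relying on it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def dataimport_url(host, command=None, port=8983):
    """Build a DataImportHandler URL; command is e.g. 'full-import' or 'delta-import'."""
    base = f"http://{host}:{port}/solr/dataimport"
    if command is None:
        return base  # no command: the handler just reports its status
    return f"{base}?{urlencode({'command': command})}"


def import_status(response_xml):
    """Return the text of the top-level <str name="status"> element, if present."""
    root = ET.fromstring(response_xml)
    # Direct children only: responseHeader also has a "status" entry, but it is an
    # <int> nested inside <lst> and holds the request status code.
    for elem in root.findall("str"):
        if elem.get("name") == "status":
            return elem.text
    return None


# Hand-written sample showing the assumed response shape (not captured output).
SAMPLE_STATUS = """<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
  <str name="command">status</str>
  <str name="status">idle</str>
</response>"""

print(dataimport_url("localhost", "delta-import"))
# http://localhost:8983/solr/dataimport?command=delta-import
print(import_status(SAMPLE_STATUS))  # idle
```

To actually trigger an import you would fetch the URL, e.g. with urllib.request.urlopen(dataimport_url("localhost", "delta-import")).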
You can now access these XML results from your web application. There are client APIs available for RoR, PHP, Java, etc.
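As a language-neutral illustration of consuming those XML results, here is a small Python sketch (standard library only) that builds a request for Solr's standard /solr/select handler and turns the `<result>` element of an XML response into a list of dicts. It assumes the default select handler path and the field names from the schema above; select_url and parse_results are hypothetical helpers, and the sample response is hand-written to show the shape, not real output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def select_url(host, query, start=0, rows=10, port=8983):
    """Build a request URL for Solr's standard /select handler (XML output by default)."""
    params = urlencode({"q": query, "start": start, "rows": rows})
    return f"http://{host}:{port}/solr/select?{params}"


def parse_results(response_xml):
    """Return (numFound, docs), each doc being a {field name: value} dict."""
    result = ET.fromstring(response_xml).find("result")
    docs = []
    for doc in result.findall("doc"):
        fields = {}
        for elem in doc:
            if elem.tag == "arr":
                # multi-valued field: one child element per value
                fields[elem.get("name")] = [child.text for child in elem]
            else:
                # single value (<str>, <int>, <date>, ...), kept as text here
                fields[elem.get("name")] = elem.text
        docs.append(fields)
    return int(result.get("numFound")), docs


# Hand-written sample in the shape of Solr's XML response (not captured output).
SAMPLE = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="1" start="0">
    <doc><str name="id">42</str><str name="title">Hello Solr</str></doc>
  </result>
</response>"""

print(select_url("localhost", "title:digg"))
# http://localhost:8983/solr/select?q=title%3Adigg&start=0&rows=10
print(parse_results(SAMPLE))  # (1, [{'id': '42', 'title': 'Hello Solr'}])
```

Note that numFound is the total number of matches, while the docs list only holds the page selected by start and rows.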

June 16th, 2009 at 10:11 am
Shibu,
I saw your article and would appreciate it if you could share any code that formats the search results. I am looking to implement Solr for a filesystem.
Cheers,
Sai.
June 22nd, 2009 at 6:30 pm
There is an open source library for PHP called solrPhpClient that lets you use Solr search from PHP.
First, create a service object:
$solr = new Apache_Solr_Service( 'localhost', '8983', '/solr' );
To search, call:
$response = $solr->search( $searchkey, $offset, $limit, $params );
The result of this call is the Solr response parsed into a PHP object. To read each document from it, loop over the docs with foreach:
function test($response) {
    $result = array(); // stays empty if the search matched nothing
    if ( $response->getHttpStatus() == 200 ) {
        if ( $response->response->numFound > 0 ) {
            foreach ( $response->response->docs as $doc ) {
                $result[] = $doc;
            }
        }
    }
    else {
        echo $response->getHttpStatusMessage();
    }
    return $result;
}
The returned $result array contains the documents for the requested page (at most $limit of them).
June 29th, 2009 at 6:52 am
hey Mahesh Murali, I downloaded solrPHPClient and I’m trying to use the code you posted, but I keep getting the same error
Fatal error: Call to a member function getHttpStatus() on a non-object
on the line /* if ( $response->getHttpStatus() == 200 ) { */
here are the parameters I’m putting in the search function: $solr = solr->search('title:digg', $offset, $limit,$params );
can you help me please???
thanks Shibu Basheer for the great tutorial
June 30th, 2009 at 9:46 am
Hi Mike,
You used $solr = solr->search('title:digg', $offset, $limit,$params ); ?
Please use $response = $solr->search('title:digg', $offset, $limit,$params ); and then if ( $response->getHttpStatus() == 200 ) will no longer fail.
Would love to hear how you get on with Solr/Lucene…
Good Luck.
July 8th, 2009 at 12:01 am
hey Mahesh Murali thanks a lot,
I have one problem, I hope you can help me…
I need to add more tables and do Joins, but I don’t know how to configure the data-config.xml
July 8th, 2009 at 9:58 am
Hi Mike,
I think you have to use multiple <entity> elements to achieve table joins.
<field column= ….
….
<field column= ….
Just try it.
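Roughly, a nested <entity> tells DataImportHandler to run the child query once for each row of the parent query (with the parent's columns available as variables such as ${jos_content.catid}) and to merge the child columns into the same Solr document. The Python sketch below imitates that row-by-row join with sqlite3; the jos_categories table, the catid column and the category field are assumptions made for the example, and the code illustrates the idea rather than DataImportHandler's internals.

```python
import sqlite3


def make_demo_db():
    """Two small hypothetical tables standing in for MySQL ones (SQLite, in memory)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE jos_content (id INTEGER, title TEXT, catid INTEGER);
        CREATE TABLE jos_categories (id INTEGER, title TEXT);
        INSERT INTO jos_content VALUES (1, 'Installing Solr', 10);
        INSERT INTO jos_content VALUES (2, 'MySQL tips', 20);
        INSERT INTO jos_categories VALUES (10, 'Search');
        INSERT INTO jos_categories VALUES (20, 'Databases');
        """
    )
    return conn


def build_docs(conn):
    """One flat document per parent row, with child-query columns merged in."""
    docs = []
    parent_rows = conn.execute("SELECT id, title, catid FROM jos_content ORDER BY id")
    for art_id, title, catid in parent_rows.fetchall():
        doc = {"id": art_id, "title": title}
        # The child query runs once per parent row, parameterised by that row's catid
        # (in data-config.xml this is where ${jos_content.catid} would appear).
        for (cat_title,) in conn.execute(
            "SELECT title FROM jos_categories WHERE id = ?", (catid,)
        ):
            # Several child rows would give a multi-valued field, hence a list.
            doc.setdefault("category", []).append(cat_title)
        docs.append(doc)
    return docs


print(build_docs(make_demo_db()))
```

The result is denormalized: each Solr document carries its own copy of the joined category data, which is what lets a single query match on both tables.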
July 8th, 2009 at 10:03 am
Hi Mike,
I think you have to use multiple to achieve table joins.
<field column= ….
….
<field column= ….
Just try it.
September 7th, 2009 at 5:05 pm
Hi there
I am new to this, but I found your article a great help. I am able to connect to Solr and it is working, but when I query, Solr doesn’t return any matching records. On debugging, I found it is not able to fetch any records.
If you could tell me what the issue could be, that would be a great help.
Thanks in advance.
October 23rd, 2009 at 3:28 pm
This is good, but there seems to be an error: you say to save the file as data-import.xml, but then refer to it as data-config.xml.
Also, you may want to give your VM more memory when indexing; mine barfed with an out-of-memory error. I used -mx512m and it ran after that.
November 23rd, 2009 at 12:31 am
Thanks for this write up, Shibu. It’s one of the clearest walkthroughs I’ve come across.
December 18th, 2009 at 1:16 pm
Thanks for your tutorial.
I just had a problem when trying to import on Windows: the MySQL driver was not found, even when I put it in /lib.
In fact, it seems that the driver needs to be copied into solr/example/lib.
January 23rd, 2010 at 11:23 pm
This is of great help!
But unfortunately I keep getting this error:
“HTTP ERROR: 404
missing core name in path
RequestURI=/solr/admin/index.jsp
Powered by Jetty://”
When I run the solr first time, it runs fine, but as soon as I make the changes proposed by you, I get this error.
Please help me out. I am stuck.
February 8th, 2010 at 11:44 am
I am creating .xml files which are then posted to solr for creating documents.
The xml files have tags in which there are several tags.
Now consider an example where we have data for a car. We would like to add one doc with some features of the car, like mileage, horsepower etc., and another doc with information about the price and servicing of the same car.
Both of the above docs will contain an ID representing the same car.
Now, when a search is performed, how do I get a result that combines both of the above docs?
For instance, if I search for feature:13KM/L price:60K, I want search results with a combined score from the docs containing the same car ID.
Merging these docs is not favourable, as I want to add, delete and update the features for a particular car.
March 19th, 2010 at 4:10 pm
Hi,
I did everything, and even the table in the database was processed fully with no errors…
I’m using solrphpclient…
What I can’t do is get results: if I fire a query, NO RESULT IS RETURNED!
The index is done, but I don’t know why this is happening…
The code is the same as the example code for solrphpclient:
…
  try
  {
    $results = $solr->search($query, 0, $limit);
  }
  catch (Exception $e)
  {
    // in production you’d probably log or email this error to an admin
    // and then show a special message to the user but for this example
    // we’re going to show the full exception
    die("SEARCH EXCEPTION{$e->__toString()}");
  }
}
?>
PHP Solr Client Example
Search:
<input id="q" name="q" type="text" value="<?php … ?>"/>
<?php
if ($results)
{
  $total = $results->response->numFound;
  $start = min(1, $total);
  $end = min($limit, $total);
?>
Results <?php echo $start; ?> – <?php echo $end; ?> of <?php echo $total; ?>:
<?php
  foreach ($results->response->docs as $doc)
  {
?>
…
<?php
    foreach ($doc as $field => $value)
    {
?>
…
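The `$start = min(1, $total)` and `$end = min($limit, $total)` lines in that example only describe the first page, because the search there always starts at offset 0. With a non-zero offset, the "Results x – y of z" label has to include the offset. Here is that arithmetic as a short Python sketch; page_bounds is a hypothetical helper, not part of solrPhpClient.

```python
def page_bounds(offset, limit, total):
    """1-based (first, last) positions shown on a results page; (0, 0) if the page is empty."""
    if offset >= total:
        # nothing matched, or the offset is past the last result
        return (0, 0)
    return (offset + 1, min(offset + limit, total))


print(page_bounds(0, 10, 25))   # (1, 10), same as min(1, $total) and min($limit, $total)
print(page_bounds(20, 10, 25))  # (21, 25)
print(page_bounds(0, 10, 0))    # (0, 0)
```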
March 22nd, 2010 at 5:34 pm
works perfectly
but with a few minor changes on the localhost…
schema.xml
introtext
March 24th, 2010 at 11:26 am
hi,
Great article. I have a problem indexing a million rows of MySQL data: somehow I cannot get it indexed with that much data, and it keeps showing “Indexing failed. Rolled back all changes”. However, when I add, say, LIMIT 10000, it indexes perfectly. Does anyone have a solution for indexing large MySQL tables? Thanks
April 13th, 2010 at 11:17 am
Hi Shibu
My installation tested fine after I started the start.jar file. After that I have a problem.
I’m confused. Can you help out? From your notes, are data-import.xml and data-config.xml the same file? I followed the exact steps and couldn’t get it to work. I do have the jos_content table in my MySQL database.
I tried putting data-import.xml and solrconfig.xml in the directory /apache-solr-1.4.0/example/solr/conf. For schema.xml, I added the lines to the original schema.xml file:
<defaultSearchField>fulltext</defaultSearchField>
The part that confuses me is solrconfig.xml, namely the line
<str name="config">/solr/data-config.xml</str>
Can you give an example of data-config.xml and the full path where I should put the file?
Thanks
April 16th, 2010 at 11:39 am
Hi azrain,
Can you please let me know the details of your Solr process?
I have indexed lakhs of records without any issue.
April 23rd, 2010 at 3:58 pm
Hi Shibu
I just wanted to clarify. Under point 2 of “Configure MySQL Database” you say to create a file called data-import.xml but then under the next point:
3. Edit file solrconfig.xml you refer to that file as data-config.xml
Is that a typo or are you creating two files?
Can I also ask – does the mysql-connector-[version].jar file need to be renamed to anything else?
Cheers
Martin
April 23rd, 2010 at 6:05 pm
Hi Dave,
There is only one file, called data-config.xml (not data-import.xml).
Both solrconfig.xml and data-config.xml should be in the same folder.
The entry in solrconfig.xml is like:
<str name="config">data-config.xml</str>
Thanks
May 11th, 2010 at 2:34 am
Hi Mahesh,
Thanks for your inputs. I finally got my Solr to work. Now my question is about fq (filter query).
This is what I want.
q=book_description&fq=bookid:91
$addParameters = array(
'fq' => 'dataid',
'dataid' => 91
'facet' => 'true'
);
my $response = $solr->search($query, $start, $rows, $addParameters);
It doesn’t work.
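For reference, fq expects a complete filter query such as bookid:91 as its value, not a field name with the value passed as a separate parameter; in the PHP array above that would be a single 'fq' => 'bookid:91' entry (and note the missing comma after 91). The Python sketch below only shows how such a request's query string is encoded; filtered_params is a hypothetical helper, and the field name bookid is taken from the question.

```python
from urllib.parse import urlencode


def filtered_params(query, field, value, facet=True):
    """Encode q plus a single filter query written as field:value (hypothetical helper)."""
    params = {"q": query, "fq": f"{field}:{value}"}
    if facet:
        params["facet"] = "true"
    return urlencode(params)


print(filtered_params("book_description", "bookid", 91))
# q=book_description&fq=bookid%3A91&facet=true
```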
June 3rd, 2010 at 8:15 pm
Hi Martin,
No, data-import.xml and data-config.xml are different files for different purposes. And no, you do not need to rename the jar file, as the server should pick it up automatically.
Thanks,
Shibu