Using Solr / Lucene for full text search with MySQL DB
Solr is a standalone webservice application that can be installed on any servlet container like tomcat, jetty etc. It uses the popular Lucene java library to provide enterprise level search results from databases, filesystem, web services etc. Solr runs as a web service, so in effect it provides a cross platform search engine. The results can be accessed from php, java, RoR or .NET by invoking its web service. The only requirement on your server is that it should allow you to run java application or have deploy solr as a webapp in an existing servlet container such as tomcat or jetty.
This article shows how to get Solr index and provide search results for a simple MySQL table. The search results will be provided in XML, so you can get your web application to read the results, parse it and display it any form desired.
For this article, we will use the standalone solr nightly build. We use the nightly build as one of the features called deltaImportQuery is not available in the current stable 1.3.0 release. The deltaImportQuery allows you to make delta indexing of data from your DB. You can download a nightly build from http://people.apache.org/builds/lucene/solr/nightly/
You also need to download the latest MySQL JDBC driver from http://dev.mysql.com/downloads/connector/j/3.1.html
Install Solr
1. Unzip or untar the downloaded solr nightly build package. Assume the unzipped directory is solr.
2. cd into solr/example directory which has a standalone solr server running on jetty.
3. Execute the server by “java -jar startup.jar”
4. Test the server by accessing http://<servername_or_ip>:8983/solr/admin/ . If you get an admin page with a search box, then your solr is running well, and ready.
Configure MySQL Database
1. Copy the downloaded mysql jdbc driver file into solr/lib directory.
2. Create a new xml file called data-import.xml , change the obvious variables to suit your DB. In this example, I am indexing a Joomla DB table called jos_content.
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/database"
user="user"
password="password"/>
<document name="doc">
<entity name="jos_content"
query="select * from jos_content WHERE state=1"
deltaImportQuery="SELECT * FROM `jos_content` WHERE id='${dataimporter.delta.job_jobs_id}'"
deltaQuery="SELECT id FROM `jos_content` WHERE modified > '${dataimporter.last_index_time}'">
<field column="id" name="id" />
<field column="title" name="title" />
<field column="introtext" name="introtext" />
<field column="fulltext" name="fulltext" />
</entity>
</document>
</dataConfig>
3. Edit file solrconfig.xml which is located in solr/example/solr/conf directory. Add the following requestHandler entry if not already existing.
<requestHandler name="/dataimport"> <lst name="defaults"> <str name="config">/solr/data-config.xml</str> </lst> </requestHandler>
4. Now we will configure solr’s schema by editing schema.xml in solr/example/solr/conf directory. Add or edit the following fields as required. The xml format is self explanatory.
<fields> <field name="id" type="string" indexed="true" stored="true" required="true"/> <field name="title" type="text" indexed="true" stored="true" required="true"/> <field name="introtext" type="text" indexed="true" stored="true" required="true"/> <field name="fulltext" type="text" indexed="true" stored="true" required="true"/> <dynamicField name="*" type="ignored" /> </fields> <uniqueKey>id</uniqueKey> <!-- field for the QueryParser to use when an explicit fieldname is absent --> <defaultSearchField>fulltext</defaultSearchField>
5. Stop and restart solr instance. Check if there are any jdbc errors, this could happen if the jdbc drivers are not properly installed.
Performing full or delta indexing
If everything works correctly, you can get solr to fully index the configured tables by accessing the following command via your browser. http://<your_solr_server>:8983/solr/dataimport?command=full-import
You can check the status of the command by accessing http://<your_solr_server>:8983/solr/dataimport
If everything works correctly, you can now search for data from http://<your_solr_server>:8983/solr/admin/ and you should now have data results in XML format.
To do an incremental or delta indexing of data since the last full or delta, increment, issue the command http://<your_solr_server>:8983/solr/dataimport?command=delta-import
You can now access these xml results from your web application. There are client api’s available for RoR, php, java etc.
References
http://www.ipros.nl/2008/12/15/using-solr-with-wordpress/
http://wiki.apache.org/solr/DataImportHandler#head-df246a3aed0bb38297f3449bc35a0bdf38a272b5
http://lucene.apache.org/solr/tutorial.html
Cabot Technology Solutions is an Enterprise Mobile apps development company.


Sai says:
Tuesday, June 16, 2009 at 10:11am
Shibu,
I saw your article and would appreciate if you could share any code that formats the search results. I am looking to implement solr for filesystem.
Cheers,
Sai.
Mahesh Murali says:
Monday, June 22, 2009 at 6:30pm
There is an open source code available for php called solrPhpClient to use solr search with php.
First thing we do is to create an object:
$solr = new Apache_Solr_Service( ‘localhost’, ’8983′, ‘/solr’ );
For searching we call the function:
$response = $solr->search( $searchkey, $offset, $limit , $params);
The result of this function is basically an xml doc, converted into an object array. Inorder to read each value from the array, we can use a for each array.
function test(){
if ( $response->getHttpStatus() == 200 ) {
if ( $response->response->numFound > 0 ) {
foreach ( $response->response->docs as $doc ) {
$result[] = $doc;
}
}
}
else {
echo $response->getHttpStatusMessage();
}
return $result;
}
The variable $result; is an array which contains the entire results.
Mike says:
Monday, June 29, 2009 at 6:52am
hey Mahesh Murali, I downloaded solrPHPClient and I’m trying to use the code you posted, but I keep getting the same error
Fatal error: Call to a member function getHttpStatus() on a non-object
on the line /* if ( $response->getHttpStatus() == 200 ) { */
here are the parameters I’m putting in the search function: $solr = solr->search(‘title:digg’, $offset, $limit,$params );
can you help me please???
thanks Shibu Basheer for the great tutorial
Mahesh Murali says:
Tuesday, June 30, 2009 at 9:46am
Hi Mike,
You used $solr = solr->search(’title:digg’, $offset, $limit,$params ); ?
Please use $response = $solr->search(’title:digg’, $offset, $limit,$params ); and then if ( $response->getHttpStatus() == 200 ) will not throw any exception.
Would love to hear how you get on with soir/lucene…
Good Luck.
Mike says:
Wednesday, July 8, 2009 at 12:01am
hey Mahesh Murali thanks a lot,
I have one problem, I hope you can help me…
I need to add more tables and do Joins, but I don’t know how to configure the data-config.xml
Mahesh Murali says:
Wednesday, July 8, 2009 at 9:58am
Hi Mike,
I think you have to use multiple to achieve table joins.
<field column= ….
….
<field column= ….
Just try it.
Mahesh Murali says:
Wednesday, July 8, 2009 at 10:03am
Hi Mike,
I think you have to use multiple to achieve table joins.
<field column= ….
….
<field column= ….
Just try it.
shaili says:
Monday, September 7, 2009 at 5:05pm
Hi there
i am new to this but i found ur article of great help.Although i m able to connect to SOLR and it is working successfully but on querying , SOLR doesn’t come up with any related records.On debugging i found its not able to fetch any records.
If u could tell what could be the issue , that would be of great help
thanks in advance
Joel says:
Friday, October 23, 2009 at 3:28pm
This is good, seems like there is an error you say to save the file as data-import.xml, but then refer to it as : data-config.xml.
Also you may want to give your vm some more memory when indexing, mine barfed with out of memory error, I used -mx512m for mine and it ran after that.
Alex Dunae says:
Monday, November 23, 2009 at 12:31am
Thanks for this write up, Shibu. It’s one of the clearest walkthroughs I’ve come across.
Fred says:
Friday, December 18, 2009 at 1:16pm
Thanks for your tutorial.
I just got a problem when trying to import, on Windows : the MySQL driver was not found, even when i put it in /lib .
In fact, it seems that the driver needs to be copied in solr/example/lib
Swapnil Agarwal says:
Saturday, January 23, 2010 at 11:23pm
This is of great help!
But, unfortunately I keep getting this error.
“HTTP ERROR: 404
missing core name in path
RequestURI=/solr/admin/index.jsp
Powered by Jetty://”
When I run the solr first time, it runs fine, but as soon as I make the changes proposed by you, I get this error.
Please help me out. I am stuck.
Swapnil Agarwal says:
Monday, February 8, 2010 at 11:44am
I am creating .xml files which are then posted to solr for creating documents.
The xml files have tags in which there are several tags.
Now consider an example where we data for a car. We would like to add one doc related to some features of the car like mileage, horsepower e.t.c. We will add another doc containing information about the price, service about the same car.
Now both the above docs will contain an ID representing the same car.
Now when search is performed, how do I get the result combined from both the above docs.
For instance, if I search for feature:13KM/L price:60K, I want the search results having combined score from the docs containing the same car ID.
Merging these docs is not favourable as I want to add,delete and upadate the features for a particular car.
abhax says:
Friday, March 19, 2010 at 4:10pm
Hio,
hey i did everything and even the table in the data base was processed fully with no error or nothing…
im using the solrphpclient…
What im unable to do is that if i fire a query : NO RESULT IS RETURNED!
the index is done but dont know why this is happening…
the coding is same as the example code for solrphpclient
search($query, 0, $limit);
}
catch (Exception $e)
{
// in production you’d probably log or email this error to an admin
// and then show a special message to the user but for this example
// we’re going to show the full exception
die(“SEARCH EXCEPTION{$e->__toString()}”);
}
}
?>
PHP Solr Client Example
Search:
<input id="q" name="q" type="text" value="”/>
response->numFound;
$start = min(1, $total);
$end = min($limit, $total);
?>
Results – of :
response->docs as $doc)
{
?>
$value)
{
?>
abhax says:
Monday, March 22, 2010 at 5:34pm
works perfectly
but with a few minor changes on the localhost…
schema.xml
introtext
azrain says:
Wednesday, March 24, 2010 at 11:26am
hi,
great article here…i have problem indexing with a million rows of mysql data…somehow i cannot get it indexed with large data…it keeps showing “Indexing failed. Rolled back all changes”…however, when i put let’s say LIMIT 10000, it indexed perfectly…anyone has solution to index large mysql data? thanks
dave says:
Tuesday, April 13, 2010 at 11:17am
Hi Shibu
My installed test fine after I start start.jar file. Afterward I have problem.
I’m confused. Can you help out. From you notes, is the file data-import.xml and data-config.xml the same. I followed the exact steps and couldn’t get to work. I do have the table jos_content on my mysql database.
I tried to put the data-import.xml and solrconfig.xml in the directory /apache-solr-1.4.0/example/solr/conf. For the schema.xml I added the lines to the original schema.xml file
fulltext
The part that I confused is the solconfig.xml. The line
/solr/data-config.xml
Can you give example of the data-config.xml and the full path I should put the file.
Thanks
prabu says:
Thursday, December 16, 2010 at 7:59pm
Hi, its working well on windows xp using jetty server.
Question is,
I do not know where to place mysql connector in tomcat5 using apace solr on centos5.
My solr home directory look like.
/var/solr/
bin
data
conf
solr.war
i cannot find lib directory. any suggestion?
Kristofer Pettersson says:
Wednesday, July 27, 2011 at 2:24pm
The data import handler seems like a very good piece of software. However, doing Change-Data-Capture through the client interface isn’t optimal for MySQL. A better scale-out pattern would be to hook on to the binary log. I’m trying to realize this through the binlog API in the MySQL Labs release. Is this something you have considered?
Chary says:
Tuesday, January 31, 2012 at 5:21pm
Hi,
Its a great tutorial. It works and of great use. I would like to know what changes are to be made to index the data from multiple tables.
Arpitha says:
Friday, November 16, 2012 at 11:58am
Hi,
Its very nice tutorial to understand how to use the SOLR.
I have some doubts could you please help me.
1. How we can utilize solr for search suggestion
2. How we can use SOLR db for our search application(for suggestion)
Please help,
Thanks in advance.
Arpitha