Wednesday, 14 August 2013

Configuring Solr 4 Data Import Handler with JDBC


You should have set up Solr on Tomcat along with Tika's extracting request handler, as shown in the previous two guides.


  • Firstly, we need the libraries required to use the Data Import Handler. Create a folder named dih (preferably in your $SOLR_HOME) and place solr-dataimporthandler-4.0.0.jar and solr-dataimporthandler-extras-4.0.0.jar from the $SOLR/dist directory into it.

    Add this to the solrconfig.xml file:
<lib dir="../../dih" regex=".*\.jar" />
  • Now we modify the solrconfig.xml file to register the handler. Add the following:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>
  • Create a db-data-config.xml file. This holds the Data Import Handler configuration. A nested entity references its parent's columns via ${entityName.columnName}, so the child query below uses ${user.id}. It should look similar to:
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:1234/users" user="users"
              password="secret" />
  <document>
    <entity name="user" query="SELECT id, name FROM users">
      <field column="id" name="id" />
      <field column="name" name="name" />
      <entity name="birthday" query="SELECT birthday, description FROM table WHERE id=${user.id}">
        <field column="birthday" name="birthday" />
        <field column="description" name="description" />
      </entity>
    </entity>
  </document>
</dataConfig>
  • For other database engines, supply the engine-specific driver, url, user and password attributes.
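For example, a MySQL dataSource might look like the sketch below. The host, port, database name and credentials are placeholders, and it assumes the MySQL Connector/J driver jar is on the classpath (see the note on JDBC drivers further down):

```xml
<dataSource driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/users"
            user="users" password="secret" />
```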
  • We now need to modify the fields section of the schema.xml file to something like the following snippet:
<field name="id" type="string" indexed="true" stored="true" />
<field name="name" type="text" indexed="true" stored="true" />
<field name="birthday" type="text" indexed="true" stored="true" />
<field name="description" type="text" indexed="true" stored="true" />

Solr may complain about type="text" if no field type with that name is defined in your schema; if so, change it to text_general.
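For reference, text_general is defined as a fieldType in Solr's stock example schema.xml, roughly as sketched below. This is an approximation of the shipped definition — check the schema in your own installation for the exact analyzer chain:

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```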

One more thing before indexing: copy the appropriate JDBC driver either to the $SOLR_HOME/lib directory of your Solr installation or to the dih directory we created earlier. You can get the driver jars from the database vendors' websites.
To start indexing, request the following URL: http://localhost:8080/solr/dataimport?command=full-import. The import runs asynchronously, so the response returns immediately and does not report the outcome of the indexing process. To check on its progress, request the handler again with command=status.



  2. Hello Allan Macmillan,
    Thanks for the helpful tutorial. I followed all of the instructions above to import data from MySQL into Solr, but I am now getting two errors in the admin logging:

    Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from empinfo; Processing Document # 1

    and

    Exception while processing: empinfo document : SolrInputDocument(fields: []): org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from empinfo; Processing Document # 1

    These errors occur while importing the data from MySQL into Solr. When I run the import in debug mode, the response shows:

    "responseHeader": {
      "status": 0,
      "QTime": 47
    },
    "initArgs": [
    "command": "full-import",
    "mode": "debug",
    "documents": [],
    "verbose-output": [],
    "status": "idle",
    "importResponse": "",
    "statusMessages": {
      "Time Elapsed": "0:0:0.29",
      "Total Requests made to DataSource": "1",
      "Total Rows Fetched": "0",
      "Total Documents Processed": "0",
      "Total Documents Skipped": "0",
      "Full Dump Started": "2015-10-04 15:11:43",
      "Full Import failed": "2015-10-04 15:11:43",
      "WARNING": "This response format is experimental. It is likely to change in the future."
    }

    Please help. Thanks in advance.


  4. I am able to read multiple PDF files into Solr with Apache Tika, but I need to clean up the content inside the text column.

    Let's say I have data inside text like:

    "text": [
    "[5/27/2016 4:02:01 PM]",
    " Writer Id: {e38c2e3c-d4fb-4f4d-9550-fcafda8aae9a} ",
    " Writer Instance Id: {2991d6df-8386-4fd7-98e9-b5a9bf8fdcfb} ",
    " State: [1] Stable ",
    "Provider name: '3PAR VSS Provider' "]

    I need to strip out the junk so that the output looks like:

    Writer Id: e38c2e3c-d4fb-4f4d-9550-fcafda8aae9a
    Writer Instance Id: 2991d6df-8386-4fd7-98e9-b5a9bf8fdcfb
    State: [1] Stable
    Provider name: 3PAR VSS Provider

