Wednesday, 14 August 2013

Configuring Solr 4 Data Import Handler with JDBC

Preparation

You should have set-up Solr on Tomcat along with Tika's extracting request handler as shown in the previous two guides:
http://amac4.blogspot.co.uk/2013/07/setting-up-solr-with-apache-tomcat-be.html http://amac4.blogspot.co.uk/2013/07/setting-up-tika-extracting-request.html

Set-Up

Firstly we need the libraries that are required to use Data Import Handler. Create a folder and name it dih (preferably in your $SOLR_HOME), and place solr-dataimporthandler-4.0.0.jar and solr-dataimporthandler-extras-4.0.0.jar from $SOLR/dist directory in the dih folder.

Add this to the solrconfig.xml file:

<lib dir="../../dih" regex=".*\.jar" />

Now we modify the solrconfig.xml file. Add the following :

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">db-data-config.xml</str>
</lst>
</requestHandler>

Create a db-data-config.xml. This is for the Data Import Handler configuration. It should look similar to:

<dataConfig>
<dataSource driver="org.postgresql.Driver"
url="jdbc:postgresql://localhost:1234/users" user="users"
password="secret" />
<document>
<entity name="user" query="SELECT id, name from users">
<field column="id" name="id" />
<field column="name" name="name" />
<entity name="birthday" query="select birthday from table where id=${user.id}">
<field column="description" name="description" />
</entity>
</entity>
</document>
</dataConfig></dataConfig>

For other database engines enter the specific driver, url, user and password attributes.

We now need to modify the fields section of the schema.xml file to something like the following snippet:

<field name="id" type="string" indexed="true" stored="true"
required="true"/>
<field name="name" type="text" indexed="true" stored="true" />
<field name="birthday" type="text" indexed="true" stored="true"/>
<field name="description" type="" indexed="true"
stored="true"/>

Solr may not like type="text", if so then change it to text_general

One more thing before the indexing – you should copy an appropriate JDBC driver to the $SOLR_HOME/lib directory of your Solr installation or the dih directory we created before. You can get the libraries from the databases websites

To index, run the following query: http://localhost:8080/solr/dataimport?command=full-import .The HTTP protocol is asynchronous so you won't be updated on the status of the indexing process. To check the status of the indexing process, you can run the command once again.

21 comments:

Sekhar C26 February 2015 at 10:28
This comment has been removed by the author.
ReplyDelete
Replies
Sekhar C26 February 2015 at 10:50
Thanks...
ReplyDelete
Replies
Unknown4 October 2015 at 10:42
Hello Allan Macmillan,
thanx for the nice help by the tutorial,
I follows all the above instruction for data import mysql to solr but now i am getting two error in admin logging
1)
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from empinfo; Processing Document # 1 and
2)
Exception while processing: empinfo document : SolrInputDocument(fields: []):org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from empinfo; Processing Document # 1 .

these error generate during the time of import data from mysql to solr. I also debug and message show there is
{
"responseHeader": {
"status": 0,
"QTime": 47
},
"initArgs": [
"defaults",
[
"config",
"data-config.xml"
]
],
"command": "full-import",
"mode": "debug",
"documents": [],
"verbose-output": [],
"status": "idle",
"importResponse": "",
"statusMessages": {
"Time Elapsed": "0:0:0.29",
"Total Requests made to DataSource": "1",
"Total Rows Fetched": "0",
"Total Documents Processed": "0",
"Total Documents Skipped": "0",
"Full Dump Started": "2015-10-04 15:11:43",
"Full Import failed": "2015-10-04 15:11:43"
},
"WARNING": "This response format is experimental. It is likely to change in the future."
}.

now plz help me. thanx in advance.
ReplyDelete
Replies
Unknown24 December 2015 at 09:06
Java SE & Java EE article is practical oriented and real time examples. How Java EE address the enterprise development is very important. for that you need a practical orieneted Java Training Courses you need.

Great Article

Online Java Training
Online Java Training
Java Training Institutes in Chennai
J2EE training
Java Training in Chennai
Java Interview Questions
Best Recommended books for Spring framework
ReplyDelete
Replies
Unknown14 June 2016 at 16:49
I am able to read the multiple pdf file in Solr with apache-tika, but i need to clean the content inside the text column.

Let's say, I have data inside text like

text": [
"SAW",
"http://www.google.com/km/saw/print.do?docId=emr_na1[5/27/2016 4:02:01 PM]",
" Writer Id: {e38c2e3c-d4fb-4f4d-9550-fcafda8aae9a} ",
" Writer Instance Id: {2991d6df-8386-4fd7-98e9-b5a9bf8fdcfb} ",
" State: [1] Stable ",
"Provider name: '3PAR VSS Provider' "]

I need to clean the junk, and the output should be like:

SAW
http://www.google.com/km/saw/print.do?docId=emr_na1
Writer Id e38c2e3c-d4fb-4f4d-9550-fcafda8aae9a
Writer Instance Id: 2991d6df-8386-4fd7-98e9-b5a9bf8fdcfb
State: [1] Stable
Provider name: 3PAR VSS Provider
ReplyDelete
Replies
priya5 January 2019 at 10:17
This looks absolutely perfect. All these tiny details are made with lot of background knowledge. I like it a lot.
Devops Training courses
Devops Training in Bangalore
Best Devops Training in pune
Microsoft azure training in Bangalore
ReplyDelete
Replies
sasitamil2 February 2019 at 08:08
I appreciate that you produced this wonderful article to help us get more knowledge about this topic.
I know, it is not an easy task to write such a big article in one day, I've tried that and I've failed. But, here you are, trying the big task and finishing it off and getting good comments and ratings. That is one hell of a job done!

Selenium training in bangalore
Selenium training in Chennai
Selenium training in Bangalore
Selenium training in Pune
Selenium Online training
Selenium interview questions and answers

ReplyDelete
Replies
Mounika4 February 2019 at 10:54
It's interesting that many of the bloggers to helped clarify a few things for me as well as giving.Most of ideas can be nice content.The people to give them a good shake to get your point and across the command
python training in chennai
Python Online training in usa
python course institute in chennai
ReplyDelete
Replies
Raji6 March 2019 at 12:09
All the points you described so beautiful. Every time i read your i blog and i am so surprised that how you can write so well.
Devops Training in Chennai | Devops Training Institute in Chennai
ReplyDelete
Replies
Softgen Infotech8 December 2019 at 12:52
Really i appreciate the effort you made to share the knowledge. The topic here i found was really effective...

Get Web Methods Training in Bangalore from Real Time Industry Experts with 100% Placement Assistance in MNC Companies. Book your Free Demo with Softgen Infotech.
ReplyDelete
Replies
Anu31 May 2020 at 17:07

I feel very grateful that I read this. It is very helpful and very informative and I really learned a lot from it. DevOps Training | Certification in Chennai | DevOps Training | Certification in anna nagar | DevOps Training | Certification in omr | DevOps Training | Certification in porur | DevOps Training | Certification in tambaram | DevOps Training | Certification in velachery

ReplyDelete
Replies
Keerthi26 June 2020 at 04:29
Very thank you for sharing the code.

artificial intelligence course in indore
ReplyDelete
Replies
radhika31 July 2020 at 07:13
I feel very grateful that I read this. It is very helpful and very informative and I really learned a lot from it.

AWS training in Chennai

AWS Online Training in Chennai

AWS training in Bangalore

AWS training in Hyderabad

AWS training in Coimbatore

AWS training

ReplyDelete
Replies
deiva2 August 2020 at 08:49
It's interesting that many of the bloggers to helped clarify a few things for me as well as giving.Most of ideas can be nice content.The people to give them a good shake to get your point and across the command
java training in chennai

java training in omr

aws training in chennai

aws training in omr

python training in chennai

python training in omr

selenium training in chennai

selenium training in omr
ReplyDelete
Replies
Jayalakshmi2 August 2020 at 14:29
Thanks a lot for sharing such a good source with all, i appreciate your efforts taken for the same. I found this worth sharing and must share this with all.
java training in chennai

java training in tambaram

aws training in chennai

aws training in tambaram

python training in chennai

python training in tambaram

selenium training in chennai

selenium training in tambaram
ReplyDelete
Replies
jeni11 August 2020 at 07:51
I am glad it is helpful. Hopefully I will be able to release a newer version this month with some updates and improvements.

oracle training in chennai

oracle training in velachery

oracle dba training in chennai

oracle dba training in velachery

ccna training in chennai

ccna training in velachery

seo training in chennai

seo training in velachery
ReplyDelete
Replies
shiny12 August 2020 at 11:26
excellent information. It is amazing and wonderful to visit your site.

angular js training in chennai

angular js training in annanagar

full stack training in chennai

full stack training in annanagar

php training in chennai

php training in annanagar

photoshop training in chennai

photoshop training in annanagar

ReplyDelete
Replies
vivekvedha8 September 2020 at 09:17
Thanks for your informative article,Your post helped me to understand the future and career prospects &
Keep on updating your blog .
acte chennai

acte complaints

acte reviews

acte trainer complaints

acte trainer reviews

acte velachery reviews complaints

acte tambaram reviews complaints

acte anna nagar reviews complaints

acte porur reviews complaints

acte omr reviews complaints

ReplyDelete
Replies
Mohd Sharique21 October 2020 at 20:39
This was nice and amazing and the given contents were very useful and the precision has given here is good.

Apache Spark Training in Pune
Python Classes in Pune
ReplyDelete
Replies
Selenium Training in Pune20 January 2021 at 08:17
That's really impressive and helpful information you have given, very valuable content.
We are also into education and you also can take advantage really awesome job oriented courses

ReplyDelete
Replies
Best Training Institute3 February 2021 at 11:41
I wish to show thanks to you just for bailing me out of this particular course.

sap ps training in bangalore

ReplyDelete
Replies

Add comment

Share What You Know, Learn What You Don't

Pages

Wednesday, 14 August 2013

Configuring Solr 4 Data Import Handler with JDBC

Preparation

Set-Up

21 comments: