Preparation
You should have set-up Solr on Tomcat along with Tika's extracting request handler as shown in the previous two guides:
http://amac4.blogspot.co.uk/2013/07/setting-up-solr-with-apache-tomcat-be.html http://amac4.blogspot.co.uk/2013/07/setting-up-tika-extracting-request.html
http://amac4.blogspot.co.uk/2013/07/setting-up-solr-with-apache-tomcat-be.html http://amac4.blogspot.co.uk/2013/07/setting-up-tika-extracting-request.html
Set-Up
- Firstly we need the libraries that are required to use Data Import Handler. Create a folder and name it dih (preferably in your $SOLR_HOME), and place solr-dataimporthandler-4.0.0.jar and solr-dataimporthandler-extras-4.0.0.jar from $SOLR/dist directory in the dih folder.
Add this to the solrconfig.xml file:
<lib dir="../../dih" regex=".*\.jar" />
- Now we modify the solrconfig.xml file. Add the following :
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">db-data-config.xml</str>
</lst>
</requestHandler>
- Create a db-data-config.xml. This is for the Data Import Handler configuration. It should look similar to:
<dataConfig>
<dataSource driver="org.postgresql.Driver"
url="jdbc:postgresql://localhost:1234/users" user="users"
password="secret" />
<document>
<entity name="user" query="SELECT id, name from users">
<field column="id" name="id" />
<field column="name" name="name" />
<entity name="birthday" query="select birthday from table where id=${user.id}">
<field column="description" name="description" />
</entity>
</entity>
</document>
</dataConfig></dataConfig>
- For other database engines enter the specific driver, url, user and password attributes.
- We now need to modify the fields section of the schema.xml file to something like the following snippet:
<field name="id" type="string" indexed="true" stored="true"
required="true"/>
<field name="name" type="text" indexed="true" stored="true" />
<field name="birthday" type="text" indexed="true" stored="true"/>
<field name="description" type="" indexed="true"
stored="true"/>
Solr may not like type="text", if so then change it to text_general
One more thing before the indexing – you should copy an appropriate JDBC driver to the $SOLR_HOME/lib directory of your Solr installation or the dih directory we created before. You can get the libraries from the databases websites
To index, run the following query: http://localhost:8080/solr/dataimport?command=full-import .The HTTP protocol is asynchronous so you won't be updated on the status of the indexing process. To check the status of the indexing process, you can run the command once again.
This comment has been removed by the author.
ReplyDeleteThanks...
ReplyDeleteHello Allan Macmillan,
ReplyDeletethanx for the nice help by the tutorial,
I follows all the above instruction for data import mysql to solr but now i am getting two error in admin logging
1)
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from empinfo; Processing Document # 1 and
2)
Exception while processing: empinfo document : SolrInputDocument(fields: []):org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from empinfo; Processing Document # 1 .
these error generate during the time of import data from mysql to solr. I also debug and message show there is
{
"responseHeader": {
"status": 0,
"QTime": 47
},
"initArgs": [
"defaults",
[
"config",
"data-config.xml"
]
],
"command": "full-import",
"mode": "debug",
"documents": [],
"verbose-output": [],
"status": "idle",
"importResponse": "",
"statusMessages": {
"Time Elapsed": "0:0:0.29",
"Total Requests made to DataSource": "1",
"Total Rows Fetched": "0",
"Total Documents Processed": "0",
"Total Documents Skipped": "0",
"Full Dump Started": "2015-10-04 15:11:43",
"Full Import failed": "2015-10-04 15:11:43"
},
"WARNING": "This response format is experimental. It is likely to change in the future."
}.
now plz help me. thanx in advance.
Java SE & Java EE article is practical oriented and real time examples. How Java EE address the enterprise development is very important. for that you need a practical orieneted Java Training Courses you need.
ReplyDeleteGreat Article
Online Java Training
Online Java Training
Java Training Institutes in Chennai
J2EE training
Java Training in Chennai
Java Interview Questions
Best Recommended books for Spring framework
I am able to read the multiple pdf file in Solr with apache-tika, but i need to clean the content inside the text column.
ReplyDeleteLet's say, I have data inside text like
text": [
"SAW",
"http://www.google.com/km/saw/print.do?docId=emr_na1[5/27/2016 4:02:01 PM]",
" Writer Id: {e38c2e3c-d4fb-4f4d-9550-fcafda8aae9a} ",
" Writer Instance Id: {2991d6df-8386-4fd7-98e9-b5a9bf8fdcfb} ",
" State: [1] Stable ",
"Provider name: '3PAR VSS Provider' "]
I need to clean the junk, and the output should be like:
SAW
http://www.google.com/km/saw/print.do?docId=emr_na1
Writer Id e38c2e3c-d4fb-4f4d-9550-fcafda8aae9a
Writer Instance Id: 2991d6df-8386-4fd7-98e9-b5a9bf8fdcfb
State: [1] Stable
Provider name: 3PAR VSS Provider
This looks absolutely perfect. All these tiny details are made with lot of background knowledge. I like it a lot.
ReplyDeleteDevops Training courses
Devops Training in Bangalore
Best Devops Training in pune
Microsoft azure training in Bangalore
I appreciate that you produced this wonderful article to help us get more knowledge about this topic.
ReplyDeleteI know, it is not an easy task to write such a big article in one day, I've tried that and I've failed. But, here you are, trying the big task and finishing it off and getting good comments and ratings. That is one hell of a job done!
Selenium training in bangalore
Selenium training in Chennai
Selenium training in Bangalore
Selenium training in Pune
Selenium Online training
Selenium interview questions and answers
It's interesting that many of the bloggers to helped clarify a few things for me as well as giving.Most of ideas can be nice content.The people to give them a good shake to get your point and across the command
ReplyDeletepython training in chennai
Python Online training in usa
python course institute in chennai
All the points you described so beautiful. Every time i read your i blog and i am so surprised that how you can write so well.
ReplyDeleteDevops Training in Chennai | Devops Training Institute in Chennai
Really i appreciate the effort you made to share the knowledge. The topic here i found was really effective...
ReplyDeleteGet Web Methods Training in Bangalore from Real Time Industry Experts with 100% Placement Assistance in MNC Companies. Book your Free Demo with Softgen Infotech.
ReplyDeleteI feel very grateful that I read this. It is very helpful and very informative and I really learned a lot from it. DevOps Training | Certification in Chennai | DevOps Training | Certification in anna nagar | DevOps Training | Certification in omr | DevOps Training | Certification in porur | DevOps Training | Certification in tambaram | DevOps Training | Certification in velachery
Very thank you for sharing the code.
ReplyDeleteartificial intelligence course in indore
I feel very grateful that I read this. It is very helpful and very informative and I really learned a lot from it.
ReplyDeleteAWS training in Chennai
AWS Online Training in Chennai
AWS training in Bangalore
AWS training in Hyderabad
AWS training in Coimbatore
AWS training
It's interesting that many of the bloggers to helped clarify a few things for me as well as giving.Most of ideas can be nice content.The people to give them a good shake to get your point and across the command
ReplyDeletejava training in chennai
java training in omr
aws training in chennai
aws training in omr
python training in chennai
python training in omr
selenium training in chennai
selenium training in omr
Thanks a lot for sharing such a good source with all, i appreciate your efforts taken for the same. I found this worth sharing and must share this with all.
ReplyDeletejava training in chennai
java training in tambaram
aws training in chennai
aws training in tambaram
python training in chennai
python training in tambaram
selenium training in chennai
selenium training in tambaram
I am glad it is helpful. Hopefully I will be able to release a newer version this month with some updates and improvements.
ReplyDeleteoracle training in chennai
oracle training in velachery
oracle dba training in chennai
oracle dba training in velachery
ccna training in chennai
ccna training in velachery
seo training in chennai
seo training in velachery
excellent information. It is amazing and wonderful to visit your site.
ReplyDeleteangular js training in chennai
angular js training in annanagar
full stack training in chennai
full stack training in annanagar
php training in chennai
php training in annanagar
photoshop training in chennai
photoshop training in annanagar
Thanks for your informative article,Your post helped me to understand the future and career prospects &
ReplyDeleteKeep on updating your blog .
acte chennai
acte complaints
acte reviews
acte trainer complaints
acte trainer reviews
acte velachery reviews complaints
acte tambaram reviews complaints
acte anna nagar reviews complaints
acte porur reviews complaints
acte omr reviews complaints
This was nice and amazing and the given contents were very useful and the precision has given here is good.
ReplyDeleteApache Spark Training in Pune
Python Classes in Pune
That's really impressive and helpful information you have given, very valuable content.
ReplyDeleteWe are also into education and you also can take advantage really awesome job oriented courses
I wish to show thanks to you just for bailing me out of this particular course.
ReplyDeletesap ps training in bangalore