Tuesday, 13 August 2013

Setting Up Highlighting For Solr 4

The large search engines like Google and Bing show you a small snippet of text that often contains one or more of the keywords you searched for. Setting this up in Solr is also straightforward, and this is a short guide on how to do it.


Always ensure that your schema.xml files for Nutch and for Solr are identical, otherwise you will run into problems. These changes must therefore be applied to both.
Ensure that the content field (or whichever fields you wish to highlight) is set to stored:
  <field name="content" type="text_general" stored="true" indexed="true"/>
NOTE: In versions of Solr before 4.0.0, "text_general" is called "text".
You also need to ensure that these two lines are present:
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="name" type="text_general" indexed="true" stored="true" />
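As a quick sanity check, the short Python sketch below parses a schema.xml and lists any of the fields above that are missing or not marked stored="true", and compares the Nutch and Solr copies byte for byte. It is only a sketch: the function names are hypothetical, and the check is stricter than Solr itself, which can inherit stored from the field type when the attribute is left off.

```python
import filecmp
import xml.etree.ElementTree as ET

# Fields this guide needs present and stored (hypothetical helper constant).
REQUIRED_STORED = ("id", "name", "content")


def unstored_fields(schema_xml, required=REQUIRED_STORED):
    """Return the required fields that are missing or lack stored="true".

    Stricter than Solr: a field without an explicit stored attribute is
    reported, even though Solr would fall back to its field type's default.
    """
    root = ET.fromstring(schema_xml)
    # Map each <field name="..."> element to its stored attribute (None if absent).
    stored = {f.get("name"): f.get("stored") for f in root.iter("field")}
    return [name for name in required if stored.get(name) != "true"]


def schemas_identical(nutch_schema_path, solr_schema_path):
    """True if the two schema.xml files have byte-for-byte identical contents."""
    return filecmp.cmp(nutch_schema_path, solr_schema_path, shallow=False)
```

Passing the text of your schema.xml to `unstored_fields` should give an empty list once the three fields above are in place. The paths you pass to `schemas_identical` depend on where your Nutch and Solr installs live.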


No major changes are needed in this file (solrconfig.xml), but it contains a lot of settings you can change to tweak the highlighting functionality to your own liking. The highlighting defaults look like this:
  <!-- Highlighting defaults -->
     <str name="hl">on</str>
     <str name="hl.fl">content</str>
     <str name="hl.encoder">html</str>
     <str name="hl.simple.pre">&lt;b&gt;</str>
     <str name="">&lt;/b&gt;</str>
     <str name="f.title.hl.fragsize">0</str>
     <str name="f.title.hl.alternateField">title</str>
     <str name="f.name.hl.fragsize">0</str>
     <str name="f.name.hl.alternateField">name</str>
     <str name="f.content.hl.snippets">3</str>
     <str name="f.content.hl.fragsize">200</str>
     <str name="f.content.hl.alternateField">content</str>
     <str name="f.content.hl.maxAlternateFieldLength">750</str>
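The f.&lt;field&gt;.hl.* entries are per-field overrides: for a given field, Solr uses f.&lt;field&gt;.&lt;param&gt; if it is set and otherwise falls back to the plain &lt;param&gt;. The Python sketch below models only that lookup order over a few of these defaults. `effective_param` is a hypothetical helper, not part of Solr, and the fallback values passed in (1 snippet, 100-character fragments) are Solr's documented defaults.

```python
# A few of the highlighting defaults from solrconfig.xml, as plain strings.
HL_DEFAULTS = {
    "hl.fl": "content",
    "f.title.hl.fragsize": "0",
    "f.content.hl.snippets": "3",
    "f.content.hl.fragsize": "200",
}


def effective_param(params, field, name, default=None):
    """Resolve highlighting parameter `name` for one field.

    Lookup order: per-field f.<field>.<name>, then global <name>, then default.
    """
    per_field = f"f.{field}.{name}"
    if per_field in params:
        return params[per_field]
    return params.get(name, default)


# content has its own snippet count; title falls back to the default of 1.
print(effective_param(HL_DEFAULTS, "content", "hl.snippets", "1"))  # 3
print(effective_param(HL_DEFAULTS, "title", "hl.snippets", "1"))    # 1
```

A fragsize of 0 (as set for title) tells Solr to return the whole field value rather than a fragment, which suits short fields like titles.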


When you query the Solr server with highlighting enabled, the response contains an extra list named highlighting. Each entry inside it is named after the id of a matching document, so the snippets can easily be matched back to the results with a tool such as XPath.
  <lst name="highlighting">
    <lst name="file:/C:/Users/alamil/Documents/TextFiles/a.doc">
      <arr name="content">
        <str>Budget and Council Tax  POLICY AND RESOURCES COMMITTEE  BUDGET<em>STRATEGY</em></str>
      </arr>
    </lst>
    <lst name="file:/C:/Users/alamil/Documents/TextFilesb.doc">
      <arr name="content">
        <str>CONTENTS Introduction Customer Care Standards <em>Strategy</em></str>
      </arr>
    </lst>
  </lst>

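Matching the snippets back to documents takes only a few lines. The Python sketch below uses ElementTree's limited XPath support to build a dictionary from document id to its content snippets; `snippets_by_id` is a hypothetical name. One assumption worth flagging: in the raw XML that Solr sends, the highlight markup inside each &lt;str&gt; is escaped (&amp;lt;em&amp;gt;), so reading the element text gives back the snippet with its tags intact.

```python
import xml.etree.ElementTree as ET


def snippets_by_id(response_xml, field="content"):
    """Map each document id in the highlighting section to its snippets for `field`."""
    root = ET.fromstring(response_xml)
    highlighting = root.find(".//lst[@name='highlighting']")
    result = {}
    if highlighting is None:  # highlighting was not enabled for this query
        return result
    # Each child <lst> is named after a document id (the schema's uniqueKey).
    for doc in highlighting.findall("lst"):
        arr = doc.find(f"arr[@name='{field}']")
        # A document can match without producing a snippet for this field.
        snippets = [s.text or "" for s in arr.findall("str")] if arr is not None else []
        result[doc.get("name")] = snippets
    return result
```

Looking up `snippets_by_id(xml_text).get(doc_id, [])` for each id in the result section then gives the text to display under that hit.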
The most common highlighting parameters available to the user are:
hl=true: This must be set to true if you want highlighting. A blank, missing or "false" value disables the highlighting feature.
hl.fl=content: The field(s) to highlight. You will probably want content, but other fields can also be selected.
hl.snippets=5: Takes a number that sets the maximum number of highlighted snippets returned per field in the query response. The default value is 1.
hl.requireFieldMatch: Takes true or false. When true, a field is only highlighted if the query matched in that specific field. The default value is "false".
hl.maxAnalyzedChars: Decides how many characters into a document are considered for highlighting. The default value is "51200".
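Putting these parameters together, the Python sketch below builds a select URL with urlencode. The base URL assumes the default Solr 4 example core (collection1 on localhost:8983), and `highlight_query_url` is a hypothetical helper, so adjust both to your own setup. hl.fl accepts a comma- or space-separated list of fields.

```python
from urllib.parse import urlencode

# Assumed default Solr 4 example endpoint; change host/core to match your install.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def highlight_query_url(q, fields=("content",), snippets=1,
                        require_field_match=False, max_analyzed_chars=51200,
                        base=SOLR_SELECT):
    """Build a select URL with the common highlighting parameters set explicitly."""
    params = [
        ("q", q),
        ("hl", "true"),                      # highlighting must be switched on
        ("hl.fl", ",".join(fields)),         # fields to highlight
        ("hl.snippets", str(snippets)),      # max snippets per field
        ("hl.requireFieldMatch", "true" if require_field_match else "false"),
        ("hl.maxAnalyzedChars", str(max_analyzed_chars)),
    ]
    return base + "?" + urlencode(params)
```

For example, `highlight_query_url("strategy", fields=("content", "title"), snippets=5)` asks for up to five snippets from each of content and title; fetching that URL (for instance with urllib.request.urlopen) returns a response with a highlighting section.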
