Friday, 15 February 2013

Solr - unable to configure Tika 1.2 with Solr 4




I am trying to use TikaEntityProcessor to index the content of .html files, but I cannot get it to work. I checked the error log and found the following error.

SEVERE: Full Import failed:java.lang.RuntimeException:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity:tika-test Processing Document # 1
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity:tika-test Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
    at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessorWrapper(DocBuilder.java:697)
    at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessorWrapper(DocBuilder.java:703)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:215)
    ... 3 more
Caused by: java.lang.ClassNotFoundException: Unable to load TikaEntityProcessor or org.apache.solr.handler.dataimport.TikaEntityProcessor
    at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:899)
    at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessorWrapper(DocBuilder.java:694)
    ... 5 more
Caused by: org.apache.solr.common.SolrException: Error loading class 'TikaEntityProcessor'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:436)
    at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:889)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: TikaEntityProcessor
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)

My data-config.xml file is as follows:

<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor" baseDir="path/to/basedir/" fileName=".*html" recursive="true" rootEntity="true" dataSource="null">
      <entity name="tika-test" processor="TikaEntityProcessor" url="path/tohtml/files/" format="text" onError="skip">
        <field column="product_id" name="product_id" meta="true"/>
        <field column="type" name="type" meta="true"/>
        <field column="title" name="title" meta="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>

I have added the following to solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/path/to/data-config.xml</str>
  </lst>
</requestHandler>

I kept the default schema.xml file and added the following fields to it:

<field name="product_id" type="string" indexed="true" stored="true"/>
<field name="title" type="string" indexed="true" stored="true"/>
<field name="type" type="string" indexed="true" stored="true"/>

Can anyone please tell me what I am missing here, why these errors occur, and what the solution is?

Did you add the lib directives in solrconfig.xml to make sure the Tika libraries are loaded? You need (I believe):

<lib dir="${user.dir}/../dist/" regex="solr-cell-\d.*\.jar" />
<lib dir="${user.dir}/../contrib/extraction/lib" regex=".*\.jar" />

If you are using Solr 4.0 rather than Solr 4.1, you may need apache-solr-cell.... instead of solr-cell...
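
One more thing worth checking: as far as I know, TikaEntityProcessor itself ships in the solr-dataimporthandler-extras jar under dist/, not in the solr-cell jar, so the DataImportHandler jars have to be on the classpath too. A minimal sketch for solrconfig.xml, assuming the stock Solr 4.1 directory layout (the relative paths and jar name patterns are assumptions, so adjust them to your installation and version):

```xml
<!-- DataImportHandler core and extras jars; the extras jar is assumed to contain TikaEntityProcessor -->
<lib dir="${user.dir}/../dist/" regex="solr-dataimporthandler-.*\.jar" />
<!-- Solr Cell plus the Tika libraries it depends on -->
<lib dir="${user.dir}/../dist/" regex="solr-cell-\d.*\.jar" />
<lib dir="${user.dir}/../contrib/extraction/lib" regex=".*\.jar" />
```

The first pattern is deliberately loose so that it matches both the base solr-dataimporthandler jar and the -extras jar. After changing the lib directives, restart Solr (or reload the core) so they take effect.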

solr apache-tika dataimporthandler solr4
