tokenize - Why is my leading wildcard search failing in Solr?
I have a text field that I populate with copyField from various source fields, with the goal of having one field to use when searching my Solr index.
This text field is defined to use a custom fieldType, "text_en_splitting_reversed". I created this field type by copying the example "text_en_splitting" and adding ReversedWildcardFilterFactory to the index analyzer.
<!-- Like text_en_splitting, with the addition of reversed tokens for leading wildcard matches -->
<fieldType name="text_en_splitting_reversed" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         Add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
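For readers unfamiliar with the idea behind a reversed-wildcard filter, here is a minimal Python sketch of the general technique: store each token reversed (plus the original), then turn a leading-wildcard pattern into a prefix match against the reversed terms. This is a simplified model, not Solr's actual code. The marker character, the function names, and the sample documents (doc1 to doc3) are all assumptions made for illustration.

```python
# Simplified model of reversed-token indexing; NOT Solr's implementation.
MARKER = "\u0001"  # assumed marker that tags reversed terms in this sketch


def index_tokens(token, with_original=True):
    """Return the terms stored for one token: the reversed form, plus the original."""
    terms = {MARKER + token[::-1]}
    if with_original:
        terms.add(token)
    return terms


def rewrite_leading_wildcard(pattern):
    """Turn a pattern such as '*car' into a prefix over the reversed terms."""
    if not pattern.startswith("*") or "*" in pattern[1:]:
        raise ValueError("this sketch only handles a single leading '*'")
    return MARKER + pattern[1:][::-1]


def search(index, pattern):
    """Return ids of documents with a reversed term starting with the rewritten prefix."""
    prefix = rewrite_leading_wildcard(pattern)
    return sorted(
        doc for doc, terms in index.items()
        if any(term.startswith(prefix) for term in terms)
    )


# Hypothetical documents, one token each.
index = {
    "doc1": index_tokens("racecar"),
    "doc2": index_tokens("carpet"),
    "doc3": index_tokens("sidecar"),
}
print(search(index, "*car"))  # ['doc1', 'doc3']
```

The point of the sketch is that the match only works if the query side still knows the term began with a wildcard, so that it can be reversed and compared against the reversed terms in the index.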
My primary problem: when I search using a leading wildcard, I get unexpected results. For example, I know that one particular search I'm doing, "*car", should return a single match (the document contains the word "racecar"). Since this was failing, I decided to debug it in the analyzer tool in Solr admin. Here is a screenshot of my test:
I'm new to the analyzer tool, but shouldn't the right side have retained the leading asterisk all the way down? And why doesn't it end up matching? I expected the user's entered keywords to go through the reverse processing as well.
Now, in my query config, I have it set to use edismax. However, in the admin analyzer GUI, I don't see a way to control whether it's using the standard parser or edismax. (Perhaps it doesn't matter?)
In case this info helps provide more context, here is a rundown of my goals for this particular field being indexed:
- I want *car to match racecar. This is not working.
- I want $30 to match documents containing $30, but not those containing 30 (without the preceding dollar sign). I added the types="" attribute to define $ as a digit. This is working.
- I want 30 to match documents containing $30. This is not working.
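The second and third goals pull in opposite directions, which a small Python sketch can make concrete. It is a simplified model of splitting a token by character type with a custom type table, not the real WordDelimiterFilterFactory. The `split_token` function and the `types` dictionary are hypothetical stand-ins for the filter and for a types file that maps $ to a digit.

```python
# Simplified model of word-delimiter splitting with a custom type table.
def split_token(token, types=None):
    """Split a token into runs of the same character type, dropping delimiters."""
    types = types or {}

    def kind(ch):
        if ch in types:
            return types[ch]  # custom override, e.g. '$' treated as a digit
        if ch.isdigit():
            return "DIGIT"
        if ch.isalpha():
            return "ALPHA"
        return "DELIM"

    parts, current, current_kind = [], "", None
    for ch in token:
        k = kind(ch)
        if k == "DELIM":
            # Delimiters end the current run and are discarded.
            if current:
                parts.append(current)
            current, current_kind = "", None
        elif k == current_kind:
            current += ch
        else:
            # A change of type (letter to digit or back) starts a new part.
            if current:
                parts.append(current)
            current, current_kind = ch, k
    if current:
        parts.append(current)
    return parts


types = {"$": "DIGIT"}  # assumed equivalent of mapping $ to DIGIT in the types file
print(split_token("$30"))         # ['30']
print(split_token("$30", types))  # ['$30']
print(split_token("30", types))   # ['30']
```

Under this model, once $ counts as a digit the only indexed term is `$30`, so a query for `30` produces a term that is not in the index. That is consistent with the second goal working and the third one failing.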
From the screenshot, it is clear that WordDelimiterFilterFactory has stripped off the leading *. Try adding preserveOriginal="1" on the query analyzer side, i.e.
<filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt" />
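To illustrate what the suggested attribute changes, here is a rough Python sketch of a word-delimiter step with and without the original token preserved. It is a simplified model and an assumption about the general behavior, not Solr's implementation. The `word_delimiter` function and its `preserve_original` flag are hypothetical names.

```python
import re


def word_delimiter(token, preserve_original=False):
    """Split a token into letter runs and digit runs, dropping other characters.

    With preserve_original=True the untouched token is emitted as well,
    unless splitting left the token unchanged.
    """
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    output = []
    if preserve_original and parts != [token]:
        output.append(token)
    output.extend(parts)
    return output


print(word_delimiter("*car"))                          # ['car']
print(word_delimiter("*car", preserve_original=True))  # ['*car', 'car']
print(word_delimiter("car", preserve_original=True))   # ['car']
```

In this model the leading asterisk only survives the filter when the original token is kept, which matches what the analyzer screenshot suggested was going wrong.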
solr tokenize lucene