Tuesday, 15 July 2014

How do I find documents containing digits and dollar signs in Solr?




In Solr, I have documents whose text contains "$30" and documents whose text contains "30".

If I search for $30, it should find only the documents containing $30.

But if I search for 30, it should find both the documents containing $30 and the documents containing 30.

Here is the field type I'm using to index the text field:

<!-- Like text_en_splitting, with the addition of reversed tokens for leading wildcard matches -->
<fieldType name="text_en_splitting_reversed" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         Add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

I have defined word-delim-types.txt to contain:

$ => DIGIT
% => DIGIT
. => DIGIT
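The types file maps single characters to a WordDelimiterFilterFactory character type, one rule per line. To make the format concrete, here is a small Python sketch that parses it. `parse_types` is a hypothetical helper, not part of Solr, and it rests on a few assumptions: the allowed type names are the upper-case LOWER, UPPER, ALPHA, DIGIT, ALPHANUM and SUBWORD_DELIM (as listed in Solr's example types file), lines starting with `#` are comments, and `\uXXXX` is the only escape sequence handled.

```python
# Sketch of parsing a WordDelimiterFilterFactory "types" file.
# parse_types is a hypothetical helper for illustration, not a Solr API.
# Assumed format: one "<char> => <TYPE>" rule per line, "#" starts a comment
# line, and \uXXXX is the only escape sequence handled.

ALLOWED_TYPES = {"LOWER", "UPPER", "ALPHA", "DIGIT", "ALPHANUM", "SUBWORD_DELIM"}


def parse_types(text: str) -> dict[str, str]:
    """Map each single character in the rules to its (upper-case) type name."""
    mapping: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if "=>" not in line:
            raise ValueError(f"line {lineno}: missing '=>' in {raw!r}")
        lhs, rhs = (part.strip() for part in line.split("=>", 1))
        if lhs.startswith("\\u") and len(lhs) == 6:
            lhs = chr(int(lhs[2:], 16))  # e.g. \u002C becomes ","
        if len(lhs) != 1:
            raise ValueError(f"line {lineno}: expected one character, got {lhs!r}")
        if rhs not in ALLOWED_TYPES:
            raise ValueError(f"line {lineno}: unknown type {rhs!r}")
        mapping[lhs] = rhs
    return mapping


if __name__ == "__main__":
    rules = "$ => DIGIT\n% => DIGIT\n. => DIGIT\n"
    print(parse_types(rules))
```

Rejecting lower-case names such as `digit` is deliberate in this sketch, since a rule with an unrecognised type name is an error rather than a silent no-op.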

So when I search for $30, it correctly finds documents containing "$30" and not those containing only "30". That's good. But when I search for "30", it does not find documents containing "$30", only those containing "30".

Is there a way to do this?

I have found a solution to my question. Instead of defining $, % and . as DIGIT, define them as ALPHA in the file passed in the "types" attribute of WordDelimiterFilterFactory:

$ => ALPHA
% => ALPHA
. => ALPHA

Because of the rest of the WordDelimiterFilterFactory settings, tokens are split and catenated in a way that achieves the desired effect:

Searching for $30 yields only the documents containing $30. Searching for 30 yields both the documents containing $30 and the documents containing 30.
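One way to see why the ALPHA mapping works: with `$` typed as ALPHA, "$30" contains a letter-to-digit transition, so it is indexed as two adjacent terms, "$" and "30". Because the field type sets autoGeneratePhraseQueries="true", a query for $30 becomes the phrase "$ 30", which only documents that originally contained "$30" satisfy, while a query for 30 is a single term present in both kinds of document. The Python below is a deliberately simplified model of that reasoning, not Solr's implementation; `split_token`, `analyze` and `matches` are hypothetical names. It assumes whitespace tokenization, splitting at delimiters and at letter/digit boundaries, lowercasing, single-token queries, and only the ALPHA, DIGIT and SUBWORD_DELIM types, and it leaves out catenation, case-change splitting, stop words, synonyms, stemming and reversed tokens.

```python
# Simplified model (not Solr's code) of WordDelimiterFilter splitting under a
# custom types map, plus phrase matching on the resulting term positions.
# Assumptions: whitespace tokenization, split on delimiters and letter/digit
# transitions, lowercase, no catenation, stemming, stop words or synonyms.


def char_type(ch: str, types: dict[str, str]) -> str:
    """Classify one character; the custom types map takes precedence."""
    if ch in types:
        return types[ch]
    if ch.isalpha():
        return "ALPHA"
    if ch.isdigit():
        return "DIGIT"
    return "SUBWORD_DELIM"  # e.g. "$" when no types file maps it


def split_token(token: str, types: dict[str, str]) -> list[str]:
    """Split one token at delimiters and at ALPHA/DIGIT transitions."""
    parts: list[str] = []
    current, current_type = "", None
    for ch in token:
        t = char_type(ch, types)
        if t == "SUBWORD_DELIM":
            if current:  # a delimiter ends the current part and is dropped
                parts.append(current)
            current, current_type = "", None
            continue
        if current and t != current_type:
            parts.append(current)  # type changed: start a new part
            current = ""
        current += ch
        current_type = t
    if current:
        parts.append(current)
    return parts


def analyze(text: str, types: dict[str, str]) -> list[str]:
    """Return the term list; a term's list index stands in for its position."""
    terms: list[str] = []
    for token in text.split():
        terms.extend(part.lower() for part in split_token(token, types))
    return terms


def matches(doc: str, query: str, types: dict[str, str]) -> bool:
    """Match a single-token query: one part is a term query, several parts
    must appear at consecutive positions, like an auto-generated phrase."""
    doc_terms = analyze(doc, types)
    q_terms = analyze(query, types)
    n = len(q_terms)
    if n == 0:
        return False
    return any(doc_terms[i:i + n] == q_terms for i in range(len(doc_terms) - n + 1))


if __name__ == "__main__":
    docs = ["costs $30 today", "costs 30 today"]
    mappings = {"no types file": {}, "$ => DIGIT": {"$": "DIGIT"}, "$ => ALPHA": {"$": "ALPHA"}}
    for label, types in mappings.items():
        for query in ("$30", "30"):
            hits = [d for d in docs if matches(d, query, types)]
            print(label, repr(query), hits)
```

Under these assumptions the model reproduces the behaviour described above: with no types file both "$30" and "30" reduce to the term 30, so $30 matches everything; with $ => DIGIT each query finds only its exact form; and with $ => ALPHA the query $30 finds only the "$30" document while 30 finds both.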

solr
