Sunday, 15 August 2010

android - FTS3 and FTS4 matching of :, -, and _ characters -



android - FTS3 and FTS4 matching of :, -, and _ characters -

i'm seeing weird behaviour on fts enabled sqlite database. have table named fingerprints contains column named scan. entries of scan long strings this:

00:13:10:d5:69:88_-58;0c:85:25:68:b4:30_-75;0c:85:25:68:b4:34_-76;0c:85:25:68:b4:33_-76;0c:85:25:68:b4:31_-76;0c:85:25:68:b4:35_-76;00:23:eb:ad:f6:00_-87; etc

it represent mac addresses , signal strengths. want string matching on table , seek match instance mac address:

select _id fingerprints scan match "00:13:10:d5:69:88";

this returns lot of rows not have specified string in reason. sec thing seek match is

select _id fingerprints scan match "00:13:10:d5:69:88_-58";

this returns same rows has before , wrong.

does sqlite treats : _ - characters in special way?

thanks

what you're seeing effect of fts tokenizing data.

the total text search doesn't work on un-processed long strings, splits info (and search terms) words , indexes them individually. default tokenizer uses alphanumeric characters , characters code point >128 words, , uses rest of characters (for example, you're seeing : _ -) word boundaries.

in other words, search 00:13:10:d5:69:88 search rows containing words 00 , 13 , 10 , d5 , 69 , 88 in any order.

you can verify behavior;

sqlite> create virtual table simple using fts3(tokenize=simple); sqlite> insert simple values('00:13:10:d5:69:88'); sqlite> select * simple simple match '69:10'; -> 00:13:10:d5:69:88

edit: apparently sqlite smarter gave credit for, can utilize phrase queries (scroll downwards page link destination) word sequences, solve problem. phrase queries specified enclosing space (or other word separator) separated sequence of terms in double quotes (").

sqlite> select * simple simple match '"69:10"'; -> no match sqlite> select * simple simple match '"69 88"'; -> 00:13:10:d5:69:88 sqlite> select * simple simple match '"69:88"'; -> 00:13:10:d5:69:88

android sqlite full-text-search match

No comments:

Post a Comment