Sunday, 15 May 2011

regex - MySQL REGEXP not producing expected results (not multi byte safe?). Is there a work around? -



regex - MySQL REGEXP not producing expected results (not multi byte safe?). Is there a work around? -

i'm trying write mysql query identify first name fields contain initials. problem query picking records should not match.

i have tested against posix ere regex implementation in regex buddy confirm regex string correct, when running in mysql query, results differ.

for example, query should identify strings such as:

'a.j.d' or 'a j d'.

but matching strings 'ralph' or 'terrance'.

the query:

select *, firstname regexp '^[a-za-z]{1}(([[:space:]]|\.)+[a-za-z]{1})+([[:space:]]|\.)?$' test_table

the 'firstname' field here varchar 255 if that's relevant.

i same result when running string literal rather table data:

select 'ralph' regexp '^[a-za-z]{1}(([[:space:]]|\.)+[a-za-z]{1})+([[:space:]]|\.)?$'

the mysql documentation warns potential issues regexp, i'm unsure if related problem i'm seeing:

warning regexp , rlike operators work in byte-wise fashion, not multi-byte safe , may produce unexpected results multi-byte character sets. in addition, these operators compare characters byte values , accented characters may not compare equal if given collation treats them equal.

thanks in advance.

if you're testing in mysql client, need escape backslashes. each occurence of \. must turn \\. necessary because input first processed mysql client, turns \. .. need create maintain backslashes escaping them.

mysql regex

No comments:

Post a Comment