Thursday, 15 July 2010

Java & Regex: Matching a substring that is not preceded by specific characters



I know this is one of those questions that has been asked and answered hundreds of times over, but I'm having a hard time adapting other solutions to my needs.

In my Java application I have a method for censoring bad words in chat messages. It works for most of my words, but there is one particular (and popular) curse word that I can't seem to get rid of. The word is "faen" (which is modern slang for "satan" in the language in question).

Using the pattern "fa+e+n" for matching multiple a's and e's works; however, in this language, the word for "that couch" or "that sofa" is "sofaen". I've tried a lot of different approaches, using variations of [^so] and (?!=so), but so far I haven't been able to find a way to match one and not the other.

The real goal here is to be able to match the bad words regardless of the number of vowels, and regardless of any non-letters in between the components of the word.

Here are a few examples of what I'm trying to do:

"string containing faen" should match
"string containing sofaen" should not match
"non-letter-censored string f-a@a-e.n" should match
"non-letter-censored string sof-a@a-e.n" should not match

Any tips to set me off in the right direction on this?

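You want \bf[^\s]*a[^\s]*e[^\s]*n\b. The leading \b is a word boundary, so the "f" cannot be preceded by another letter, which is what rules out "sofaen". The quantifier has to be * rather than +, otherwise the plain word with nothing between its letters would not match. Note that this is the regular expression itself; if you want it as a Java string literal, every backslash needs to be doubled: \\bf[^\\s]*a[^\\s]*e[^\\s]*n\\b.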
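
To make the word-boundary idea concrete, here is a minimal Java sketch that runs it against the four example strings from the question. It assumes * quantifiers between the letters so that the plain word also matches; the class name BoundaryCensorDemo and the helper containsBadWord are hypothetical names chosen for illustration, not part of the original application.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class BoundaryCensorDemo {
    // \b before "f" fails inside "sofaen", because "o" and "f" are both word characters.
    // [^\s]* allows any run of non-whitespace characters between the letters.
    static final Pattern BAD_WORD =
            Pattern.compile("\\bf[^\\s]*a[^\\s]*e[^\\s]*n\\b", Pattern.CASE_INSENSITIVE);

    static boolean containsBadWord(String message) {
        // find() looks for the pattern anywhere in the message
        return BAD_WORD.matcher(message).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "string containing faen",                  // should match
            "string containing sofaen",                // should not match
            "non-letter-censored string f-a@a-e.n",    // should match
            "non-letter-censored string sof-a@a-e.n"   // should not match
        };
        for (String s : samples) {
            System.out.println(containsBadWord(s) + " <- " + s);
        }
    }
}
```

Because [^\s]* accepts letters as well as symbols, this pattern also matches ordinary words that happen to contain f, a, e and n in that order, such as the English word "fasten".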

Note that this isn't perfect, but it does handle the situations that you have suggested.
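
One possible way to tighten this, offered as a sketch rather than a definitive fix, is to use a negative lookbehind (written (?<!...) in Java, not (?!=...)) and to allow only non-letters between the components, with each vowel repeatable. The class name StrictCensorDemo, the helpers, and the "****" replacement text are assumptions made for illustration; \p{L} is Java's Unicode "any letter" class.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and the "****" mask are illustrative assumptions.
class StrictCensorDemo {
    // (?<!\p{L})        : the "f" must not be preceded by a letter (rejects "sofaen")
    // [^\p{L}\s]*       : only non-letter, non-whitespace filler between components
    // (?:a[^\p{L}\s]*)+ : one or more a's, each optionally followed by filler
    // (?:e[^\p{L}\s]*)+ : one or more e's, each optionally followed by filler
    // n(?!\p{L})        : final "n" must not be followed by a letter
    static final Pattern BAD_WORD = Pattern.compile(
            "(?<!\\p{L})f[^\\p{L}\\s]*(?:a[^\\p{L}\\s]*)+(?:e[^\\p{L}\\s]*)+n(?!\\p{L})",
            Pattern.CASE_INSENSITIVE);

    static boolean containsBadWord(String message) {
        return BAD_WORD.matcher(message).find();
    }

    static String censor(String message) {
        // Replace every match with a fixed mask
        return BAD_WORD.matcher(message).replaceAll("****");
    }

    public static void main(String[] args) {
        System.out.println(censor("string containing faen"));
        System.out.println(censor("string containing sofaen"));
        System.out.println(censor("non-letter-censored string f-a@a-e.n"));
        System.out.println(censor("non-letter-censored string sof-a@a-e.n"));
    }
}
```

Since letters are no longer allowed as filler, a word like "fasten" is left alone. A separator placed before the "f", as in "so-faen", would still be censored, so whether that is acceptable depends on the language and the chat in question.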

java regex
