C# - Replace with Regex
In our application, users enter information from MS Word into an ASP.NET textarea control, and the information is saved in SQL Server. For some reason, there are a few junk characters that look like little squares when viewed in SQL Server Management Studio.
This is causing an error while generating Crystal Reports.
I need a regex that will strip such characters, along with bullets. The valid input is
a-z, A-Z, 0-9, ~ ! @ # % $ ^ & * ( ) _ + | ` - = \ {}:">? < [ ] ; ' , . /
Also, tabs should be replaced with a single space. The Enter key (new line) is allowed.
I am currently using
Regex.Replace(data, @"[^\u0000-\u007F]", " ");
but it won't remove bullets or tabs.
Can a regex ninja help me with this problem? Thanks in advance.
You can use two regexes. First, the pattern "\t|<bullet>" (where <bullet> stands for the representation of a bullet) is used to replace tabs and bullets with spaces (" "). Second, a pattern with a negated character set containing the list of valid characters is used to replace invalid characters with an empty string (""), that is, to get rid of them. Since you need to keep the CR and LF characters (and the space), these must be added to the set of valid characters:
    using System;
    using System.Text.RegularExpressions;

    static class Program
    {
        public static void Main()
        {
            // Step 1: tabs are replaced with a single space.
            string pattern1 = @"\t";
            Regex regex1 = new Regex(pattern1, RegexOptions.Compiled);

            // Step 2: everything outside the valid set (plus space, CR, LF) is removed.
            // '@' and '%' are included because the question lists them as valid.
            string pattern2 = @"[^a-zA-Z0-9~!@#%$^&*()_+|`\-=\\{}:"">?<\[\];',./ \r\n]";
            Regex regex2 = new Regex(pattern2, RegexOptions.Compiled);

            string input = "abzABZ09~!#$^&*()_+|`-=\\{}:\">?<[];',./ \r\nárvíztűrő\ttükörfúrógép";
            string temp = regex1.Replace(input, " ");
            string output = regex2.Replace(temp, "");

            Console.WriteLine(input);
            Console.WriteLine(output);
            Console.ReadKey(true);
        }
    }
Output:
    abzABZ09~!#$^&*()_+|`-=\{}:">?<[];',./ 
    árvíztűrő tükörfúrógép
    abzABZ09~!#$^&*()_+|`-=\{}:">?<[];',./ 
    rvztr tkrfrgp
Note that the tab after árvíztűrő has been replaced with a single space.
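The two steps above could also be wrapped in a small helper, so the cleanup can be applied to the textarea value before it is saved. This is only a sketch: the names `TextCleaner` and `Clean` are made up for illustration, and `@` and `%` are included in the valid set because the question lists them.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original answer).
static class TextCleaner
{
    // Step 1: every tab becomes a single space.
    private static readonly Regex Tabs =
        new Regex(@"\t", RegexOptions.Compiled);

    // Step 2: anything outside the valid set (plus space, CR, LF) is dropped.
    private static readonly Regex Invalid = new Regex(
        @"[^a-zA-Z0-9~!@#%$^&*()_+|`\-=\\{}:"">?<\[\];',./ \r\n]",
        RegexOptions.Compiled);

    public static string Clean(string input)
    {
        // Null or empty input is returned unchanged.
        if (string.IsNullOrEmpty(input)) return input;

        string temp = Tabs.Replace(input, " ");
        return Invalid.Replace(temp, "");
    }
}
```

With this in place, `TextCleaner.Clean("a\tb")` gives `"a b"`, and `TextCleaner.Clean("árvíz")` gives `"rvz"`. Keeping the two `Regex` objects in static fields means they are built once instead of on every call.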
About bullets:
I made a bulleted list in Word and copied it into a textarea in a web page. I saved the HTML and figured out that the bullets were saved as the UTF-8-encoded character E2 80 A2 (U+2022, BULLET). This is what I called the "representation of a bullet" above. You should figure out the binary representation of all possible bullet characters and add them to the first pattern: either OR them with the tab character, or put them in a character set together with it:
    using System;
    using System.Text;
    using System.Text.RegularExpressions;

    static class Program
    {
        public static void Main()
        {
            // The bullet as it was saved in the HTML: UTF-8 bytes E2 80 A2.
            byte[] bulletBytes = new byte[] { 0xE2, 0x80, 0xA2 };
            string bullet = Encoding.UTF8.GetString(bulletBytes);

            // Step 1: tabs and bullets are replaced with a single space.
            string pattern1 = @"[\t" + bullet + "]";
            Regex regex1 = new Regex(pattern1, RegexOptions.Compiled);

            // Step 2: everything outside the valid set (plus space, CR, LF) is removed.
            // '@' and '%' are included because the question lists them as valid.
            string pattern2 = @"[^a-zA-Z0-9~!@#%$^&*()_+|`\-=\\{}:"">?<\[\];',./ \r\n]";
            Regex regex2 = new Regex(pattern2, RegexOptions.Compiled);

            string input = bullet + "abzABZ09~!#$^&*()_+|`-=\\{}:\">?<[];',./ \r\n" + bullet + "árvíztűrő\ttükörfúrógép";
            string temp = regex1.Replace(input, " ");
            string output = regex2.Replace(temp, "");

            Console.OutputEncoding = Encoding.UTF8;
            Console.WriteLine(input);
            Console.WriteLine(output);
            Console.ReadKey(true);
        }
    }
Output (you should change the console font to Lucida Console to see the bullet):
    •abzABZ09~!#$^&*()_+|`-=\{}:">?<[];',./ 
    •árvíztűrő tükörfúrógép
     abzABZ09~!#$^&*()_+|`-=\{}:">?<[];',./ 
     rvztr tkrfrgp
Now, in addition to the tab, the bullet at the beginning of each line has been replaced with a space.
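To find out which other bullet characters actually arrive from Word, one option is to list everything outside the ASCII range in a sample input and then add the ones that turn out to be bullets to the first pattern. The sketch below is an illustration only: `CharInspector`, `NonAsciiCodeUnits` and `ReplaceTabsAndBullets` are made-up names, and the pattern covers only the one bullet identified above, written as the regex escape `\u2022` instead of being decoded from bytes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical diagnostic helper (not part of the original answer).
static class CharInspector
{
    // Lists the distinct non-ASCII UTF-16 code units of the input as "U+XXXX".
    // Characters outside the Basic Multilingual Plane show up as two
    // surrogate values rather than one code point.
    public static List<string> NonAsciiCodeUnits(string input)
    {
        var seen = new List<string>();
        foreach (char c in input)
        {
            if (c > 0x7F)
            {
                string code = "U+" + ((int)c).ToString("X4");
                if (!seen.Contains(code)) seen.Add(code);
            }
        }
        return seen;
    }

    // U+2022 is the character that the UTF-8 bytes E2 80 A2 decode to.
    // Further bullets can be appended inside the brackets once identified.
    private static readonly Regex TabOrBullet =
        new Regex(@"[\t\u2022]", RegexOptions.Compiled);

    public static string ReplaceTabsAndBullets(string input)
    {
        return TabOrBullet.Replace(input, " ");
    }
}
```

For the input `"\u2022a\u2022\u00E1"`, `NonAsciiCodeUnits` returns `U+2022` and `U+00E1`, so the bullet can be told apart from ordinary accented letters before deciding what goes into the pattern.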
c# regex