Java - readLine and encoding of the extended ASCII table
Good day.
I have an ASCII file of Spanish words. It contains only characters between A and Z, plus Ñ, ASCII code 165 (http://www.asciitable.com/). I read the file with this source code:
    InputStream is = ctx.getAssets().open(filenames[lang_code][w]);
    InputStreamReader reader1 = new InputStreamReader(is, "UTF-8");
    BufferedReader reader = new BufferedReader(reader1, 8000);
    try {
        while ((line = reader.readLine()) != null) {
            workon(line); // do a lot of things with line
        }
        reader.close();
        is.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
What is called the workon() function here should extract the character codes from the strings, like this:
    private static void workon(String s) {
        byte b;
        for (int w = 0; w < s.length(); w++) {
            b = (byte) s.charAt(w);
            // etc etc etc
        }
    }
Unfortunately, what happens here is that I cannot identify b as the ASCII code when it represents the letter Ñ. The value of b is right for any ASCII letter, but it returns -3 when dealing with Ñ, which, converted to unsigned, is 253, or the ASCII character ². Nothing similar to Ñ...
What happens here? How should I get this simple ASCII code?
What is driving me mad is that I cannot find the right encoding. Even if I go and browse the UTF-8 table (http://www.utf8-chartable.de/), Ñ is 209 decimal, 253 decimal is ý, and 165 decimal is ¥. Again, not even close to what I need.
So... help me please! :(
Are you sure the source file you are reading is UTF-8 encoded? In UTF-8 encoding, all byte values greater than 127 are reserved for multi-byte sequences, and they are never seen standing on their own.
My guess is that the file you are reading is encoded using "code page 437", the original IBM PC character set. In that character set, Ñ is represented as decimal 165.
Many modern systems use ISO-8859-1, which happens to be equivalent to the first 256 characters of the Unicode character set. In both, the Ñ character is decimal 209. In a comment, the author clarified that the value 209 is what is actually in the file.
If the file were UTF-8 encoded, Ñ would be represented as a two-byte sequence, and would be neither the value 165 nor the value 209.
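To make the difference concrete, here is a minimal sketch that prints the bytes Java produces for Ñ in ISO-8859-1 and in UTF-8. The class name EnyeBytes and the helper unsignedBytes are made up for this illustration; only the standard java.nio.charset API is used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EnyeBytes {
    // Hypothetical helper: encodes s with cs and returns each byte as an unsigned value (0..255).
    static int[] unsignedBytes(String s, Charset cs) {
        byte[] raw = s.getBytes(cs);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xFF; // mask to undo Java's signed byte
        }
        return out;
    }

    public static void main(String[] args) {
        // Ñ written as a Unicode escape so the encoding of this source file does not matter.
        String enye = "\u00D1";
        System.out.println(Arrays.toString(unsignedBytes(enye, StandardCharsets.ISO_8859_1))); // [209]
        System.out.println(Arrays.toString(unsignedBytes(enye, StandardCharsets.UTF_8)));      // [195, 145]
    }
}
```

A lone byte 209 is therefore valid ISO-8859-1, but in UTF-8 it is only the start of a two-byte sequence and is not a complete character by itself.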
Based on the above assumption that the file is ISO-8859-1 encoded, you should be able to solve the situation by using:
    InputStreamReader reader1 = new InputStreamReader(is, "ISO-8859-1");
This will translate the bytes to Unicode characters, and you should then find the character Ñ represented by decimal 209.
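The -3 in the question can also be accounted for. When the byte 209 is decoded as UTF-8 it is malformed input, and InputStreamReader substitutes the replacement character U+FFFD. Casting that char to byte keeps only the low 8 bits, 0xFD, which is -3 signed or 253 unsigned. The following is a rough sketch of both cases, assuming the file holds the single byte 209 followed by a newline. The class DecodeDemo and the method firstCharCode are invented for the example, and a ByteArrayInputStream stands in for the asset stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Hypothetical helper: decodes the first line of raw with cs and
    // returns the code of its first character as an int.
    static int firstCharCode(byte[] raw, Charset cs) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(raw), cs))) {
            String line = reader.readLine();
            return line.charAt(0); // char widens to int, so no bits are lost
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed file content: Ñ as the single ISO-8859-1 byte 0xD1 (209), then a newline.
        byte[] raw = { (byte) 0xD1, '\n' };

        int wrong = firstCharCode(raw, StandardCharsets.UTF_8);      // U+FFFD, the replacement character
        int right = firstCharCode(raw, StandardCharsets.ISO_8859_1); // U+00D1, the real Ñ

        // The narrowing cast to byte keeps only the low 8 bits of 0xFFFD, giving 0xFD.
        System.out.println(wrong + " " + (byte) wrong + " " + right); // prints "65533 -3 209"
    }
}
```

Keeping the code in an int (or a char) instead of casting to byte avoids the signed wrap-around for any character above 127.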
java android