Hee: java - ReadLine and encoding of the extended ascii table -

Friday, 15 August 2014

Good day.

I have an ASCII file with Spanish words. They contain only characters between A and Z, plus Ñ, ASCII code 165 (http://www.asciitable.com/). I read this file with the following source code:

    InputStream is = ctx.getAssets().open(filenames[lang_code][w]);
    InputStreamReader reader1 = new InputStreamReader(is, "UTF-8");
    BufferedReader reader = new BufferedReader(reader1, 8000);
    String line;

    try {
        while ((line = reader.readLine()) != null) {
            workOn(line);
            // do a lot of things with line
        }
        reader.close();
        is.close();
    } catch (IOException e) {
        e.printStackTrace();
    }

What is called workOn() here is a function that should extract the character codes from the strings, like this:

    private static void workOn(String s) {
        byte b;
        for (int w = 0; w < s.length(); w++) {
            b = (byte) s.charAt(w);
            // etc etc etc
        }
    }

Unfortunately, what happens here is that I cannot identify b as an ASCII code when it represents the letter Ñ. The value of b is right for any ASCII letter, but it returns -3 when dealing with Ñ, which, converted to unsigned, is 253, or the ASCII character ². Nothing similar to Ñ...

What is happening here? How should I get this simple ASCII code?

What is driving me mad is that I cannot find the right encoding. Even if I go and browse the UTF-8 table (http://www.utf8-chartable.de/), Ñ is 209 dec, 253 dec is ý, and 165 dec is ¥. Again, not even relatives of what I need.

So... help me please! :(

Are you sure the source file you are reading is UTF-8 encoded? In UTF-8 encoding, all values greater than 127 are reserved for multi-byte sequences, and they are never seen standing on their own.
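This would also explain the -3 from the question. A minimal sketch, assuming the file holds a lone byte 209 (0xD1) where the Ñ should be: Java's UTF-8 decoder treats that byte as malformed input and substitutes the replacement character U+FFFD, and casting U+FFFD to byte keeps only the low byte 0xFD, which is -3 signed (253 unsigned). The class name, helper name, and sample bytes are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class Utf8FallbackDemo {
    // Decodes raw bytes as UTF-8; malformed sequences become U+FFFD.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical file content: 'A', a lone byte 209 (0xD1), 'O'.
        byte[] raw = {0x41, (byte) 0xD1, 0x4F};
        String s = decodeAsUtf8(raw);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Same cast as in workOn(): only the low 8 bits survive.
            System.out.println((int) c + " -> (byte) " + (byte) c);
        }
        // Prints:
        // 65 -> (byte) 65
        // 65533 -> (byte) -3
        // 79 -> (byte) 79
    }
}
```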

My guess is that the file you are reading is encoded using "code page 437", the original IBM PC character set. In that character set, Ñ is represented by decimal 165.

Many modern systems use ISO-8859-1, which happens to be equivalent to the first 256 characters of the Unicode character set. In those, the Ñ character is decimal 209. In a comment, the author clarified that the value in the file is actually 209.

If the file were really UTF-8 encoded, Ñ would be represented as a two-byte sequence, and would be neither the value 165 nor the value 209.
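To check these numbers yourself, the following sketch encodes Ñ in each charset and prints the unsigned byte values. The class and method names are illustrative only. The code page 437 line is guarded, because the "IBM437" charset is an optional one and may be missing on some runtimes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EnyeBytes {
    // Returns the bytes of text in the given charset as unsigned values (0..255).
    static int[] unsignedBytes(String text, Charset cs) {
        byte[] raw = text.getBytes(cs);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xFF; // undo Java's signed byte
        }
        return out;
    }

    public static void main(String[] args) {
        String enye = "\u00D1"; // Ñ, written as an escape to avoid source-encoding issues
        System.out.println("ISO-8859-1: "
                + Arrays.toString(unsignedBytes(enye, StandardCharsets.ISO_8859_1))); // [209]
        System.out.println("UTF-8:      "
                + Arrays.toString(unsignedBytes(enye, StandardCharsets.UTF_8)));      // [195, 145]
        // Code page 437 is not guaranteed to be installed, so check first.
        if (Charset.isSupported("IBM437")) {
            System.out.println("IBM437:     "
                    + Arrays.toString(unsignedBytes(enye, Charset.forName("IBM437"))));
        }
    }
}
```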

Based on the above assumption that the file is ISO-8859-1 encoded, you should be able to solve the situation by using:

    InputStreamReader reader1 = new InputStreamReader(is, "ISO-8859-1");

This will translate the bytes to the corresponding Unicode characters, and you should then find the character Ñ represented by decimal 209.
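One more thing to watch for: casting 209 to byte gives -47, because Java bytes are signed. Below is a minimal sketch of the read loop under the same ISO-8859-1 assumption, keeping each code as an int instead. The class name, the codesOf helper, and the in-memory sample bytes are hypothetical stand-ins for the asset file from the question.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class Latin1CodesDemo {
    // Reads the stream line by line as ISO-8859-1 and collects each character code.
    static List<Integer> codesOf(InputStream in) {
        List<Integer> codes = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (int i = 0; i < line.length(); i++) {
                    // Keep the code as an int; a byte cast would turn 209 into -47.
                    codes.add((int) line.charAt(i));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return codes;
    }

    public static void main(String[] args) {
        // Hypothetical file content: 'A', Ñ as the single byte 209, 'O', newline.
        byte[] raw = {0x41, (byte) 0xD1, 0x4F, 0x0A};
        System.out.println(codesOf(new ByteArrayInputStream(raw))); // [65, 209, 79]
    }
}
```

If the rest of workOn() really needs a byte, masking it with b & 0xFF when reading it back gives the unsigned value again.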

java android
