Encoding error using Python
I wrote code to connect to IMAP, parse the body information, and insert it into a database. I am having problems with accents.
From the email header I got this information:
content-type: text/html; charset=iso-8859-1
But I am not sure whether I can trust this information...
The email was written in Portuguese, so it has a lot of words with accents. For example, I extracted the following phrase from the email source code (using the browser):
"...instalação de eletrônicos..."
So, I connected to IMAP and fetched some emails:
... typ, data = m.fetch(num, '(RFC822)') ...
When I print the content, I get the following for that phrase:
print data[0][1]
instala+º+úo de eletr+¦nicos
I tried to use .decode('utf-8'), but I had no success.
instalação de eletrônicos
How can I make this human readable? My database is in UTF-8.
The header says the message uses the "iso-8859-1" charset, so you need to decode the string with that encoding.
Try this:
data[0][1].decode('iso-8859-1')
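
If you would rather not hard-code the charset, one option is to let the standard email package read it from the Content-Type header. The following is only a sketch under some assumptions: it is written for Python 3 (where the fetched body is bytes), the raw variable is a made-up stand-in for data[0][1], body_as_text is a hypothetical helper name, and the message is assumed to be single-part (for a multipart message you would have to walk the parts instead).

```python
import email

# Hypothetical stand-in for data[0][1]: a single-part message whose body
# is encoded as ISO-8859-1, matching the header shown in the question.
raw = (
    b"Content-Type: text/html; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + "instalação de eletrônicos".encode("iso-8859-1")
)


def body_as_text(raw_bytes, fallback="iso-8859-1"):
    """Decode a single-part message body using its declared charset.

    The fallback charset is an assumption, used only when the header
    does not declare one.
    """
    msg = email.message_from_bytes(raw_bytes)
    charset = msg.get_content_charset() or fallback
    # decode=True undoes base64 / quoted-printable and returns bytes.
    payload = msg.get_payload(decode=True)
    # errors="replace" avoids an exception if the header is wrong.
    return payload.decode(charset, errors="replace")


text = body_as_text(raw)           # a Unicode string
utf8_bytes = text.encode("utf-8")  # bytes suitable for a UTF-8 database
print(text)
```

Once the body is a Unicode string, encoding it as UTF-8 (or handing the string to a database driver configured for UTF-8) keeps the accents intact.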
python encoding character-encoding non-ascii-characters