osx - Can't decode UTF-8 string in Python on OS X Terminal.app
I have Terminal.app set to accept UTF-8, and in bash I can type Unicode characters and copy and paste them. But when I start the Python shell and try to decode a Unicode string, I get errors:
    >>> wtf = u'\xe4\xf6\xfc'.decode()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
    >>> wtf = u'\xe4\xf6\xfc'.decode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

Does anyone know what I am doing wrong?
I think encoding and decoding are being confused throughout. You start with a unicode object:
    u'\xe4\xf6\xfc'

This is a unicode object; its three characters are the Unicode code points for "äöü". If you want to convert them to UTF-8, you have to encode them:

    >>> u'\xe4\xf6\xfc'.encode('utf-8')
    '\xc3\xa4\xc3\xb6\xc3\xbc'

The resulting six bytes are the UTF-8 representation of "äöü".
If you call decode(...), you are asking Python to interpret the characters as bytes in some encoding that still needs to be converted to Unicode. Since the object is already Unicode, this does not work. Your first call attempts an ASCII-to-Unicode conversion, the second a UTF-8-to-Unicode conversion. Since u'\xe4\xf6\xfc' is neither valid ASCII nor valid UTF-8, both conversion attempts fail.
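The traceback in the question reports a UnicodeEncodeError rather than a decode error because, in Python 2, calling decode on a unicode object first implicitly encodes it to a byte string with the default ASCII codec, and that hidden step is what fails. Here is a minimal sketch that makes the step explicit. The names `text` and `error` are only illustrative, and the snippet is written so that it also runs on Python 3, where str has no decode method at all.

```python
text = u'\xe4\xf6\xfc'  # three code points (a, o, u with umlauts)

# Python 2 performs this ASCII encode implicitly before it decodes;
# doing it by hand reproduces the error from the question.
try:
    text.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(type(error).__name__)      # name of the exception that was raised
print(error.reason)              # why the ASCII codec rejected the text
print((error.start, error.end))  # half-open range of offending positions
```

The range covers all three characters, which corresponds to "position 0-2" in the error message.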
Part of the confusion may come from the fact that '\xe4\xf6\xfc' is also the Latin-1 / ISO-8859-1 encoding of "äöü". If you write a normal Python string (without the leading "u" that marks it as unicode), you can convert it to a unicode object with decode('latin1'):

    >>> '\xe4\xf6\xfc'.decode('latin1')
    u'\xe4\xf6\xfc'
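To see both directions side by side, here is a small sketch that uses an explicit byte literal (b'...'). This assumes Python 2.6 or later, or Python 3, since the 2.5 interpreter from the question does not support that literal. The variable names are only illustrative.

```python
latin1_bytes = b'\xe4\xf6\xfc'        # the three letters as Latin-1 bytes
text = latin1_bytes.decode('latin1')  # bytes -> unicode: decode
utf8_bytes = text.encode('utf-8')     # unicode -> bytes: encode

print(text == u'\xe4\xf6\xfc')             # same three code points as before
print(len(text))                           # number of characters
print(len(utf8_bytes))                     # number of UTF-8 bytes
print(utf8_bytes.decode('utf-8') == text)  # decoding UTF-8 restores the text
```

The rule of thumb is that bytes are decoded into unicode and unicode is encoded into bytes, and the codec name has to match how the bytes were actually produced.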