
Author vstinner
Recipients ezio.melotti, piro, r.david.murray, ronaldoussoren, vstinner
Date 2010-10-22.08:58:22
Content
FYI, you should use ascii() instead of a.encode("utf8") to dump arguments. For me, it's easier to check '\u2603' than b'\xe2\x98\x83' :-)
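
To illustrate the point, here is a minimal sketch comparing the two ways of dumping an argument. The variable names (arg, as_bytes, as_ascii) are made up for the example; only ascii() and str.encode() come from the message.

```python
# Hypothetical argument containing U+2603 (SNOWMAN), as in the message.
arg = "\u2603"

# Dumping the UTF-8 bytes shows three opaque byte escapes.
as_bytes = arg.encode("utf8")
print(as_bytes)   # b'\xe2\x98\x83'

# ascii() keeps the code point visible as a single \uXXXX escape.
as_ascii = ascii(arg)
print(as_ascii)   # '\u2603'
```

With ascii(), the code point can be read directly from the output instead of being decoded by hand from UTF-8 bytes.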

So the bug is fixed in Python 3.2, great! I was thinking that we needed a test for that, but then I remembered that I already wrote such a test :-) My test checks 3 Unicode characters: \xe9, \u20ac, \U0010ffff; but also invalid byte sequences:

text = (
  b'\xff'         # invalid byte
  b'\xc3\xa9'     # valid utf-8 character
  b'\xc3\xff'     # invalid byte sequence
  b'\xed\xa0\x80' # lone surrogate character (invalid)
)

And that should be enough :-) See test_osx_utf8() in test_cmd_line for the whole test.
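
As a rough sketch of what such a byte string looks like once decoded: the snippet below assumes the bytes are decoded as UTF-8 with the surrogateescape error handler. The message does not name the error handler, so treat that choice as an assumption; the names decoded and roundtrip are only for illustration.

```python
# Byte string from the message: one valid character mixed with invalid bytes.
text = (
    b'\xff'          # invalid byte
    b'\xc3\xa9'      # valid UTF-8 character (U+00E9)
    b'\xc3\xff'      # invalid byte sequence
    b'\xed\xa0\x80'  # lone surrogate character (invalid)
)

# Assumption: decode as UTF-8 with surrogateescape, so every undecodable
# byte becomes a lone surrogate in the range U+DC80..U+DCFF.
decoded = text.decode("utf-8", "surrogateescape")

# ascii() gives a pure-ASCII dump that is easy to compare in a test.
print(ascii(decoded))

# Encoding with the same error handler restores the original bytes.
roundtrip = decoded.encode("utf-8", "surrogateescape")
print(roundtrip == text)
```

Only b'\xc3\xa9' decodes to a real character here; the other six bytes are carried through as escaped surrogates, which is why the round trip is lossless.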
History
Date                 User      Action  Args
2010-10-22 08:58:24  vstinner  set     recipients: + vstinner, ronaldoussoren, piro, ezio.melotti, r.david.murray
2010-10-22 08:58:24  vstinner  set     messageid: <[email protected]>
2010-10-22 08:58:23  vstinner  link    issue9167 messages
2010-10-22 08:58:22  vstinner  create