This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author vstinner
Recipients vstinner
Date 2008-11-21.12:32:16
SpamBayes Score 4.6242454e-12
Marked as misclassified No
Message-id <[email protected]>
In-reply-to
Content
I'm trying to fix IDLE to support Unicode (#4008 and #4323). Instead 
of IDLE builtin charset detection, I tried to use 
tokenize.detect_encoding() but this function doesn't work with script 
using Mac new line (b"\r").

Code to detect the encoding of a Python script:
----
def pythonEncoding(filename):
   with open(filename, 'rb') as fp:
      encoding, lines = detect_encoding(fp.readline)
   return encoding
----

Example to reproduce the problem with Mac script:
----
fp = BytesIO(b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re 
amie")\r')
encoding, lines = detect_encoding(fp.readline)
print(encoding, lines)
----

=> Result: utf-8 [b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re 
amie")\r']

The problem occurs at "line_string = line.decode('ascii')". 
Since "line" contains a non-ASCII character (b"\xe8"), the conversion 
fails.
History
Date User Action Args
2008-11-21 12:32:21vstinnersetrecipients: + vstinner
2008-11-21 12:32:19vstinnersetmessageid: <[email protected]>
2008-11-21 12:32:17vstinnerlinkissue4377 messages
2008-11-21 12:32:16vstinnercreate