bpo-34979: fix "SyntaxError: Non-UTF-8 code start with \xe8..." caused by function decoding_fgets by ausaki · Pull Request #9923 · python/cpython

ausaki · 2018-10-17T08:49:48Z

Please see bpo-34979 for details.

Possibly all Python 3 versions are affected.

How to reproduce this issue

Save the following source code to a file encoded as UTF-8.

# demo.py
s = '测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试'

Run it:

$ python3 -V
Python 3.6.4

$ python3 demo.py
  File "demo.py", line 2
SyntaxError: Non-UTF-8 code starting with '\xe8' in file demo.py on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
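Copying the long literal above by hand can easily drop or alter characters, so a small script can generate an equivalent demo.py instead. This is a sketch under assumptions: build_demo_source and write_demo are hypothetical helpers, and the default repeat of 200 is an arbitrary value chosen only so that line 2 exceeds a 1024-byte buffer (the macOS figure given in the analysis below). Platforms with a larger buffer would need a larger repeat.

```python
# Hypothetical helpers that write a demo.py equivalent to the one above.
# `repeat` is an illustrative assumption: 200 copies of "测试" make line 2
# longer than a 1024-byte buffer; larger buffers need a larger value.
import pathlib


def build_demo_source(repeat: int = 200) -> bytes:
    """Return the UTF-8 bytes of a demo.py containing one long string literal."""
    return ("# demo.py\n" + "s = '" + "测试" * repeat + "'\n").encode("utf-8")


def write_demo(path: str = "demo.py", repeat: int = 200) -> int:
    """Write the demo file and return the byte length of line 2."""
    data = build_demo_source(repeat)
    pathlib.Path(path).write_bytes(data)
    # splitlines() drops the trailing newline, so only "s = '...'" is counted.
    return len(data.splitlines()[1])
```

With the defaults, line 2 is 1206 bytes (5 bytes of `s = '`, 1200 bytes for 400 three-byte characters, and the closing quote). Calling write_demo() and then running `python3 demo.py` on an affected interpreter should reproduce the error above.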

How did it happen?

The function decoding_fgets reads one line of raw bytes into a buffer. The buffer's size is platform-dependent; for example, it is 1024 bytes on macOS.

If the line is too long (for example, longer than 1023 bytes), it may be cut in the middle of a multibyte UTF-8 character, and the truncated chunk then makes valid_utf8 fail.
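The truncation can be reproduced outside the interpreter. This Python sketch is an approximation, not CPython's C code: first_chunk is a hypothetical helper that mimics a single fgets-style read into a 1024-byte buffer (at most 1023 bytes plus a terminating NUL), and Python's strict UTF-8 decoder stands in for valid_utf8.

```python
# Approximation of the failure: one fixed-size, fgets-style read of a long
# source line, followed by a strict UTF-8 check of the resulting chunk.
BUF_SIZE = 1024  # the macOS buffer size mentioned above


def first_chunk(line: bytes, size: int = BUF_SIZE) -> bytes:
    """Hypothetical helper: what one fgets(buf, size, fp) call would store."""
    end = line.find(b"\n")
    # fgets stops after the newline or after size - 1 bytes, whichever comes first.
    limit = size - 1 if end == -1 else min(size - 1, end + 1)
    return line[:limit]


def is_valid_utf8(chunk: bytes) -> bool:
    """Stand-in for valid_utf8: a strict decode of the whole chunk."""
    try:
        chunk.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


line = ("s = '" + "测试" * 200 + "'\n").encode("utf-8")
chunk = first_chunk(line)
print(len(chunk), hex(chunk[-1]), is_valid_utf8(chunk))  # 1023 0xe8 False
```

The 1023-byte chunk ends with 0xe8, the first byte of 试 (U+8BD5, encoded as e8 af 95), which matches the byte named in the error message; the complete line decodes without error.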

How to fix this issue?

There is no need to check whether each raw line is valid UTF-8.
If no coding spec is found at the top of the source file, set the default encoding to UTF-8 and always use fp_readl to read lines.
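The point of the proposed fix is that a decoder which keeps state across reads does not care where a fixed-size buffer boundary falls. The sketch below is a Python analogy only, not the C patch: it substitutes Python's incremental UTF-8 decoder (codecs.getincrementaldecoder) for the codec-based reading that fp_readl performs, and read_decoded is a hypothetical name.

```python
import codecs
import io

BUF_SIZE = 1024  # illustrative buffer size (the macOS figure from the analysis)


def read_decoded(raw: io.BufferedIOBase, size: int = BUF_SIZE) -> str:
    """Read `raw` in chunks of size - 1 bytes and decode it as UTF-8.

    The incremental decoder holds back an incomplete multibyte sequence at
    the end of one chunk and completes it with the first bytes of the next,
    so a character split across a chunk boundary is not an error.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()  # errors="strict"
    parts = []
    while True:
        chunk = raw.read(size - 1)
        if not chunk:
            break
        parts.append(decoder.decode(chunk))
    # final=True still raises if the input ends in the middle of a character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# A long line whose first 1023 bytes end with the lead byte 0xe8.
line = "s = '" + "测试" * 200 + "'\n"
print(read_decoded(io.BytesIO(line.encode("utf-8"))) == line)  # True
```

Genuinely malformed input is still rejected: a stream that ends partway through a character raises UnicodeDecodeError at the final decode call.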

https://bugs.python.org/issue34979


iritkatriel · 2021-05-29T15:41:46Z

Closing the PR following the closure of its b.p.o issue (https://bugs.python.org/issue34979)

bpo-34979: fix "SyntaxError: Non-UTF-8 code start with \xe8..." caused by function decoding_fgets

27735c2

the-knights-who-say-ni added the CLA signed label Oct 17, 2018

bedevere-bot added the awaiting review label Oct 17, 2018

zhangyangyu self-requested a review October 17, 2018 08:51

bpo-34979: use PyExc_OSError instead of PyExc_SyntaxError.

334327e

iritkatriel closed this May 29, 2021

ausaki mannequin mentioned this pull request Apr 10, 2022

Python throws “SyntaxError: Non-UTF-8 code start with \xe8...” when parse source file #79160

Closed

ausaki wants to merge 2 commits into python:main from ausaki:fix_bpo_34979


4 participants
