bpo-34979: fix "SyntaxError: Non-UTF-8 code start with \xe8..." caused by function decoding_fgets #9923
Closed
ausaki wants to merge 2 commits into python:main
Closing the PR following the closure of its b.p.o issue (https://bugs.python.org/issue34979).
Please see issue-34979 for the details.
All Python 3 versions may be affected.
How to reproduce this issue
How did it happen?
The function decoding_fgets reads one line of raw bytes into a buffer. The buffer's size is platform dependent; for example, it is 1024 on macOS. If the line is too long (more than 1023 bytes), it may be cut in the middle of a multibyte UTF-8 character, which causes the function valid_utf8 to fail.
How to fix this issue?
There is no need to check whether the line is encoded in UTF-8.
If no coding spec is found at the top of the source file, set the default encoding to UTF-8 and always use the function fp_readl to read a line.
https://bugs.python.org/issue34979
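To illustrate the failure mode described under "How did it happen?", here is a minimal Python sketch. It is not the tokenizer's C code. BUF_SIZE, first_chunk, is_valid_utf8 and decode_in_chunks are hypothetical names made up for this example, and the 1024-byte buffer is only the macOS figure quoted above. The sketch shows that validating a truncated chunk on its own fails when the cut lands inside a multibyte character, while decoding the same bytes as a stream does not.

```python
import codecs

# Assumed buffer size, taken from the macOS example in the description.
BUF_SIZE = 1024


def first_chunk(line: bytes, buf_size: int = BUF_SIZE) -> bytes:
    """Mimic an fgets-style read: at most buf_size - 1 bytes of the line."""
    return line[: buf_size - 1]


def is_valid_utf8(chunk: bytes) -> bool:
    """Check a chunk in isolation, as a stand-in for a per-buffer UTF-8 check."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_in_chunks(data: bytes, buf_size: int = BUF_SIZE) -> str:
    """Decode the same chunks with a stateful decoder that carries partial bytes over."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    step = buf_size - 1
    parts = [decoder.decode(data[i:i + step]) for i in range(0, len(data), step)]
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# 1022 ASCII bytes followed by a 3-byte character whose first byte is 0xE8,
# so the 1023-byte cut falls inside the character.
line = b"a" * 1022 + "\u897f".encode("utf-8") + b"\n"

print(is_valid_utf8(line))               # the whole line
print(is_valid_utf8(first_chunk(line)))  # the truncated first chunk
print(decode_in_chunks(line) == line.decode("utf-8"))
```

The whole line is valid UTF-8, but the first chunk ends with a lone 0xE8 lead byte and is rejected when checked alone. The incremental decoder is only an analogy for reading through a decoding reader rather than validating raw buffers. How fp_readl behaves in detail should be checked against the CPython source.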