Implement latin_1 in Rust by fanninpm · Pull Request #3046 · RustPython/RustPython

fanninpm · 2021-09-13T02:10:39Z

~~Based on the PyPy implementation. The encoding function needs some work with respect to error handling.~~ EDIT: Now based on @coolreader18's ascii codec.

Requesting @coolreader18 as a potential reviewer, as they wrote the architecture for the encodings module and the error handling function(s).

coolreader18

Oh yeah, so far this looks good! That is the correct way to add a new encoding, if that's what you were looking for. I'm not sure if unicode_encode_ucs1 is reusable for other encodings, since anything >ucs1 would be some way of splitting the ucs2/ucs4 into parts, at which point it's just a different encoding.

common/src/encodings.rs

fanninpm · 2021-09-14T15:33:11Z

> I'm not sure if unicode_encode_ucs1 is reusable for other encodings, since anything >ucs1 would be some way of splitting the ucs2/ucs4 into parts, at which point it's just a different encoding.

unicode_encode_ucs1 is applicable to latin_1 and ascii, so that's why I put that function outside the module.
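As a rough illustration of why one helper can serve both codecs, here is a minimal sketch in which the only difference between ascii and latin_1 is the upper bound on the code point. The names `encode_ucs1_sketch`, `EncodeError`, `ascii_encode_sketch`, and `latin_1_encode_sketch` are hypothetical and are not taken from this PR or from RustPython's encodings module.

```rust
/// Hypothetical error type for this sketch (not RustPython's own).
#[derive(Debug, PartialEq)]
struct EncodeError {
    /// Index, counted in characters, of the first character that does not fit.
    position: usize,
    /// The offending character.
    ch: char,
}

/// Encode `s` as one byte per character, rejecting any code point >= `limit`.
/// `limit` is 0x80 for ascii and 0x100 for latin_1.
fn encode_ucs1_sketch(s: &str, limit: u32) -> Result<Vec<u8>, EncodeError> {
    // The cast to u8 below is only lossless while limit stays within one byte.
    debug_assert!(limit <= 0x100);
    let mut out = Vec::with_capacity(s.len());
    for (position, ch) in s.chars().enumerate() {
        let cp = ch as u32;
        if cp < limit {
            out.push(cp as u8);
        } else {
            return Err(EncodeError { position, ch });
        }
    }
    Ok(out)
}

fn ascii_encode_sketch(s: &str) -> Result<Vec<u8>, EncodeError> {
    encode_ucs1_sketch(s, 0x80)
}

fn latin_1_encode_sketch(s: &str) -> Result<Vec<u8>, EncodeError> {
    encode_ucs1_sketch(s, 0x100)
}
```

With this shape, "café" encodes under latin_1 but fails under ascii at character index 3, and "€" fails under both.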

> That is the correct way to add a new encoding, if that's what you were looking for.

While I was implementing this, I came across something that stymied me. In PyPy, the unicode_encode_ucs1 function references an error handler that is not yet implemented here. How would that error handler be implemented in Rust?
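One possible shape for such a handler, sketched under heavy assumptions: the enum `ErrorsSketch`, the struct `Unencodable`, and the function `latin_1_encode_with` are invented for illustration and do not reflect RustPython's actual error-handling functions. The modes mirror Python's built-in `strict`, `ignore`, `replace`, and `xmlcharrefreplace` handlers. Real Python handlers are callables that receive the exception and return a replacement plus a resume position, which this sketch simplifies to a fixed enum.

```rust
/// Hypothetical set of error-handling modes, named after Python's built-in handlers.
#[derive(Clone, Copy)]
enum ErrorsSketch {
    Strict,
    Ignore,
    Replace,
    XmlCharRefReplace,
}

/// Hypothetical error returned in strict mode.
#[derive(Debug, PartialEq)]
struct Unencodable {
    /// Index, counted in characters, of the character that cannot be encoded.
    position: usize,
    ch: char,
}

/// Encode to latin_1, delegating unencodable characters to the chosen mode.
fn latin_1_encode_with(s: &str, errors: ErrorsSketch) -> Result<Vec<u8>, Unencodable> {
    let mut out = Vec::with_capacity(s.len());
    for (position, ch) in s.chars().enumerate() {
        let cp = ch as u32;
        if cp <= 0xff {
            // Every code point up to U+00FF maps to the byte of the same value.
            out.push(cp as u8);
            continue;
        }
        match errors {
            ErrorsSketch::Strict => return Err(Unencodable { position, ch }),
            // Drop the character entirely.
            ErrorsSketch::Ignore => {}
            // Python's `replace` handler emits '?' when encoding.
            ErrorsSketch::Replace => out.push(b'?'),
            // Emit a decimal XML character reference such as "&#8364;".
            ErrorsSketch::XmlCharRefReplace => {
                out.extend_from_slice(format!("&#{};", cp).as_bytes())
            }
        }
    }
    Ok(out)
}
```

An enum keeps the sketch short. A trait object or closure would be closer to Python's model, where user code can register its own handlers.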

coolreader18 · 2021-09-22T21:32:10Z

Originally posted this in #3118, but it makes more sense here.

Re: the ucs1 helper function, I think it makes more sense to abstract over utf8/ascii than over ascii/latin1, since we use a utf8 str type (so we can just validate + extend_from_slice for utf8/ascii), whereas CPython has UCS1 as a buffer (so they can just validate + memcpy for ascii/latin1). Finishing up #3118 made me remember why I did that: in our implementation of codecs, ascii and utf8 naturally share a lot of code, while the opposite is true in CPython.
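To make the contrast concrete, here is a minimal sketch of the two decode paths when the target is a UTF-8 string. The function names are hypothetical and the code is not taken from #3118 or this PR. ASCII input is already valid UTF-8, so decoding is a validation pass plus a bulk copy. Latin-1 bytes in 0x80..=0xff each become a two-byte UTF-8 sequence, so they have to be converted one at a time.

```rust
/// ASCII bytes are already valid UTF-8: validate, then copy in bulk.
/// On failure, returns the index of the first non-ASCII byte.
fn ascii_decode_sketch(bytes: &[u8]) -> Result<String, usize> {
    match bytes.iter().position(|b| !b.is_ascii()) {
        Some(bad) => Err(bad),
        None => {
            let mut out = String::with_capacity(bytes.len());
            // Cannot fail: every byte was just checked to be ASCII.
            out.push_str(std::str::from_utf8(bytes).expect("ASCII is valid UTF-8"));
            Ok(out)
        }
    }
}

/// Latin-1 decoding never fails, but it cannot be a plain copy:
/// bytes 0x80..=0xff map to U+0080..=U+00FF, which take two bytes in UTF-8.
fn latin_1_decode_sketch(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}
```

For example, the two latin_1 bytes `[b'c', 0xe9]` decode to "cé", which occupies three bytes in the resulting `String`.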

This implementation is patterned off of the ascii codec.

common/src/encodings.rs

youknowone

looks good to me. @coolreader18 could you review this again?

youknowone requested a review from coolreader18 September 13, 2021 16:06

coolreader18 reviewed Sep 14, 2021

common/src/encodings.rs (outdated)

fanninpm force-pushed the latin-1-encoding branch from 643eb35 to 4b8e8a0 on September 15, 2021 23:48

coolreader18 mentioned this pull request Sep 22, 2021

Implement ascii codec in Rust #3118

Merged

Implement latin_1 in Rust

0f889ce

This implementation is patterned off of the ascii codec.

fanninpm force-pushed the latin-1-encoding branch from edee586 to 0f889ce on September 24, 2021 01:34

fanninpm marked this pull request as ready for review September 24, 2021 01:36

Simplify latin_1 decode function

7d322b7

coolreader18 reviewed Sep 24, 2021

common/src/encodings.rs

fanninpm added 2 commits September 24, 2021 18:56

Streamline latin_1_encode fast path

4375307

Account for 0x80..=0xff range in memory

a100178

youknowone approved these changes Sep 29, 2021

youknowone merged commit 2eb6c68 into RustPython:main Oct 1, 2021

fanninpm deleted the latin-1-encoding branch October 1, 2021 22:58

youknowone merged 4 commits into RustPython:main from fanninpm:latin-1-encoding
