Add UTF-32 functions #5800

Merged
youknowone merged 1 commit into RustPython:main from youknowone:utf32 on Feb 2, 2026

Conversation

@youknowone
Member

@youknowone youknowone commented Jun 6, 2025

Summary by CodeRabbit

  • New Features

    • Added support for UTF-32 encoding and decoding functions, including standard, little-endian, big-endian, and extended decode variants.
    • Enhanced UTF-7 and escape sequence handling with improved error detection and processing.
  • Tests

    • Updated test cases to run normally by removing expected failure markers from certain string and array tests.
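
As a rough illustration of what the three encode variants produce, the sketch below uses CPython's built-in codecs rather than the `_pycodecs` implementation from this PR. The sample string is an arbitrary choice, not taken from the PR's tests.

```python
import codecs
import sys

text = "A\U0001F600"  # one BMP character plus one character above the BMP

le = text.encode("utf-32-le")   # fixed little-endian, no BOM
be = text.encode("utf-32-be")   # fixed big-endian, no BOM
native = text.encode("utf-32")  # BOM followed by native-order code units

# The plain "utf-32" codec writes the platform's byte order after the BOM.
expected_body = le if sys.byteorder == "little" else be

# Decoding with "utf-32" reads the BOM and strips it from the result.
roundtrip = native.decode("utf-32")
```

Every code point takes exactly four bytes, so the two-character string encodes to eight bytes without a BOM and twelve with one.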

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jun 6, 2025

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (1)
  • Lib/_pycodecs.py is excluded by !Lib/**

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** overrides the default block on the dist directory by removing the pattern from both lists.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

"""

Walkthrough

The changes remove @unittest.expectedFailure decorators and related comments from three test methods, allowing them to run as normal tests. Several UTF-32 encoding and decoding functions are added and exposed in the Rust _codecs module, delegating their logic to Python implementations. The _pycodecs.py module is extensively refactored and extended with full UTF-32 codec support, improved UTF-7 and escape sequence handling, and consistent error handling across codecs.

Changes

| File(s) | Change Summary |
|---|---|
| Lib/test/string_tests.py | Removed @unittest.expectedFailure decorator and related comment from the test_subscript method. |
| Lib/test/test_array.py | Removed @unittest.expectedFailure decorators and related comments from test_unicode and test_reverse_iterator_picking methods. |
| vm/src/stdlib/codecs.rs | Added Python-exposed UTF-32 codec functions delegating to Python implementations in _pycodecs. |
| Lib/_pycodecs.py | Refactored and extended codec implementations with full UTF-32 encode/decode support, improved UTF-7 and escape handling, and consistent error management. |

Sequence Diagram(s)

sequenceDiagram
    participant PythonCode as Python Code
    participant Rust_Codecs as Rust _codecs module
    participant PyCodecs as _pycodecs Python module

    PythonCode->>Rust_Codecs: call utf_32_encode/decode or variants
    Rust_Codecs->>PyCodecs: delegate call to corresponding _pycodecs function
    PyCodecs-->>Rust_Codecs: return result
    Rust_Codecs-->>PythonCode: return result
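
The delegation in the diagram can be approximated in Python as a wrapper that looks up the target function by name at call time. This is only a sketch of the pattern, not the Rust code from the PR: `delegate_to` is a hypothetical helper, and CPython's `codecs` module stands in for `_pycodecs` so the snippet runs outside RustPython.

```python
import importlib


def delegate_to(module_name, func_name):
    """Return a wrapper that forwards calls to module_name.func_name.

    Hypothetical helper for illustration; the lookup happens on each call,
    so the target module is only imported when the codec is actually used.
    """
    def wrapper(*args, **kwargs):
        module = importlib.import_module(module_name)
        return getattr(module, func_name)(*args, **kwargs)

    wrapper.__name__ = func_name
    return wrapper


# Stand-in target: CPython's codecs module (the PR delegates to _pycodecs).
utf_32_le_encode = delegate_to("codecs", "utf_32_le_encode")

# Codec functions return a (bytes, characters_consumed) tuple.
result = utf_32_le_encode("hi")
```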

Poem

In the warren where tests now hop free,
No more "expected failures" for all to see.
UTF-32 codecs join the crew,
Encoding and decoding, something new!
With every hop and every byte,
The code grows stronger, day and night.
🐇✨
"""

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (2)
Lib/_pycodecs.py (2)

107-109: Implement MBCS codec functions

Both mbcs_decode and mbcs_encode have empty implementations. If these codecs are not supported on the current platform, consider raising NotImplementedError with an appropriate message instead of silently doing nothing.

Do you want me to help implement these MBCS codec functions or add proper error handling?

Also applies to: 332-334


1201-1203: Consider using keyword arguments for better readability

The function takes 7 positional arguments, which makes call sites hard to read. Consider making some arguments keyword-only, especially the decode boolean flag.

-def unicode_call_errorhandler(
-    errors, encoding, reason, input, startinpos, endinpos, decode=True
-):
+def unicode_call_errorhandler(
+    errors, encoding, reason, input, startinpos, endinpos, *, decode=True
+):
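
To show the effect of the bare `*` in the suggested signature, here is a minimal sketch with a hypothetical stand-in function (`call_errorhandler` and its return values are invented for illustration; it does not reproduce the real error-handler logic).

```python
def call_errorhandler(errors, encoding, reason, data, start, end, *, decode=True):
    """Hypothetical stand-in; only reports which mode was requested."""
    return "decode" if decode else "encode"


# The flag must now be named at the call site, which documents its meaning.
mode = call_errorhandler(
    "strict", "utf-32", "truncated data", b"\x00", 0, 1, decode=False
)

# Passing the flag positionally is rejected with a TypeError.
try:
    call_errorhandler("strict", "utf-32", "truncated data", b"\x00", 0, 1, False)
    rejected = False
except TypeError:
    rejected = True
```

One trade-off: existing callers that pass the flag positionally would need updating, which matters for a file that is meant to stay close to its upstream form.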
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 1201-1201: Too many arguments (7/5)

(R0913)


[refactor] 1201-1201: Too many positional arguments (7/5)

(R0917)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c420a2 and 0737add.

📒 Files selected for processing (4)
  • Lib/_pycodecs.py (11 hunks)
  • Lib/test/string_tests.py (0 hunks)
  • Lib/test/test_array.py (0 hunks)
  • vm/src/stdlib/codecs.rs (1 hunks)
💤 Files with no reviewable changes (2)
  • Lib/test/test_array.py
  • Lib/test/string_tests.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • vm/src/stdlib/codecs.rs
🧰 Additional context used
🪛 Pylint (3.3.7)
Lib/_pycodecs.py

[error] 99-99: function already defined line 96

(E0102)


[error] 112-112: function already defined line 96

(E0102)


[error] 121-121: function already defined line 96

(E0102)


[error] 132-132: function already defined line 96

(E0102)


[error] 139-139: function already defined line 96

(E0102)


[error] 146-146: function already defined line 96

(E0102)


[error] 153-153: function already defined line 96

(E0102)


[error] 160-160: function already defined line 96

(E0102)


[error] 172-172: function already defined line 96

(E0102)


[error] 179-179: function already defined line 96

(E0102)


[error] 186-186: function already defined line 96

(E0102)


[error] 194-194: function already defined line 96

(E0102)


[refactor] 206-221: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


[error] 251-251: function already defined line 96

(E0102)


[error] 270-270: function already defined line 96

(E0102)


[refactor] 270-270: Too many branches (16/12)

(R0912)


[error] 318-318: function already defined line 96

(E0102)


[error] 325-325: function already defined line 96

(E0102)


[error] 344-344: function already defined line 96

(E0102)


[error] 351-351: function already defined line 96

(E0102)


[error] 358-358: function already defined line 96

(E0102)


[error] 365-365: function already defined line 96

(E0102)


[error] 372-372: function already defined line 96

(E0102)


[error] 379-379: function already defined line 96

(E0102)


[error] 391-391: function already defined line 96

(E0102)


[error] 404-404: function already defined line 96

(E0102)


[error] 411-411: function already defined line 96

(E0102)


[error] 418-418: function already defined line 96

(E0102)


[error] 425-425: function already defined line 96

(E0102)


[error] 437-437: function already defined line 96

(E0102)


[error] 449-449: function already defined line 96

(E0102)


[error] 461-461: function already defined line 96

(E0102)


[refactor] 648-657: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)


[refactor] 668-668: Too many local variables (20/15)

(R0914)


[refactor] 668-668: Too many branches (20/12)

(R0912)


[refactor] 668-668: Too many statements (63/50)

(R0915)


[refactor] 697-697: Simplify chained comparison between the operands

(R1716)


[refactor] 774-822: Too many nested blocks (6/5)

(R1702)


[refactor] 767-767: Too many branches (15/12)

(R0912)


[refactor] 845-869: Unnecessary "elif" after "continue", remove the leading "el" from "elif"

(R1724)


[refactor] 845-845: Consider merging these comparisons with 'in' by using 'ch in (p[1], '\')'. Use a set instead if elements are hashable.

(R1714)


[refactor] 834-834: Too many branches (16/12)

(R0912)


[refactor] 932-932: Too many local variables (17/15)

(R0914)


[refactor] 932-932: Too many branches (24/12)

(R0912)


[refactor] 932-932: Too many statements (74/50)

(R0915)


[refactor] 1018-1018: Simplify chained comparison between the operands

(R1716)


[refactor] 1021-1036: Unnecessary "else" after "continue", remove the "else" and de-indent the code inside it

(R1724)


[refactor] 1021-1021: Simplify chained comparison between the operands

(R1716)


[refactor] 1055-1058: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


[refactor] 1092-1092: Too many local variables (16/15)

(R0914)


[refactor] 1092-1092: Too many branches (17/12)

(R0912)


[refactor] 1092-1092: Too many statements (51/50)

(R0915)


[refactor] 1171-1174: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


[refactor] 1201-1201: Too many arguments (7/5)

(R0913)


[refactor] 1201-1201: Too many positional arguments (7/5)

(R0917)


[refactor] 1214-1225: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


[refactor] 1327-1327: Too many local variables (17/15)

(R0914)


[refactor] 1340-1461: Unnecessary "else" after "continue", remove the "else" and de-indent the code inside it

(R1724)


[refactor] 1338-1461: Too many nested blocks (7/5)

(R1702)


[refactor] 1432-1432: Simplify chained comparison between the operands

(R1716)


[refactor] 1338-1461: Too many nested blocks (6/5)

(R1702)


[refactor] 1327-1327: Too many branches (36/12)

(R0912)


[refactor] 1327-1327: Too many statements (102/50)

(R0915)


[refactor] 1574-1574: Too many branches (17/12)

(R0912)


[refactor] 1574-1574: Too many statements (51/50)

(R0915)

⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run tests under miri
  • GitHub Check: Check Rust code with rustfmt and clippy
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (2)
Lib/_pycodecs.py (2)

760-762: Address the TODO comment in UTF-7 decoder

The comment "# XXX This aint right" suggests uncertainty about the error handling for unterminated shift sequences. Please verify this implementation is correct or fix the issue.


403-470: Well-implemented UTF-32 codec support!

The UTF-32 codec implementation is comprehensive and follows established patterns from UTF-16. The functions correctly handle byte order, BOM detection, and error handling.

Also applies to: 1092-1190
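
For readers unfamiliar with UTF-32 BOM detection, the following sketch shows the general idea using the BOM constants from CPython's `codecs` module. The helper names and the `-1`/`0`/`1` byte-order convention are assumptions chosen for illustration; they are not copied from `_pycodecs.py`.

```python
import codecs
import sys


def detect_utf32_byteorder(data):
    """Return (byteorder, bom_length); -1 little, 1 big, 0 when no BOM is present."""
    if data.startswith(codecs.BOM_UTF32_LE):
        return -1, 4
    if data.startswith(codecs.BOM_UTF32_BE):
        return 1, 4
    return 0, 0


def decode_utf32(data):
    """Decode UTF-32 bytes, honoring a leading BOM if there is one."""
    order, skip = detect_utf32_byteorder(data)
    if order == 0:
        # No BOM: fall back to the platform's native byte order.
        order = 1 if sys.byteorder == "big" else -1
    codec = "utf-32-be" if order == 1 else "utf-32-le"
    return data[skip:].decode(codec)


big_endian_payload = codecs.BOM_UTF32_BE + "hi".encode("utf-32-be")
decoded = decode_utf32(big_endian_payload)
```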

🧰 Tools
🪛 Pylint (3.3.7)

[error] 404-404: function already defined line 96

(E0102)


[error] 411-411: function already defined line 96

(E0102)


[error] 418-418: function already defined line 96

(E0102)


[error] 425-425: function already defined line 96

(E0102)


[error] 437-437: function already defined line 96

(E0102)


[error] 449-449: function already defined line 96

(E0102)


[error] 461-461: function already defined line 96

(E0102)

Lib/_pycodecs.py Outdated
Comment on lines 220 to 221
res = "You can do better than this" # XXX make this right
return res, len(res)
Copy link
Contributor

⚠️ Potential issue

Replace placeholder implementation

The else branch returns a hardcoded string "You can do better than this" which appears to be a placeholder. This needs proper implementation or should raise an appropriate exception.

Consider implementing proper handling or raising an exception:

-        res = "You can do better than this"  # XXX make this right
-        return res, len(res)
+        raise TypeError(f"unicode_internal_encode() argument must be str, not {type(obj).__name__}")
🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 206-221: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)

🤖 Prompt for AI Agents
In Lib/_pycodecs.py at lines 220-221, the else branch currently returns a
hardcoded placeholder string "You can do better than this". Replace this
placeholder with a proper implementation that handles the case correctly or
raise an appropriate exception to indicate unsupported or unhandled conditions
instead of returning a dummy string.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
Lib/_pycodecs.py (3)

926-929: Fix incorrect variable reference

The function passes the built-in str type to len() instead of the parameter unistr, so len(str) raises a TypeError at runtime.

Apply this fix:

-    return PyUnicode_EncodeASCII(str(unistr), len(str), None)
+    return PyUnicode_EncodeASCII(unistr, len(unistr), None)
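
A quick check of why the original expression fails: in Python the name `str` always resolves to the built-in type, so `len(str)` raises `TypeError` rather than `NameError`. The variable `unistr` below mirrors the parameter name from the diff; the sample value is arbitrary.

```python
unistr = "abc"

# len() applied to the built-in type itself, as in the original line.
try:
    len(str)
    message = None
except TypeError as exc:
    message = str(exc)

# The corrected expression measures the actual argument.
length = len(unistr)
```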

1537-1537: Remove redundant isinstance check

The condition checks isinstance(rep, int) twice, which is redundant.

-    if isinstance(rep, int) or isinstance(rep, int):
+    if isinstance(rep, int):

220-221: Replace placeholder implementation

The else branch returns a hardcoded string "You can do better than this" which appears to be a placeholder. This needs proper implementation or should raise an appropriate exception.

Consider implementing proper handling or raising an exception:

-        res = "You can do better than this"  # XXX make this right
-        return res, len(res)
+        raise TypeError(f"unicode_internal_encode() argument must be str, not {type(obj).__name__}")
🧹 Nitpick comments (3)
Lib/_pycodecs.py (3)

404-409: Improve docstrings for UTF-32 codec functions

All the new UTF-32 codec functions have placeholder docstrings containing just "None". Consider adding meaningful docstrings that describe each function's purpose, parameters, and return values.

For example:

-def utf_32_encode(obj, errors="strict"):
-    """None"""
+def utf_32_encode(obj, errors="strict"):
+    """Encode a string using the UTF-32 codec with native byte order.
+    
+    Args:
+        obj: String to encode
+        errors: Error handling scheme ('strict', 'replace', 'ignore', etc.)
+        
+    Returns:
+        tuple: (encoded_bytes, length_consumed)
+    """

Also applies to: 411-416, 418-423, 425-435, 437-447, 449-459, 461-471


206-206: Use isinstance() for type checking

Replace direct type comparison with isinstance() for better practice.

-    if type(obj) == str:
+    if isinstance(obj, str):
-    if type(unistr) == str:
+    if isinstance(unistr, str):

Also applies to: 226-227
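
The two checks differ for subclasses, which is worth keeping in mind before applying the change. A minimal sketch, with `Tagged` as a hypothetical `str` subclass:

```python
class Tagged(str):
    """Hypothetical str subclass used only to show the difference."""


value = Tagged("abc")

exact_match = type(value) == str       # compares the exact type only
instance_match = isinstance(value, str)  # also accepts subclasses
```

If the original code intentionally rejected subclasses, switching to `isinstance()` would change behavior, so the suggestion is not purely stylistic.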


107-109: Consider raising NotImplementedError for unimplemented MBCS codecs

The MBCS encode/decode functions have empty implementations. If these Windows-specific codecs are not supported, consider raising NotImplementedError to make this explicit.

 def mbcs_decode():
     """None"""
-    pass
+    raise NotImplementedError("MBCS codec is not implemented")

 def mbcs_encode(obj, errors="strict"):
     """None"""
-    pass
+    raise NotImplementedError("MBCS codec is not implemented")

Also applies to: 332-335

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0737add and 50ef0d0.

📒 Files selected for processing (4)
  • Lib/_pycodecs.py (11 hunks)
  • Lib/test/string_tests.py (0 hunks)
  • Lib/test/test_array.py (1 hunks)
  • vm/src/stdlib/codecs.rs (1 hunks)
💤 Files with no reviewable changes (1)
  • Lib/test/string_tests.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • Lib/test/test_array.py
  • vm/src/stdlib/codecs.rs
🧰 Additional context used
📓 Path-based instructions (2)
`Lib/**/*`: Files in the Lib/ directory (copied from CPython) should be edited very conservatively; modifications should be minimal and only to work around RustPython limitations.

  • Lib/_pycodecs.py
`**/*.py`: Follow PEP 8 style for custom Python code. Use ruff for linting Python code.

  • Lib/_pycodecs.py
🪛 Flake8 (7.2.0)
Lib/_pycodecs.py

[error] 49-49: 'register' may be undefined, or defined from star imports: _codecs

(F405)


[error] 49-49: 'lookup' may be undefined, or defined from star imports: _codecs

(F405)


[error] 49-49: 'lookup_error' may be undefined, or defined from star imports: _codecs

(F405)


[error] 49-49: 'register_error' may be undefined, or defined from star imports: _codecs

(F405)


[error] 49-49: 'encode' may be undefined, or defined from star imports: _codecs

(F405)


[error] 49-49: 'decode' may be undefined, or defined from star imports: _codecs

(F405)


[error] 49-49: 'utf_8_decode' may be undefined, or defined from star imports: _codecs

(F405)


[error] 49-49: 'utf_8_encode' may be undefined, or defined from star imports: _codecs

(F405)


[error] 96-96: 'from _codecs import *' used; unable to detect undefined names

(F403)


[error] 206-206: do not compare types, for exact checks use is / is not, for instance checks use isinstance()

(E721)


[error] 272-272: ambiguous variable name 'l'

(E741)


[error] 304-304: whitespace before ':'

(E203)


[error] 308-308: whitespace before ':'

(E203)


[error] 337-337: too many leading '#' for block comment

(E266)


[error] 338-338: too many leading '#' for block comment

(E266)


[error] 339-339: too many leading '#' for block comment

(E266)


[error] 340-340: too many leading '#' for block comment

(E266)


[error] 341-341: too many leading '#' for block comment

(E266)


[error] 475-475: block comment should start with '# '

(E265)


[error] 476-476: block comment should start with '# '

(E265)


[error] 478-478: too many leading '#' for block comment

(E266)


[error] 479-479: too many leading '#' for block comment

(E266)


[error] 694-694: too many leading '#' for block comment

(E266)


[error] 695-695: too many leading '#' for block comment

(E266)


[error] 698-698: too many leading '#' for block comment

(E266)


[error] 699-699: too many leading '#' for block comment

(E266)


[error] 712-712: too many leading '#' for block comment

(E266)


[error] 713-713: too many leading '#' for block comment

(E266)


[error] 714-714: too many leading '#' for block comment

(E266)


[error] 720-720: too many leading '#' for block comment

(E266)


[error] 721-721: too many leading '#' for block comment

(E266)


[error] 722-722: too many leading '#' for block comment

(E266)


[error] 724-724: too many leading '#' for block comment

(E266)


[error] 725-725: too many leading '#' for block comment

(E266)


[error] 726-726: too many leading '#' for block comment

(E266)


[error] 741-741: too many leading '#' for block comment

(E266)


[error] 743-743: local variable 'startinpos' is assigned to but never used

(F841)


[error] 761-761: local variable 'endinpos' is assigned to but never used

(F841)


[error] 792-792: too many leading '#' for block comment

(E266)


[error] 793-793: too many leading '#' for block comment

(E266)


[error] 803-803: too many leading '#' for block comment

(E266)


[error] 804-804: too many leading '#' for block comment

(E266)


[error] 805-805: too many leading '#' for block comment

(E266)


[error] 806-806: too many leading '#' for block comment

(E266)


[error] 927-927: do not compare types, for exact checks use is / is not, for instance checks use isinstance()

(E721)


[error] 934-934: local variable 'consumed' is assigned to but never used

(F841)


[error] 946-946: too many leading '#' for block comment

(E266)


[error] 947-947: too many leading '#' for block comment

(E266)


[error] 948-948: too many leading '#' for block comment

(E266)


[error] 949-949: too many leading '#' for block comment

(E266)


[error] 1000-1000: too many leading '#' for block comment

(E266)


[error] 1254-1254: 'lookup_error' may be undefined, or defined from star imports: _codecs

(F405)


[error] 1283-1283: too many leading '#' for block comment

(E266)


[error] 1284-1284: too many leading '#' for block comment

(E266)


[error] 1355-1355: whitespace before ':'

(E203)


[error] 1389-1389: too many leading '#' for block comment

(E266)


[error] 1394-1394: too many leading '#' for block comment

(E266)


[error] 1404-1404: too many leading '#' for block comment

(E266)


[error] 1440-1440: too many leading '#' for block comment

(E266)


[error] 1441-1441: too many leading '#' for block comment

(E266)


[error] 1464-1464: too many leading '#' for block comment

(E266)


[error] 1485-1485: whitespace before ':'

(E203)


[error] 1488-1488: local variable 'e' is assigned to but never used

(F841)


[error] 1553-1553: too many leading '#' for block comment

(E266)


[error] 1554-1554: too many leading '#' for block comment

(E266)


[error] 1555-1555: too many leading '#' for block comment

(E266)


[error] 1591-1591: too many leading '#' for block comment

(E266)


[error] 1592-1592: comparison to None should be 'if cond is None:'

(E711)


[error] 1640-1640: local variable 'startinpos' is assigned to but never used

(F841)


[error] 1641-1641: too many leading '#' for block comment

(E266)


[error] 1642-1642: too many leading '#' for block comment

(E266)


[error] 1673-1673: whitespace before ':'

(E203)

🪛 Pylint (3.3.7)
Lib/_pycodecs.py

[error] 99-99: function already defined line 96

(E0102)


[error] 112-112: function already defined line 96

(E0102)


[error] 121-121: function already defined line 96

(E0102)


[error] 132-132: function already defined line 96

(E0102)


[error] 139-139: function already defined line 96

(E0102)


[error] 146-146: function already defined line 96

(E0102)


[error] 153-153: function already defined line 96

(E0102)


[error] 160-160: function already defined line 96

(E0102)


[error] 172-172: function already defined line 96

(E0102)


[error] 179-179: function already defined line 96

(E0102)


[error] 186-186: function already defined line 96

(E0102)


[error] 194-194: function already defined line 96

(E0102)


[refactor] 206-221: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


[error] 251-251: function already defined line 96

(E0102)


[error] 270-270: function already defined line 96

(E0102)


[refactor] 270-270: Too many branches (16/12)

(R0912)


[error] 318-318: function already defined line 96

(E0102)


[error] 325-325: function already defined line 96

(E0102)


[error] 344-344: function already defined line 96

(E0102)


[error] 351-351: function already defined line 96

(E0102)


[error] 358-358: function already defined line 96

(E0102)


[error] 365-365: function already defined line 96

(E0102)


[error] 372-372: function already defined line 96

(E0102)


[error] 379-379: function already defined line 96

(E0102)


[error] 391-391: function already defined line 96

(E0102)


[error] 404-404: function already defined line 96

(E0102)


[error] 411-411: function already defined line 96

(E0102)


[error] 418-418: function already defined line 96

(E0102)


[error] 425-425: function already defined line 96

(E0102)


[error] 437-437: function already defined line 96

(E0102)


[error] 449-449: function already defined line 96

(E0102)


[error] 461-461: function already defined line 96

(E0102)


[refactor] 648-657: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)


[refactor] 668-668: Too many local variables (20/15)

(R0914)


[refactor] 668-668: Too many branches (20/12)

(R0912)


[refactor] 668-668: Too many statements (63/50)

(R0915)


[refactor] 697-697: Simplify chained comparison between the operands

(R1716)


[refactor] 774-822: Too many nested blocks (6/5)

(R1702)


[refactor] 767-767: Too many branches (15/12)

(R0912)


[refactor] 845-869: Unnecessary "elif" after "continue", remove the leading "el" from "elif"

(R1724)


[refactor] 845-845: Consider merging these comparisons with 'in' by using 'ch in (p[1], '\')'. Use a set instead if elements are hashable.

(R1714)


[refactor] 834-834: Too many branches (16/12)

(R0912)


[refactor] 932-932: Too many local variables (17/15)

(R0914)


[refactor] 932-932: Too many branches (24/12)

(R0912)


[refactor] 932-932: Too many statements (74/50)

(R0915)


[refactor] 1018-1018: Simplify chained comparison between the operands

(R1716)


[refactor] 1021-1036: Unnecessary "else" after "continue", remove the "else" and de-indent the code inside it

(R1724)


[refactor] 1021-1021: Simplify chained comparison between the operands

(R1716)


[refactor] 1055-1058: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


[refactor] 1122-1122: Too many local variables (16/15)

(R0914)


[refactor] 1122-1122: Too many branches (17/12)

(R0912)


[refactor] 1122-1122: Too many statements (51/50)

(R0915)


[refactor] 1201-1204: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


[refactor] 1251-1251: Too many arguments (7/5)

(R0913)


[refactor] 1251-1251: Too many positional arguments (7/5)

(R0917)


[refactor] 1264-1275: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)


[refactor] 1377-1377: Too many local variables (17/15)

(R0914)


[refactor] 1390-1511: Unnecessary "else" after "continue", remove the "else" and de-indent the code inside it

(R1724)


[refactor] 1388-1511: Too many nested blocks (7/5)

(R1702)


[refactor] 1482-1482: Simplify chained comparison between the operands

(R1716)


[refactor] 1388-1511: Too many nested blocks (6/5)

(R1702)


[refactor] 1377-1377: Too many branches (36/12)

(R0912)


[refactor] 1377-1377: Too many statements (102/50)

(R0915)


[refactor] 1624-1624: Too many branches (17/12)

(R0912)


[refactor] 1624-1624: Too many statements (51/50)

(R0915)

⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run tests under miri
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Check Rust code with rustfmt and clippy
🔇 Additional comments (2)
Lib/_pycodecs.py (2)

1082-1117: Excellent surrogate handling implementation

The updated UTF-16 encoder now properly validates and handles surrogates, which is a significant improvement:

  • Detects and rejects lone surrogates in the input
  • Correctly encodes characters above BMP as surrogate pairs
  • Prevents surrogates in error handler replacement strings

This ensures compliance with the UTF-16 specification.
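
The surrogate-pair arithmetic behind the second bullet can be sketched as follows. `to_surrogate_pair` is a hypothetical helper written for this example, and the encoding checks use CPython's built-in UTF-16 codec rather than the `_pycodecs` code under review.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not above the BMP")
    offset = code_point - 0x10000
    # The top 10 bits go into the high surrogate, the low 10 bits into the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


high, low = to_surrogate_pair(0x1F600)

# The built-in codec emits the same two 16-bit units, little-endian here.
encoded = "\U0001F600".encode("utf-16-le")

# A lone surrogate in the input is rejected under the default strict handler.
try:
    "\ud800".encode("utf-16-le")
    lone_rejected = False
except UnicodeEncodeError:
    lone_rejected = True
```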


1221-1238: Correct UTF-32 surrogate validation

The UTF-32 encoder properly validates that surrogates (U+D800-U+DFFF) are not encoded, as they are invalid in UTF-32. This ensures compliance with the Unicode standard.
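
The same rule can be observed with CPython's built-in UTF-32 codec, shown here as a reference point for the behavior the reviewed encoder is expected to match. The choice of error handlers is only for illustration.

```python
lone = "\ud800"  # a lone high surrogate

# Strict mode refuses to encode the surrogate.
try:
    lone.encode("utf-32-le")
    rejected = False
except UnicodeEncodeError:
    rejected = True

# "replace" substitutes a question mark for the offending character.
replaced = lone.encode("utf-32-le", errors="replace")

# "surrogatepass" explicitly opts in to writing the surrogate code point.
passed = lone.encode("utf-32-le", errors="surrogatepass")
```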

@youknowone youknowone mentioned this pull request Jul 15, 2025
@youknowone youknowone closed this Dec 30, 2025
@youknowone youknowone reopened this Feb 1, 2026
@youknowone youknowone force-pushed the utf32 branch 4 times, most recently from 44224d6 to e10cbe8 on February 2, 2026 01:14
- Add UTF-32, UTF-32-LE, UTF-32-BE encode/decode in _pycodecs.py
- Register utf_32 codec functions in codecs.rs via delegate_pycodecs
- Fix PyUnicode_EncodeUTF16 returning "" instead of [] for empty input
- Remove resolved expectedFailure decorators in test_codecs.py
- Add failure reasons to remaining expectedFailure comments
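
One plausible reason the empty-input return type matters, assuming the caller passes the result to `bytes()` (the commit message does not say this, so treat it as a guess): `bytes()` accepts an empty list of integers but rejects a `str` given without an encoding.

```python
# An empty list of byte values converts cleanly to empty bytes.
from_list = bytes([])

# An empty string does not: bytes() requires an encoding for str input.
try:
    bytes("")
    str_accepted = True
except TypeError:
    str_accepted = False
```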
@github-actions
Contributor

github-actions bot commented Feb 2, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] lib: cpython/Lib/codecs.py
[ ] test: cpython/Lib/test/test_codecs.py (TODO: 80)
[ ] test: cpython/Lib/test/test_codeccallbacks.py (TODO: 9)
[ ] test: cpython/Lib/test/test_codecencodings_cn.py
[ ] test: cpython/Lib/test/test_codecencodings_hk.py
[ ] test: cpython/Lib/test/test_codecencodings_iso2022.py
[ ] test: cpython/Lib/test/test_codecencodings_jp.py
[ ] test: cpython/Lib/test/test_codecencodings_kr.py
[ ] test: cpython/Lib/test/test_codecencodings_tw.py
[ ] test: cpython/Lib/test/test_codecmaps_cn.py
[ ] test: cpython/Lib/test/test_codecmaps_hk.py
[ ] test: cpython/Lib/test/test_codecmaps_jp.py
[ ] test: cpython/Lib/test/test_codecmaps_kr.py
[ ] test: cpython/Lib/test/test_codecmaps_tw.py
[ ] test: cpython/Lib/test/test_charmapcodec.py
[ ] test: cpython/Lib/test/test_multibytecodec.py

dependencies:

  • codecs

dependent tests: (104 tests)

  • codecs: test_charmapcodec test_codeccallbacks test_codecs test_eof test_exceptions test_importlib test_io test_json test_locale test_logging test_os test_plistlib test_str test_sys
    • encodings:
      • locale: test__locale test_builtin test_c_locale_coercion test_calendar test_decimal test_format test_re test_regrtest test_time test_types test_utf8_mode
    • json: test_subprocess test_sysconfig test_tomllib test_tools test_traceback test_zoneinfo
      • importlib.metadata: test_importlib
    • pickle: test_array test_asyncio test_bytes test_bz2 test_collections test_concurrent_futures test_coroutines test_csv test_ctypes test_defaultdict test_deque test_descr test_dict test_dictviews test_email test_enum test_enumerate test_fractions test_functools test_generators test_genericalias test_http_cookies test_inspect test_ipaddress test_iter test_itertools test_list test_lzma test_memoryio test_memoryview test_opcache test_operator test_ordered_dict test_pathlib test_pickle test_pickletools test_platform test_positional_only_arg test_posix test_random test_range test_set test_shelve test_slice test_socket test_statistics test_string test_trace test_tuple test_typing test_unittest test_uuid test_xml_dom_minicompat test_xml_etree test_zipfile test_zlib test_zoneinfo
      • logging.handlers: test_concurrent_futures
    • tokenize: test_linecache test_tabnanny test_tokenize
      • inspect: test_abc test_argparse test_asyncgen test_code test_grammar test_ntpath test_patma test_posixpath test_signal test_yield_from test_zipimport

[x] lib: cpython/Lib/imaplib.py
[x] test: cpython/Lib/test/test_imaplib.py (TODO: 1)

dependencies:

  • imaplib

dependent tests: (1 tests)

  • imaplib: test_imaplib

[ ] lib: cpython/Lib/socket.py
[ ] test: cpython/Lib/test/test_socket.py (TODO: 12)

dependencies:

  • socket (native: _socket, sys)
    • io (native: _io, _thread, errno, sys)
    • os (native: os.path, sys)
    • enum

dependent tests: (53 tests)

  • socket: test_asyncio test_epoll test_exception_hierarchy test_ftplib test_httplib test_httpservers test_imaplib test_kqueue test_largefile test_logging test_mailbox test_mmap test_os test_pathlib test_pty test_selectors test_signal test_smtplib test_smtpnet test_socket test_socketserver test_ssl test_stat test_support test_sys test_timeout test_urllib test_urllib2 test_urllib2net test_urllib_response test_urllibnet test_xmlrpc
    • asyncio: test_asyncio test_builtin test_contextlib_async test_inspect test_sys_settrace test_unittest
    • http.client: test_docxmlrpc test_hashlib test_ucn test_unicodedata test_wsgiref
      • urllib.request: test_http_cookiejar test_site test_urllib2_localnet
    • http.server: test_robotparser
    • logging.handlers: test_concurrent_futures
    • mailbox: test_genericalias
    • multiprocessing: test_concurrent_futures test_fcntl test_multiprocessing_main_handling
      • concurrent.futures.process: test_concurrent_futures

[ ] test: cpython/Lib/test/test_array.py (TODO: 4)

dependencies:

dependent tests: (27 tests)

  • array: test_android test_array test_base64 test_binascii test_bytes test_bz2 test_collections test_ctypes test_file test_fileio test_genericalias test_gzip test_hashlib test_httplib test_io test_ioctl test_long test_lzma test_marshal test_memoryview test_patma test_socket test_ssl test_struct test_urllib2 test_zipfile test_zstd

[ ] test: cpython/Lib/test/test_bigmem.py (TODO: 4)

dependencies:

dependent tests: (no tests depend on bigmem)

[ ] lib: cpython/Lib/io.py
[ ] lib: cpython/Lib/_pyio.py
[ ] test: cpython/Lib/test/test_io.py (TODO: 59)
[ ] test: cpython/Lib/test/test_bufio.py (TODO: 2)
[ ] test: cpython/Lib/test/test_fileio.py
[ ] test: cpython/Lib/test/test_memoryio.py (TODO: 26)

dependencies:

  • io

dependent tests: (87 tests)

  • io: test__colorize test_android test_argparse test_ast test_asyncio test_bufio test_builtin test_bz2 test_calendar test_cmd test_cmd_line_script test_codecs test_compileall test_compiler_assemble test_concurrent_futures test_configparser test_contextlib test_csv test_dbm_dumb test_dis test_email test_enum test_file test_fileinput test_fileio test_ftplib test_getpass test_gzip test_hashlib test_httplib test_httpservers test_importlib test_inspect test_io test_json test_largefile test_logging test_lzma test_mailbox test_marshal test_memoryio test_memoryview test_mimetypes test_optparse test_pathlib test_pickle test_pickletools test_plistlib test_pprint test_print test_pty test_pulldom test_pyexpat test_regrtest test_robotparser test_shlex test_shutil test_site test_smtplib test_socket test_socketserver test_subprocess test_support test_sys test_tarfile test_tempfile test_threadedtempfile test_timeit test_tokenize test_traceback test_typing test_unittest test_univnewlines test_urllib test_urllib2 test_uuid test_wave test_wsgiref test_xml_dom_xmlbuilder test_xml_etree test_xml_etree_c test_xmlrpc test_zipapp test_zipfile test_zipimport test_zoneinfo test_zstd

[ ] lib: cpython/Lib/json
[ ] test: cpython/Lib/test/test_json (TODO: 17)

dependencies:

  • json (native: json.tool, sys)
    • _colorize, argparse, codecs, re

dependent tests: (9 tests)

  • json: test_logging test_plistlib test_subprocess test_sysconfig test_tomllib test_tools test_traceback test_zoneinfo
    • importlib.metadata: test_importlib

[ ] lib: cpython/Lib/subprocess.py
[ ] test: cpython/Lib/test/test_subprocess.py (TODO: 6)

dependencies:

  • subprocess (native: builtins, errno, sys, time)
    • locale (native: builtins, encodings.aliases, sys)
    • io, os
    • contextlib, signal, threading, types, warnings

dependent tests: (53 tests)

  • subprocess: test_android test_asyncio test_audit test_bz2 test_c_locale_coercion test_cmd_line test_cmd_line_script test_ctypes test_dtrace test_faulthandler test_file_eintr test_gzip test_inspect test_json test_msvcrt test_ntpath test_os test_osx_env test_platform test_plistlib test_poll test_py_compile test_regrtest test_repl test_runpy test_script_helper test_select test_shutil test_signal test_site test_sqlite3 test_subprocess test_support test_sys test_sysconfig test_tempfile test_threading test_unittest test_urllib2 test_utf8_mode test_venv test_wait3 test_webbrowser test_zipfile
    • asyncio: test_asyncio test_builtin test_contextlib_async test_logging test_sys_settrace test_unittest
    • ctypes.util: test_ctypes
    • ensurepip: test_ensurepip
    • multiprocessing.util: test_concurrent_futures

[ ] lib: cpython/Lib/xml
[ ] test: cpython/Lib/test/test_xml_etree.py (TODO: 55)
[ ] test: cpython/Lib/test/test_xml_etree_c.py
[ ] test: cpython/Lib/test/test_minidom.py
[ ] test: cpython/Lib/test/test_pulldom.py (TODO: 4)
[ ] test: cpython/Lib/test/test_pyexpat.py (TODO: 28)
[ ] test: cpython/Lib/test/test_sax.py
[x] test: cpython/Lib/test/test_xml_dom_minicompat.py
[x] test: cpython/Lib/test/test_xml_dom_xmlbuilder.py

dependencies:

  • xml (native: collections.abc, pyexpat, sys, urllib.parse, xml.dom, xml.dom.NodeFilter, xml.dom.minicompat, xml.dom.minidom, xml.dom.xmlbuilder, xml.etree.ElementTree, xml.parsers, xml.sax, xml.sax._exceptions, xml.sax.handler)
    • collections (native: _weakref, itertools, sys)
    • copy
    • io, os
    • codecs, contextlib, re, warnings, weakref

dependent tests: (7 tests)

  • xml: test_pulldom test_pyexpat test_regrtest test_xml_dom_minicompat test_xml_dom_xmlbuilder
    • plistlib: test_plistlib
    • xmlrpc.client: test_xmlrpc

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

@youknowone youknowone merged commit 100b870 into RustPython:main Feb 2, 2026
14 checks passed
@youknowone youknowone deleted the utf32 branch February 2, 2026 03:50