Use num_enum crate for oparg types#6980
Conversation
📝 WalkthroughWalkthroughReplaced bespoke op-arg conversion helpers with Into/TryFrom (num_enum-derived) and added workspace dependency Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/compiler-core/src/bytecode/oparg.rs`:
- Around line 288-303: The TryFrom<u32> impl for MakeFunctionFlags currently
uses a silent truncating cast (value as u8); change MakeFunctionFlags::try_from
to perform a checked conversion (use u8::try_from(value) and map any Err to
MarshalError::InvalidBytecode) and then call Self::from_bits on the resulting
u8, returning the same InvalidBytecode error if from_bits returns None; update
references in the impl TryFrom<u32> for MakeFunctionFlags to remove the direct
cast and use the checked conversion pattern used elsewhere (e.g., instruction
parsing) so high-bit values are rejected instead of truncated.
🧹 Nitpick comments (1)
crates/compiler-core/src/bytecode/instruction.rs (1)
1271-1276: Update the safety note to match the current conversion.The safety contract still references the old conversion name; adjusting the wording avoids confusion for future readers.
📝 Suggested doc tweak
- /// T::from_op_arg(self) must succeed + /// The conversion from `OpArg` to `T` must succeed.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/compiler-core/src/bytecode/oparg.rs`:
- Around line 567-580: Replace the infallible From<u32> for UnpackExArgs with a
checked TryFrom<u32> implementation that validates the top two bytes are zero
and returns an appropriate error on non-zero high bits; keep the existing
From<UnpackExArgs> for u32 as-is and ensure UnpackExArgs still implements
OpArgType (update any call sites to use TryFrom::try_from where needed).
Specifically, implement TryFrom<u32> for UnpackExArgs by calling
value.to_le_bytes(), verifying that bytes[2] and bytes[3] are zero, and
constructing Self { before: bytes[0], after: bytes[1] } on success, otherwise
return an error consistent with the crate's error type pattern (matching how
MakeFunctionFlags handles malformed input).
| impl From<u32> for UnpackExArgs { | ||
| fn from(value: u32) -> Self { | ||
| let [before, after, ..] = value.to_le_bytes(); | ||
| Self { before, after } | ||
| } | ||
| } | ||
|
|
||
| #[inline(always)] | ||
| fn to_op_arg(self) -> u32 { | ||
| u32::from_le_bytes([self.before, self.after, 0, 0]) | ||
| impl From<UnpackExArgs> for u32 { | ||
| fn from(value: UnpackExArgs) -> Self { | ||
| Self::from_le_bytes([value.before, value.after, 0, 0]) | ||
| } | ||
| } | ||
|
|
||
| impl OpArgType for UnpackExArgs {} |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's see the file structure and understand UnpackExArgs
cd crates/compiler-core && head -600 src/bytecode/oparg.rs | tail -100Repository: RustPython/RustPython
Length of output: 2741
🏁 Script executed:
# Also check the full definition of UnpackExArgs
cd crates/compiler-core && rg -B5 -A5 "struct UnpackExArgs"Repository: RustPython/RustPython
Length of output: 476
🏁 Script executed:
# Search for MarshalError definition
cd crates/compiler-core && rg "enum MarshalError|struct MarshalError" -A10Repository: RustPython/RustPython
Length of output: 477
🏁 Script executed:
# Look for all usages of UnpackExArgs with From/Into
cd crates/compiler-core && rg "UnpackExArgs" -B2 -A2Repository: RustPython/RustPython
Length of output: 2134
🏁 Script executed:
# Find where UnpackExArgs is constructed or converted from u32
cd crates/compiler-core && rg "UnpackExArgs.*from|::from.*UnpackExArgs" -B3 -A3Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Search for bytecode reading/marshalling code that might use this conversion
cd crates/compiler-core && rg "oparg|OpArg" src/bytecode/instruction.rs | head -30Repository: RustPython/RustPython
Length of output: 1763
🏁 Script executed:
# Check if MarshalError is imported in oparg.rs
cd crates/compiler-core && head -50 src/bytecode/oparg.rsRepository: RustPython/RustPython
Length of output: 1221
🏁 Script executed:
# Look for Arg<UnpackExArgs> usage to understand the data flow
cd crates/compiler-core && rg "Arg<UnpackExArgs>" -B5 -A5Repository: RustPython/RustPython
Length of output: 592
🏁 Script executed:
# Check if there's a blanket impl or if TryFrom<u32> is actually required
cd crates/compiler-core && rg "impl.*OpArgType" -B2 -A2Repository: RustPython/RustPython
Length of output: 3737
🏁 Script executed:
# Search for actual usage of From for UnpackExArgs
cd crates/compiler-core && rg "\.into\(\)|From::<UnpackExArgs>" -B3 -A3Repository: RustPython/RustPython
Length of output: 3571
🏁 Script executed:
# Check how Arg<T> handles conversion and if it uses TryFrom
cd crates/compiler-core && rg "struct Arg|impl.*Arg" src/bytecode/instruction.rs | head -20Repository: RustPython/RustPython
Length of output: 259
🏁 Script executed:
# Look for where Arg values are decoded from bytecode
cd crates/compiler-core && rg "decode|from_bytes|read" src/ -l | grep -E "(decode|marshal|bytecode)" | head -5Repository: RustPython/RustPython
Length of output: 145
🏁 Script executed:
# Let's check if this code actually compiles by looking for any compiler errors or warnings
cd crates/compiler-core && cargo check --lib 2>&1 | head -50Repository: RustPython/RustPython
Length of output: 151
🏁 Script executed:
# Look at the trait definition more carefully - specifically the TryFrom requirement
cd crates/compiler-core && rg -A5 "pub trait OpArgType"Repository: RustPython/RustPython
Length of output: 407
🏁 Script executed:
# Check if there's a blanket impl of TryFrom<u32> for types with From<u32>
cd crates/compiler-core && rg "impl.*TryFrom" src/bytecode/ -B2 -A3 | head -40Repository: RustPython/RustPython
Length of output: 1455
🏁 Script executed:
# Search for where bytecode is actually deserialized and how opargs are read
cd crates/compiler-core && rg "read|decode" src/bytecode.rs -B2 -A5 | head -50Repository: RustPython/RustPython
Length of output: 1959
🏁 Script executed:
# Check if there's any blanket impl converting From to TryFrom
cd crates/compiler-core && rg "impl.*From.*TryFrom|blanket" src/ -B2 -A5Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Let's look at what MakeFunctionFlags does - it has explicit TryFrom
cd crates/compiler-core && rg -B10 "impl TryFrom<u32> for MakeFunctionFlags" src/bytecode/oparg.rs | head -30Repository: RustPython/RustPython
Length of output: 399
🏁 Script executed:
# Check the full context of UnpackExArgs - does it have a TryFrom impl elsewhere?
cd crates/compiler-core && rg -B20 -A20 "struct UnpackExArgs"Repository: RustPython/RustPython
Length of output: 1872
🏁 Script executed:
# Check if the code actually compiles by looking for any build artifacts or recent commits
cd crates/compiler-core && ls -la target/debug/*.rlib 2>/dev/null | head -5 || echo "No build artifacts found"Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Let's check if Rust's blanket impl from From to TryFrom exists and understand the exact behavior
cd crates/compiler-core && python3 << 'EOF'
# Check what TryFrom actually requires
print("Checking Rust's blanket impl...")
print("In Rust, when you impl From<T> for U, TryFrom<T> for U is automatically available")
print("because there's a blanket impl: impl<F, T> TryFrom<F> for T where T: From<F>")
print("This blanket impl calls from() and wraps it in Ok()")
print("\nSo UnpackExArgs satisfies OpArgType trait requirements via blanket impl")
print("BUT the issue is: silent truncation of high bytes is not validated")
EOFRepository: RustPython/RustPython
Length of output: 447
🏁 Script executed:
# Let's verify this is actually compiled and working by checking the actual module exports
cd crates/compiler-core && rg "pub use.*UnpackExArgs" src/ -B2 -A2Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Check if there are any runtime validation of opargs elsewhere
cd crates/compiler-core && rg "UnpackEx|oparg.*valid|validate" src/ -B2 -A2 | head -40Repository: RustPython/RustPython
Length of output: 2036
🏁 Script executed:
# Let's look at a real bytecode reader to understand the data flow
cd crates/compiler-core && rg -l "read.*oparg|parse.*instruction" src/ | head -3 | xargs cat 2>/dev/null | head -100Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Check how opcodes are actually written to bytecode - are high bits ever legitimately set?
cd crates/compiler-core && rg "encode.*oparg|write.*oparg" src/ -B3 -A3 | head -50Repository: RustPython/RustPython
Length of output: 728
🏁 Script executed:
# Look at the UNPACK_EX opcode in Python to understand what values it should accept
cd crates/compiler-core && rg -i "unpack.ex|118" src/bytecode/ -B2 -A2 | head -30Repository: RustPython/RustPython
Length of output: 739
🏁 Script executed:
# Check if there's any documentation about the bytecode format
cd crates/compiler-core && find . -name "*.md" -o -name "*.txt" | xargs grep -l "oparg\|bytecode" 2>/dev/null | head -3Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Verify that u16::try_from pattern would work and what error it produces
python3 << 'EOF'
print("Checking the proposed fix logic:")
print("1. If value is > 0xFFFF (u16::MAX), try_from fails")
print("2. This would correctly reject malformed bytecode")
print("3. Lower 16 bits are extracted as u16, then converted to [u8; 2]")
print("\nExample:")
print(" value = 0x1234_5678 -> u16::try_from fails (good!)")
print(" value = 0x0000_1234 -> u16 = 0x1234 -> bytes = [0x34, 0x12]")
print("\nThe review comment's proposed fix is sound")
EOFRepository: RustPython/RustPython
Length of output: 432
🏁 Script executed:
# Let's verify no direct usage of From::from() for UnpackExArgs that would break
cd crates/compiler-core && rg "From::<UnpackExArgs>|from\(\)" src/ -B2 -A2 | head -30Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Check how the bytecode marshalling actually works - does it use the trait generically?
cd crates/compiler-core && rg "decode.*instruction|read.*oparg" src/bytecode.rs -B5 -A10 | head -60Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Let's look at the actual bytecode format to understand if high bits should ever be non-zero
python3 << 'EOF'
print("Understanding UNPACK_EX bytecode format:")
print("- Python bytecode uses variable-length encoding with EXTENDED_ARG")
print("- UNPACK_EX stores 'before' and 'after' counts in lower 16 bits")
print("- Each count is an 8-bit value (0-255)")
print("- So the valid range is 0x0000-0xFFFF, where bytes map to [before, after]")
print("\nIf high 16 bits (bytes 2-3) are non-zero, it's malformed bytecode")
print("The proposed u16::try_from would correctly reject values > 0xFFFF")
EOFRepository: RustPython/RustPython
Length of output: 481
Validate high bits when decoding UnpackExArgs.
From<u32> silently drops the upper two bytes. Since UnpackExArgs only needs 16 bits (two u8 fields), adopt a checked TryFrom<u32> implementation to reject malformed bytecode instead of truncating silently.
Proposed fix
-impl From<u32> for UnpackExArgs {
- fn from(value: u32) -> Self {
- let [before, after, ..] = value.to_le_bytes();
- Self { before, after }
- }
-}
+impl TryFrom<u32> for UnpackExArgs {
+ type Error = MarshalError;
+ fn try_from(value: u32) -> Result<Self, Self::Error> {
+ let value = u16::try_from(value).map_err(|_| MarshalError::InvalidBytecode)?;
+ let [before, after] = value.to_le_bytes();
+ Ok(Self { before, after })
+ }
+}Aligns with Rust best practices for error handling and the existing pattern in this codebase (e.g., MakeFunctionFlags).
🤖 Prompt for AI Agents
In `@crates/compiler-core/src/bytecode/oparg.rs` around lines 567 - 580, Replace
the infallible From<u32> for UnpackExArgs with a checked TryFrom<u32>
implementation that validates the top two bytes are zero and returns an
appropriate error on non-zero high bits; keep the existing From<UnpackExArgs>
for u32 as-is and ensure UnpackExArgs still implements OpArgType (update any
call sites to use TryFrom::try_from where needed). Specifically, implement
TryFrom<u32> for UnpackExArgs by calling value.to_le_bytes(), verifying that
bytes[2] and bytes[3] are zero, and constructing Self { before: bytes[0], after:
bytes[1] } on success, otherwise return an error consistent with the crate's
error type pattern (matching how MakeFunctionFlags handles malformed input).
youknowone
left a comment
There was a problem hiding this comment.
Sorry! forgot to finish the comment. I had to investigte the background a little bit.
| /// f"{x}" | ||
| /// f"{x:4}" | ||
| /// ``` | ||
| // Ruff `ConversionFlag::None` is `-1i8`, when its converted to `u8` its value is `u8::MAX`. |
There was a problem hiding this comment.
I encountered similar issue a few days ago. There was a error due to 255 != -1.
There was a problem hiding this comment.
We should not do the conversion of ruff_value as u8 but to match Ruff's enum and choose the correct value accordingly
| #[derive(Copy, Clone, Debug, Hash, PartialEq, Eq)] | ||
| #[derive(Clone, Copy, Debug, Eq, Hash, IntoPrimitive, PartialEq, TryFromPrimitive)] | ||
| #[num_enum(error_type(name = MarshalError, constructor = new_invalid_bytecode))] | ||
| #[repr(u32)] |
There was a problem hiding this comment.
I know OpArg is up to u32, so it must be represented u32 for maximum extending.
But at the same time, each OpArg byte requires ExtendedArg instruction. It means if we use wrong-sized OpArg with wrong count of ExtendedArg, it will lead bug.
When we type this as repr(u8), this type is guaranteed to require no ExtendArg. But if repr(u32), we have to check the validity with another method. So this is looking bad idea to me.
I don't know much about OpArg, please correct me if I am missing something.
Oh, I also think this need to be typed i8 if possible.
There was a problem hiding this comment.
I know OpArg is up to u32, so it must be represented u32 for maximum extending. But at the same time, each OpArg byte requires ExtendedArg instruction. It means if we use wrong-sized OpArg with wrong count of ExtendedArg, it will lead bug.
That's correct. I believe the CI will catch this immediately, so we might not need to worry about this too much.
When we type this as repr(u8), this type is guaranteed to require no ExtendArg. But if repr(u32), we have to check the validity with another method. So this is looking bad idea to me.
The reason for why I changed it to #[repr(u32)] is because num_enum implements the From<u32> which OpArgType trait needs, I can revert it to #[repr(u8)] and implement From<u32> manually for each oparg enum.
I don't know much about OpArg, please correct me if I am missing something.
Oh, I also think this need to be typed i8 if possible.
Cpython have a mix for this specific oparg, in the AST they represent it as b'a' / b'r' / b's' / -1 but in the oparg it can be in 0..=2, because ruff only cares about the AST they have it as an i8, we should keep it as u8
There was a problem hiding this comment.
you seem to made it working with u8, great 👍
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/compiler-core/src/bytecode/oparg.rs`:
- Around line 186-195: ResumeType is missing the standard enum boilerplate used
elsewhere: add the impl_oparg_enum_traits! macro invocation for ResumeType and
provide the OpArgType impl for ResumeType (matching the pattern used by
RaiseKind, IntrinsicFunction1, IntrinsicFunction2, etc.) so it can be treated as
an instruction argument; locate the ResumeType enum and insert
impl_oparg_enum_traits!(ResumeType) and the corresponding impl OpArgType for
ResumeType following the same structure as the other enums in this file.
| /// Resume type for the RESUME instruction | ||
| #[derive(Copy, Clone, Debug, Hash, PartialEq, Eq)] | ||
| #[repr(u32)] | ||
| #[repr(u8)] | ||
| #[derive(Copy, Clone, Debug, Hash, PartialEq, Eq, IntoPrimitive, TryFromPrimitive)] | ||
| #[num_enum(error_type(name = MarshalError, constructor = new_invalid_bytecode))] | ||
| pub enum ResumeType { | ||
| AtFuncStart = 0, | ||
| AfterYield = 1, | ||
| AfterYieldFrom = 2, | ||
| AfterAwait = 3, | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Search for Arg<ResumeType> usage
rg "Arg<ResumeType>|Arg::<ResumeType>" crates/Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Also search more broadly for ResumeType usage patterns
rg "ResumeType" crates/ -B 2 -A 2 | head -100Repository: RustPython/RustPython
Length of output: 2815
🏁 Script executed:
# Check the structure of the oparg.rs file to understand the pattern
cat -n crates/compiler-core/src/bytecode/oparg.rs | grep -A 10 "impl_oparg_enum_traits"Repository: RustPython/RustPython
Length of output: 5025
🏁 Script executed:
# Look at RaiseKind and IntrinsicFunction1 to compare patterns
cat -n crates/compiler-core/src/bytecode/oparg.rs | sed -n '1,300p' | grep -A 20 "enum RaiseKind\|enum IntrinsicFunction1"Repository: RustPython/RustPython
Length of output: 1631
🏁 Script executed:
# Check the actual ResumeType definition in oparg.rs
cat -n crates/compiler-core/src/bytecode/oparg.rs | sed -n '186,220p'Repository: RustPython/RustPython
Length of output: 1096
🏁 Script executed:
# Search for impl_oparg_enum_traits or impl OpArgType for ResumeType
rg "impl_oparg_enum_traits.*ResumeType|impl OpArgType for ResumeType" crates/Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Check if there are any compilation errors or warnings related to ResumeType
# by looking at Cargo build output or checking the structure more carefully
cd crates/compiler-core && cargo clippy --lib 2>&1 | grep -i "resumetype\|resume_type" | head -20Repository: RustPython/RustPython
Length of output: 151
ResumeType should follow the same pattern as other similar enums for consistency.
ResumeType has the num_enum derives but lacks impl_oparg_enum_traits! and impl OpArgType, unlike every other enum with the same structure in this file (RaiseKind, IntrinsicFunction1, IntrinsicFunction2, etc.). While the current code works because ResumeType is manually cast to u32, adding these implementations ensures consistency with the codebase pattern and enables ResumeType to be used as an instruction argument type if needed in the future.
Proposed fix
pub enum ResumeType {
AtFuncStart = 0,
AfterYield = 1,
AfterYieldFrom = 2,
AfterAwait = 3,
}
+
+impl_oparg_enum_traits!(ResumeType);
+impl OpArgType for ResumeType {}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// Resume type for the RESUME instruction | |
| #[derive(Copy, Clone, Debug, Hash, PartialEq, Eq)] | |
| #[repr(u32)] | |
| #[repr(u8)] | |
| #[derive(Copy, Clone, Debug, Hash, PartialEq, Eq, IntoPrimitive, TryFromPrimitive)] | |
| #[num_enum(error_type(name = MarshalError, constructor = new_invalid_bytecode))] | |
| pub enum ResumeType { | |
| AtFuncStart = 0, | |
| AfterYield = 1, | |
| AfterYieldFrom = 2, | |
| AfterAwait = 3, | |
| } | |
| /// Resume type for the RESUME instruction | |
| #[repr(u8)] | |
| #[derive(Copy, Clone, Debug, Hash, PartialEq, Eq, IntoPrimitive, TryFromPrimitive)] | |
| #[num_enum(error_type(name = MarshalError, constructor = new_invalid_bytecode))] | |
| pub enum ResumeType { | |
| AtFuncStart = 0, | |
| AfterYield = 1, | |
| AfterYieldFrom = 2, | |
| AfterAwait = 3, | |
| } | |
| impl_oparg_enum_traits!(ResumeType); | |
| impl OpArgType for ResumeType {} |
🤖 Prompt for AI Agents
In `@crates/compiler-core/src/bytecode/oparg.rs` around lines 186 - 195,
ResumeType is missing the standard enum boilerplate used elsewhere: add the
impl_oparg_enum_traits! macro invocation for ResumeType and provide the
OpArgType impl for ResumeType (matching the pattern used by RaiseKind,
IntrinsicFunction1, IntrinsicFunction2, etc.) so it can be treated as an
instruction argument; locate the ResumeType enum and insert
impl_oparg_enum_traits!(ResumeType) and the corresponding impl OpArgType for
ResumeType following the same structure as the other enums in this file.
| /// x[5:10] | ||
| /// ``` | ||
| Two, | ||
| Two = 2, |
There was a problem hiding this comment.
I thought we probably switched this un-numbered enum for more room to optimization. Does this need to be numbered again?
There was a problem hiding this comment.
Yes, that's because of the use of num_enum, I can revert this change if you'd like. Also, why don't we do it for all other opargs?
There was a problem hiding this comment.
That's fair. I thought about this again. We can define the serialize/deserialize values separate from this values' internal memory representation, right? Then removing values from other types and only defining them for serialize/deserialize make sense.
Summary by CodeRabbit
Chores
New Features