Update test_fstring from v3.14.3 and impl more#7164
Update test_fstring from v3.14.3 and impl more#7164youknowone merged 3 commits intoRustPython:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds f-string debug-comment sanitization (duplicate helper present), exposes an Unknown format code, inserts an AST-based string-escape warning pass invoked post-compile when parser is enabled, and expands parser→VM error message mappings for string- and f-string-related parse errors. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Caller
participant VM as VirtualMachine
participant Compiler as Compiler/Codegen
participant Parser as Parser/AST
participant Analyzer as EscapeWarnings
participant Emitter as WarningEmitter
Caller->>VM: compile_with_opts(source, opts)
VM->>Compiler: compile source
Compiler-->>VM: compilation result (Ok / Err)
alt parser feature enabled and compilation succeeded
VM->>Parser: parse source -> AST
Parser-->>VM: AST
VM->>Analyzer: traverse AST (strings, bytes, f-strings)
Analyzer->>Analyzer: detect invalid escapes & locations
Analyzer->>Emitter: create SyntaxWarning(filename, line, message)
Emitter-->>VM: emit warning(s)
end
VM-->>Caller: return compiled code or error
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Code has been automatically formatted The code in this PR has been formatted using:
git pull origin fstring |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@crates/vm/src/vm/compile.rs`:
- Around line 296-315: The loop over fstring elements fails to visit
interpolation expressions inside a format spec, so nested interpolations in
interp.format_spec are never processed; update the block handling
interp.format_spec in the visit of ast::InterpolatedStringElement::Interpolation
to iterate spec.elements and for each element match
ast::InterpolatedStringElement::Interpolation and call
self.visit_expr(&inner.expression) (and continue to call
self.check_fstring_literal(lit.range) for Literal cases) so both literals and
nested interpolation expressions inside format specs are handled; adjust the
handling near fstring.elements, interp.format_spec, spec.elements,
check_fstring_literal, and visit_expr accordingly.
- Around line 236-248: The check that emits warn_invalid_escape_sequence
incorrectly flags sequences like f"\\{x}" because it only tests
self.source[start..end].ends_with('\\') and the following char, not whether that
trailing backslash is itself escaped; update the logic around the block that
uses self.source[start..end] (the code that checks ends_with('\\') and then
reads after at end) to count consecutive trailing backslashes in that slice and
only call warn_invalid_escape_sequence when the count is odd (i.e., the final
backslash is unescaped), keeping references to self.source, start, end, and
warn_invalid_escape_sequence to locate and change the code.
- Around line 89-166: Add an is_bytes: bool parameter to first_invalid_escape
and use it to treat '\u', '\U', and '\N' escapes as invalid when is_bytes is
true (i.e., only allow those when is_bytes is false); update the match arms for
'u', 'U', and 'N' to check is_bytes and return false (invalid) for bytes
literals while keeping existing scanning behavior for non-bytes; then update all
callers to pass the correct flag (pass true from BytesLiteral sites and false
from StringLiteral and FString sites) so bytes literal validation enforces the
restriction.
🧹 Nitpick comments (1)
crates/vm/src/vm/compile.rs (1)
28-35: Source is parsed twice — once by the compiler and again for escape warnings.
compiler::compileparses the source internally, and thenemit_string_escape_warningsre-parses withruff_python_parser::parse. For large files, this doubles the parsing cost. Consider passing the AST from the compiler to avoid the redundant parse, or deferring warnings to the compilation pass itself.
There was a problem hiding this comment.
🧹 Nitpick comments (4)
crates/vm/src/vm/vm_new.rs (2)
568-582: Consider mergingInvalidStarredExpressionUsageandInvalidDeleteTargetarms.Both set
msg = "invalid syntax".to_owned(), differing only innarrow_caret. An or-pattern would reduce duplication.Suggested diff
#[cfg(feature = "parser")] crate::compiler::CompileError::Parse(rustpython_compiler::ParseError { - error: ruff_python_parser::ParseErrorType::InvalidStarredExpressionUsage, - .. - }) => { - msg = "invalid syntax".to_owned(); - narrow_caret = true; - } - #[cfg(feature = "parser")] - crate::compiler::CompileError::Parse(rustpython_compiler::ParseError { - error: ruff_python_parser::ParseErrorType::InvalidDeleteTarget, + error: + ruff_python_parser::ParseErrorType::InvalidStarredExpressionUsage + | ruff_python_parser::ParseErrorType::InvalidDeleteTarget, .. }) => { msg = "invalid syntax".to_owned(); + narrow_caret = matches!( + error, + crate::compiler::CompileError::Parse(rustpython_compiler::ParseError { + error: ruff_python_parser::ParseErrorType::InvalidStarredExpressionUsage, + .. + }) + ); }As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."
603-616: String-comparison guards onOtherErrorare fragile.These arms rely on exact/prefix string matching against parser error messages (
"bytes literal cannot be mixed...","Expected an identifier, but found a keyword"). If the upstream parser (ruff) changes its wording, these will silently stop matching and fall through to the default_ => {}arm, producing un-normalized messages.Not blocking, but worth noting this coupling. Consider adding a comment that these strings are tied to ruff's error messages, or test them in CI against the expected parser version.
crates/vm/src/vm/compile.rs (2)
28-36: Double-parse of source on every successful compilation.
emit_string_escape_warningsre-parses the entire source (line 334) even thoughcompiler::compilealready parsed it. For large modules this effectively doubles parse cost on the happy path.If the compiler pipeline can be refactored to expose the AST (or accept a post-parse callback), that would avoid the redundant work. Acceptable as a stopgap, but worth a TODO.
329-355:emit_string_escape_warningssilently swallows parse failures.Line 333–337: if
ruff_python_parser::parsefails, the function returns silently. This is fine in normal operation (the compile step already succeeded, so parsing should succeed), but it could mask bugs if the two parsers diverge. Adebug_assert!or trace-level log on theErrpath would aid debugging.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/vm/src/vm/compile.rs`:
- Around line 301-312: The loop over spec.elements only handles
ast::InterpolatedStringElement::Literal and skips Interpolation nodes, so nested
interpolation expressions and any string literals inside them (e.g.,
f"{x:{some_str}}") are not visited or checked; update the loop in compile.rs to
match both ast::InterpolatedStringElement::Literal and
ast::InterpolatedStringElement::Interpolation: for Literal call
self.check_fstring_literal(lit.range) as now, and for Interpolation call
self.visit_expr(&interp.expression) and then (if the interpolation has its own
format_spec) iterate its spec.elements recursively applying the same checks so
nested literals get check_fstring_literal and nested expressions get visit_expr;
use the existing methods visit_expr and check_fstring_literal and handle
ast::InterpolatedStringElement variants accordingly.
🧹 Nitpick comments (2)
crates/vm/src/vm/compile.rs (2)
28-35: Double-parsing of source for escape warnings.
emit_string_escape_warningsre-parses the entire source withruff_python_parser::parse(line 330), even though the compiler just parsed it. Consider threading the AST (or at least the string-literal ranges) from the compilation step to avoid parsing the source twice. This matters for large files.
118-147: Duplicated hex-digit consumption for\x,\u,\U.The three arms (lines 118-127, 128-137, 138-147) share identical logic differing only in the digit count (2, 4, 8). Extract a small helper or collapse them. As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."
Proposed refactor
+ 'x' | 'u' | 'U' => { + let count = match next { + 'x' => 2, + 'u' => 4, + 'U' => 8, + _ => unreachable!(), + }; + for _ in 0..count { + if chars.peek().is_some_and(|&(_, c)| c.is_ascii_hexdigit()) { + chars.next(); + } else { + break; + } + } + true + } - 'x' => { - for _ in 0..2 { - ... - } - true - } - 'u' => { - for _ in 0..4 { - ... - } - true - } - 'U' => { - for _ in 0..8 { - ... - } - true - }
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (5)
crates/vm/src/vm/compile.rs (2)
28-35: Double-parsing on every successful compile is a performance concern.
emit_string_escape_warningsre-parses the entire source withruff_python_parser::parseafter the compiler has already parsed and compiled it. This doubles the parsing cost for every successfulcompile_with_optscall.Consider threading the already-parsed AST (or escape-warning data collected during the initial parse) to avoid the redundant parse. If that's too invasive for now, at minimum this should be documented as a known inefficiency with a TODO.
338-360: HardcodedMode::Modulemay miss warnings foreval/singlemode sources.The compile mode (
compiler::Mode) is not forwarded to the warning pass. If the original compilation usedevalorsinglemode, re-parsing asModulecould fail for edge-case inputs, causing escape warnings to be silently skipped (Line 341–342). Practically unlikely for string-literal-heavy code, but worth noting.Suggested fix
- pub(super) fn emit_string_escape_warnings(&self, source: &str, filename: &str) { - let Ok(parsed) = - ruff_python_parser::parse(source, ruff_python_parser::Mode::Module.into()) + pub(super) fn emit_string_escape_warnings(&self, source: &str, filename: &str, mode: ruff_python_parser::Mode) { + let Ok(parsed) = + ruff_python_parser::parse(source, mode.into())And at the call site (Line 32):
- self.emit_string_escape_warnings(source, &source_path); + self.emit_string_escape_warnings(source, &source_path, /* map compiler::Mode → ruff Mode */);crates/common/src/format.rs (1)
427-441:get_separator_intervalpanics onUnknown— currently unreachable but fragile.The wildcard
_ => panic!(...)at Line 439 now implicitly coversFormatType::Unknown(_). Today it's safe because every publicformat_*method rejectsUnknownbefore callingadd_magnitude_separators. However, this is a latent runtime panic if a new formatting path is added without that guard.Consider adding
Unknownexplicitly to the match or tovalidate_format's rejection list as a defensive measure.Suggested fix
const fn get_separator_interval(&self) -> usize { match self.format_type { Some(FormatType::Binary | FormatType::Octal | FormatType::Hex(_)) => 4, Some( FormatType::Decimal | FormatType::FixedPoint(_) | FormatType::GeneralFormat(_) | FormatType::Exponent(_) | FormatType::Percentage | FormatType::Number(_), ) => 3, None => 3, - _ => panic!("Separators only valid for numbers!"), + Some( + FormatType::String + | FormatType::Character + | FormatType::Unknown(_), + ) => panic!("Separators only valid for numbers!"), } }crates/vm/src/vm/vm_new.rs (2)
576-582:InvalidDeleteTargetcan be merged with the existing "invalid syntax" arm.This arm sets
msg = "invalid syntax"withoutnarrow_caret, identical to the block at Lines 554–567. Merge it into that existing arm to reduce duplication. As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".Suggested merge
#[cfg(feature = "parser")] crate::compiler::CompileError::Parse(rustpython_compiler::ParseError { error: ruff_python_parser::ParseErrorType::Lexical( ruff_python_parser::LexicalErrorType::UnrecognizedToken { .. }, ) | ruff_python_parser::ParseErrorType::SimpleStatementsOnSameLine | ruff_python_parser::ParseErrorType::SimpleAndCompoundStatementOnSameLine | ruff_python_parser::ParseErrorType::ExpectedToken { .. } - | ruff_python_parser::ParseErrorType::ExpectedExpression, + | ruff_python_parser::ParseErrorType::ExpectedExpression + | ruff_python_parser::ParseErrorType::InvalidDeleteTarget, .. }) => { msg = "invalid syntax".to_owned(); } - #[cfg(feature = "parser")] - crate::compiler::CompileError::Parse(rustpython_compiler::ParseError { - error: ruff_python_parser::ParseErrorType::InvalidDeleteTarget, - .. - }) => { - msg = "invalid syntax".to_owned(); - }
603-616: OtherError message normalizations are correct but inherently fragile.String-matching on parser error messages (
eq_ignore_ascii_case,starts_with) will silently break if the upstream ruff parser changes its error text. This is a known pattern in this file, so no action needed now, but consider adding a comment noting the dependency on specific ruff error message strings.
- Add Unknown(char) variant to FormatType for proper error messages on unrecognized format codes (test_errors) - Strip comments from f-string debug text in compile.rs (test_debug_conversion) - Map ruff SyntaxError messages to match CPython in vm_new.rs: InvalidDeleteTarget, LineContinuationError, UnclosedStringError, OtherError(bytes mixing), OtherError(keyword identifier), FStringError(UnterminatedString/UnterminatedTripleQuotedString), and backtick-to-quote replacement for FStringError messages
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [ ] test: cpython/Lib/test/test_str.py (TODO: 16) dependencies: dependent tests: (no tests depend on str) Legend:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
crates/vm/src/vm/vm_new.rs (1)
603-608:eq_ignore_ascii_caseis unnecessarily broad for a known error string.The ruff parser emits this error with a fixed casing. A plain
==comparison would be more precise and slightly cheaper. Not a bug, buteq_ignore_ascii_casecould mask unexpected variations if the upstream string changes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/vm_new.rs` around lines 603 - 608, The match arm handling crate::compiler::CompileError::Parse that destructures ruff_python_parser::ParseErrorType::OtherError(s) uses s.eq_ignore_ascii_case("bytes literal cannot be mixed with non-bytes literals"); change this to a strict equality comparison (s == "bytes literal cannot be mixed with non-bytes literals") to avoid masking casing changes and marginally improve performance—update the match condition in vm_new.rs accordingly.crates/vm/src/vm/compile.rs (1)
337-342: Source is re-parsed a second time just for escape warnings.The entire source is parsed again here after a successful compile. For large files this doubles the parsing cost. Consider caching or sharing the AST from the initial compile pass in a future iteration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/compile.rs` around lines 337 - 342, The emit_string_escape_warnings function currently reparses source via ruff_python_parser::parse, doubling work; modify emit_string_escape_warnings to accept an AST (or a reference to the parsed Module) produced by the initial compile pass (or take an Option<&Module> and fall back to parsing only if None), and update callers in the compile pipeline to pass the already-parsed AST from the initial parse step so the second parse is eliminated; ensure the signature change references emit_string_escape_warnings and adjust call sites in the compile flow to propagate the parsed AST/cache.crates/common/src/format.rs (1)
427-441:get_separator_intervalwill panic onUnknown(_)if reached.The
_ => panic!(...)catch-all on line 439 now implicitly coversFormatType::Unknown(_). Currently this is unreachable because all formatting methods (format_int,format_float, etc.) returnErr(UnknownFormatCode)before callingadd_magnitude_separators. However, it's a latent footgun — any future code path that reachesget_separator_intervalwith anUnknowntype will hit a runtime panic.Consider adding
FormatType::Unknown(_)explicitly to the panic arm (or returning an error instead of panicking).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/common/src/format.rs` around lines 427 - 441, get_separator_interval currently uses a catch-all _ => panic!(...) which implicitly includes FormatType::Unknown(_); make the match exhaustive by adding an explicit FormatType::Unknown(_) arm that panics with the same message (or alternatively change get_separator_interval to return a Result and propagate an error for Unknown), so update the match in get_separator_interval to include FormatType::Unknown(_) => panic!("Separators only valid for numbers!") and keep the final arm only for true unreachable cases to avoid implicitly panicking on Unknown.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/vm/compile.rs`:
- Around line 28-35: The emit_string_escape_warnings call is using the wrong
parse mode; forward the compile mode into it so it re-parses with the same Mode
as the original compile. Modify the call site in compile (where
compiler::compile(source, mode, &source_path, opts) is invoked) to pass the mode
into emit_string_escape_warnings, and update the emit_string_escape_warnings
signature and implementation to accept a Mode parameter and use that Mode when
re-parsing the source; update any other callers of emit_string_escape_warnings
accordingly to supply the mode.
---
Duplicate comments:
In `@crates/vm/src/vm/compile.rs`:
- Around line 189-199: The call to warn::warn_explicit in compile.rs currently
discards its Result with `let _ =`, which prevents `-W error::SyntaxWarning`
from becoming a hard error; update the call site that invokes
warn::warn_explicit (the invocation inside the compile path using
vm.ctx.exceptions.syntax_warning) to retain the Result handling and add a clear
TODO comment explaining that warning-as-error is intentionally not propagated
here for now and must be revisited to match CPython behavior (i.e., either
propagate the error or convert warnings to errors when -W is set); reference the
warn::warn_explicit call and the surrounding compile logic so reviewers can
later replace the TODO with proper error propagation.
---
Nitpick comments:
In `@crates/common/src/format.rs`:
- Around line 427-441: get_separator_interval currently uses a catch-all _ =>
panic!(...) which implicitly includes FormatType::Unknown(_); make the match
exhaustive by adding an explicit FormatType::Unknown(_) arm that panics with the
same message (or alternatively change get_separator_interval to return a Result
and propagate an error for Unknown), so update the match in
get_separator_interval to include FormatType::Unknown(_) => panic!("Separators
only valid for numbers!") and keep the final arm only for true unreachable cases
to avoid implicitly panicking on Unknown.
In `@crates/vm/src/vm/compile.rs`:
- Around line 337-342: The emit_string_escape_warnings function currently
reparses source via ruff_python_parser::parse, doubling work; modify
emit_string_escape_warnings to accept an AST (or a reference to the parsed
Module) produced by the initial compile pass (or take an Option<&Module> and
fall back to parsing only if None), and update callers in the compile pipeline
to pass the already-parsed AST from the initial parse step so the second parse
is eliminated; ensure the signature change references
emit_string_escape_warnings and adjust call sites in the compile flow to
propagate the parsed AST/cache.
In `@crates/vm/src/vm/vm_new.rs`:
- Around line 603-608: The match arm handling
crate::compiler::CompileError::Parse that destructures
ruff_python_parser::ParseErrorType::OtherError(s) uses
s.eq_ignore_ascii_case("bytes literal cannot be mixed with non-bytes literals");
change this to a strict equality comparison (s == "bytes literal cannot be mixed
with non-bytes literals") to avoid masking casing changes and marginally improve
performance—update the match condition in vm_new.rs accordingly.
| let code = | ||
| compiler::compile(source, mode, &source_path, opts).map(|code| self.ctx.new_code(code)); | ||
| #[cfg(feature = "parser")] | ||
| if code.is_ok() { | ||
| self.emit_string_escape_warnings(source, &source_path); | ||
| } | ||
| code | ||
| } |
There was a problem hiding this comment.
Compile mode not forwarded to escape-warning re-parse.
emit_string_escape_warnings always parses as Mode::Module (line 339), but the original compilation may use Mode::Expression (or another mode). An expression like eval("'\\q'") would be parsed as a module here, which could silently produce a different AST or fail outright — causing the warning to be skipped for eval-mode compiles.
Consider forwarding the mode parameter:
Suggested fix
- #[cfg(feature = "parser")]
- if code.is_ok() {
- self.emit_string_escape_warnings(source, &source_path);
- }
+ #[cfg(feature = "parser")]
+ if code.is_ok() {
+ self.emit_string_escape_warnings(source, &source_path, mode);
+ }And update emit_string_escape_warnings to accept and use the mode:
- pub(super) fn emit_string_escape_warnings(&self, source: &str, filename: &str) {
- let Ok(parsed) =
- ruff_python_parser::parse(source, ruff_python_parser::Mode::Module.into())
+ pub(super) fn emit_string_escape_warnings(&self, source: &str, filename: &str, mode: compiler::Mode) {
+ let parse_mode = match mode {
+ compiler::Mode::Eval => ruff_python_parser::Mode::Expression,
+ _ => ruff_python_parser::Mode::Module,
+ };
+ let Ok(parsed) =
+ ruff_python_parser::parse(source, parse_mode.into())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/vm/compile.rs` around lines 28 - 35, The
emit_string_escape_warnings call is using the wrong parse mode; forward the
compile mode into it so it re-parses with the same Mode as the original compile.
Modify the call site in compile (where compiler::compile(source, mode,
&source_path, opts) is invoked) to pass the mode into
emit_string_escape_warnings, and update the emit_string_escape_warnings
signature and implementation to accept a Mode parameter and use that Mode when
re-parsing the source; update any other callers of emit_string_escape_warnings
accordingly to supply the mode.
Summary by CodeRabbit
New Features
Bug Fixes