Skip to content

Implement more ast features#6986

Merged
youknowone merged 10 commits intoRustPython:mainfrom
youknowone:ast
Feb 5, 2026
Merged

Implement more ast features#6986
youknowone merged 10 commits intoRustPython:mainfrom
youknowone:ast

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Feb 3, 2026

Summary by CodeRabbit

  • New Features

    • Enhanced AST module with support for function type nodes and improved validation
    • Added feature version support to the compile() function for better Python compatibility
    • Improved documentation coverage for Python's internal modules and objects
  • Bug Fixes

    • Fixed float integer detection to use exact checks instead of epsilon-based comparisons
    • Enhanced string handling for robustness against invalid character codes
  • Documentation

    • Expanded built-in module documentation with comprehensive entries for internal Python classes and functions

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 3, 2026

📝 Walkthrough

Walkthrough

This PR refactors the AST system to add comprehensive validation and representation helpers, changes how builtins are stored from PyDictRef to PyObjectRef throughout the frame and function system, expands documentation entries, and enhances string handling for f-strings and template strings with normalization and escape-sequence warnings.

Changes

Cohort / File(s) Summary
Builtins Type System
crates/vm/src/builtins/frame.rs, crates/vm/src/frame.rs, crates/vm/src/builtins/function.rs, crates/vm/src/vm/mod.rs, crates/vm/src/suggestion.rs
Changed builtins storage from PyDictRef to PyObjectRef in Frame and ExecutingFrame; updated PyFunction construction to use frame's builtins directly; modified run_code_obj to dynamically resolve builtins from module.builtins; adjusted load_global_or_builtin to handle both dict and non-dict builtins with fallback paths.
AST Core Implementation
crates/vm/src/stdlib/ast/pyast.rs, crates/vm/src/stdlib/ast/python.rs, crates/vm/src/stdlib/ast/validate.rs, crates/vm/src/stdlib/ast/repr.rs
Introduced impl_base_node! macro for AST node declaration; added comprehensive AST validation module; implemented AST representation helpers with depth-aware truncation; refactored extend_module_nodes to register AST class and populate field types, singletons, and match args; enhanced NodeAst with reduce, replace, and custom initialization logic.
AST Node Type Handling
crates/vm/src/stdlib/ast/basic.rs, crates/vm/src/stdlib/ast/constant.rs, crates/vm/src/stdlib/ast/elif_else_clause.rs, crates/vm/src/stdlib/ast/expression.rs, crates/vm/src/stdlib/ast/module.rs, crates/vm/src/stdlib/ast/operator.rs, crates/vm/src/stdlib/ast/other.rs
Updated string handling to use interned strings in basic.rs and other.rs; reworked frozenset constant representation; adjusted elif/else range computation and clause construction; enhanced expression handling with NodeExprTemplateStr support and dict key/value validation; removed node location additions in module nodes; added _instance early-return paths in operator nodes.
AST Parameters & Patterns
crates/vm/src/stdlib/ast/parameter.rs, crates/vm/src/stdlib/ast/pattern.rs
Updated merge_positional/keyword_parameter_defaults to accept VM, validate argument counts, and return PyResult; enhanced pattern matching to explicitly type kwd attributes/patterns and validate length consistency between kwd_attrs and kwd_patterns.
AST String & Statement Handling
crates/vm/src/stdlib/ast/string.rs, crates/vm/src/stdlib/ast/statement.rs
Introduced f-string/template-string normalization with adjacent constant merging; added escape-sequence validation and warnings; changed TStringInterpolation.format_spec from TemplateStr to JoinedStr; enhanced function/class definition range computation; improved ImportFrom level validation and Pass statement location handling.
Compile & Parser Integration
crates/vm/src/stdlib/ast.rs, crates/vm/src/stdlib/builtins.rs
Expanded parse entry points with optimize, target_version, and type_comments parameters; added ModFunctionType exposure; introduced parse_func_type and wrap_interactive under parser feature; added AST validation in compile(); enhanced error propagation for parse errors; exposed new compile flags (PY_CF_SOURCE_IS_UTF8, PY_CF_IGNORE_COOKIE, PY_CF_ALLOW_TOP_LEVEL_AWAIT).
Documentation & Spelling
.cspell.dict/cpython.txt, crates/doc/generate.py, crates/doc/src/data.inc.rs
Added binop and hexdigit to spelling dictionary; updated doc generation to use strip() instead of inspect.cleandoc(); improved is_child_of module resolution for C modules; massively expanded DB documentation map with Python internal modules, builtin types, and class descriptions.
Numeric & Type Handling
crates/literal/src/float.rs, crates/vm/src/builtins/type.rs
Changed is_integer check in float.rs from epsilon-based tolerance to exact finite check (v.is_finite() && v.fract() == 0.0); refactored PyType.annotations resolution with stricter HEAPTYPE guards and non-dict attribute skipping.
Utility & Platform-Specific
crates/vm/src/stdlib/ctypes/function.rs, crates/vm/src/stdlib/ctypes/simple.rs, crates/vm/src/stdlib/os.rs, extra_tests/snippets/stdlib_types.py
Replaced direct wchar-to-char casts with guarded u32::try_from conversions in ctypes; fixed OS encoding detection to use byte constants; updated test snippet to add Pass statement and remove explicit location metadata.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant VM
    participant Frame
    participant Module
    participant Builtins

    Client->>VM: run_code_obj(code_obj, scope)
    VM->>Module: resolve builtins from __builtins__
    alt Module has PyModule __builtins__
        Module->>Builtins: dict()
    else Module has direct builtins
        Module->>Builtins: use as-is
    else No __builtins__
        VM->>Builtins: use vm.builtins.dict()
    end
    VM->>Frame: new(builtins: PyObjectRef)
    Frame->>Client: Frame ready (builtins: PyObjectRef)
    Client->>Frame: load_global_or_builtin(name)
    alt builtins is dict
        Frame->>Builtins: fast path dict lookup
    else builtins is non-dict
        Frame->>Builtins: generic path getattr
    end
    Builtins->>Client: value or error
Loading
sequenceDiagram
    participant Compiler
    participant Parser
    participant Validator
    participant AST_Module

    Compiler->>Parser: parse(source, mode, feature_version)
    Parser->>Parser: apply target_version defaults
    Parser->>AST_Module: emit AST nodes
    alt mode == "single"
        Validator->>AST_Module: wrap_interactive(Module → Interactive)
    end
    Validator->>Validator: validate_mod(module)
    Validator->>Validator: check context consistency, body non-empty, patterns
    alt validation fails
        Validator->>Compiler: PyResult error
    else validation succeeds
        Validator->>Compiler: validated AST
    end
    Compiler->>Compiler: compile to code object
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

Possibly related PRs

Suggested reviewers

  • ShaharNaveh
  • coolreader18

Poem

🐰 Builtins dance from dict to object, so flexible and free,
AST nodes validate their forms, a symphony of three,
F-strings merge their secrets, escape sequences show their way,
A rabbit hops through Frame's new paths, bringing order to the fray! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.72% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The PR title 'Implement more ast features' is generic and vague, failing to convey specific details about the substantial changes made across the AST module and related components. Consider using a more specific title that highlights the primary changes, such as 'Add Python AST validation, interning, and builtins refactoring' or similar to better reflect the scope of modifications.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 3, 2026

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all
    Please pull the latest changes before pushing again:
git pull origin ast

@youknowone youknowone force-pushed the ast branch 6 times, most recently from 395cffb to 99862cd Compare February 4, 2026 14:45
@github-actions
Copy link
Contributor

github-actions bot commented Feb 5, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] lib: cpython/Lib/ast.py
[x] lib: cpython/Lib/_ast_unparse.py
[x] test: cpython/Lib/test/test_unparse.py
[x] test: cpython/Lib/test/test_type_comments.py (TODO: 15)

dependencies:

  • ast

dependent tests: (48 tests)

  • ast: test_ast test_builtin test_compile test_dis test_fstring test_future_stmt test_site test_ssl test_type_comments test_ucn test_unparse
    • annotationlib: test_functools test_inspect test_reprlib test_typing
      • inspect: test_abc test_argparse test_asyncgen test_code test_collections test_coroutines test_decimal test_enum test_generators test_grammar test_ntpath test_operator test_patma test_posixpath test_signal test_traceback test_types test_unittest test_yield_from test_zipimport
    • dbm.dumb: test_dbm_dumb
    • inspect:
      • bdb: test_bdb
      • dataclasses: test__colorize test_genericalias test_pprint test_regrtest test_zoneinfo
      • importlib.metadata: test_importlib
      • rlcompleter: test_rlcompleter
      • trace: test_trace
      • xmlrpc.server: test_docxmlrpc test_xmlrpc
    • pyclbr: test_pyclbr

[x] test: cpython/Lib/test/test_builtin.py (TODO: 25)

dependencies:

dependent tests: (no tests depend on builtin)

[ ] test: cpython/Lib/test/test_funcattrs.py (TODO: 5)

dependencies:

dependent tests: (no tests depend on funcattrs)

[ ] lib: cpython/Lib/traceback.py
[ ] test: cpython/Lib/test/test_traceback.py (TODO: 48)

dependencies:

  • traceback (native: collections.abc, itertools, sys)
    • codeop
    • io (native: _io, _thread, errno, sys)
    • tokenize (native: _tokenize, builtins, itertools, sys)
    • _colorize, contextlib, keyword, linecache, textwrap, warnings

dependent tests: (32 tests)

  • traceback: test_asyncio test_builtin test_code_module test_contextlib test_contextlib_async test_coroutines test_importlib test_listcomps test_pyexpat test_setcomps test_socket test_ssl test_subprocess test_threadedtempfile test_threading test_traceback test_unittest test_with test_zipimport
    • code:
      • sqlite3.main: test_sqlite3
    • concurrent.futures.process: test_concurrent_futures
    • logging.config: test_logging
    • py_compile: test_argparse test_cmd_line_script test_compileall test_importlib test_multiprocessing_main_handling test_py_compile test_runpy
    • pydoc: test_enum
      • xmlrpc.server: test_docxmlrpc test_xmlrpc

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

@youknowone youknowone marked this pull request as ready for review February 5, 2026 04:38
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@crates/vm/src/frame.rs`:
- Around line 1209-1234: The LOAD_BUILD_CLASS handler currently assumes
__builtins__ is a mapping and calls get_item, which causes an uncaught TypeError
when __builtins__ is a module; change the else branch of the
Instruction::LoadBuildClass handling (the branch after
self.builtins.downcast_ref::<PyDict>()) to detect module-style builtins
(downcast_ref::<PyModule>() or equivalent) and call get_attr(identifier!(vm,
__build_class__), vm) for modules, translating AttributeError into a NameError
the same way you handle KeyError for mappings; keep the existing get_item path
for dict/mapping objects. Apply the same fix to the slow-path global/builtin
name lookup routine (the name-lookup code referenced in the review) so it uses
get_attr for module builtins and preserves existing error translation logic for
mapping builtins.

In `@crates/vm/src/stdlib/ast/string.rs`:
- Around line 50-134: The merge logic currently keeps the first constant's
TextRange when collapsing adjacent string literals, so location info becomes the
first-span-only; update normalize_joined_str_parts and
normalize_template_str_parts so that when you append a new constant (in the
branches handling JoinedStrPart::Constant and TemplateStrPart::Constant) you
also extend the pending range to cover both spans (e.g., set pending.2 =
TextRange::new(pending.start(), constant.range.end()) or use the project's
helper like TextRange::cover/join to set the pending range to span from the
original pending start to constant.range.end), and ensure
push_joined_str_literal and push_template_str_literal use that extended range
when building Constant::new_str; apply the identical change for both joined and
template paths so the merged literal range extends to the last fragment.

In `@crates/vm/src/stdlib/ast/validate.rs`:
- Around line 349-354: The NamedExpr handler only checks that named.target is an
ast::Expr::Name but doesn't validate it in Store context; after the existing
type check for ast::Expr::Named (and the matches!(&*named.target,
ast::Expr::Name(_)) check), call validate_expr(vm, &named.target,
ast::ExprContext::Store) to validate the target as an assignment target and
propagate any error, then validate the value with validate_expr(vm,
&named.value, ast::ExprContext::Load) as already done; keep the existing type
error message if the target isn't a Name.
🧹 Nitpick comments (3)
crates/vm/src/stdlib/ast.rs (2)

398-470: Consider handling edge cases in func_type parsing.

The parse_func_type function has good structure but a few observations:

  1. Lines 405-406: optimize and target_version are unused (explicitly ignored with let _ =). This is intentional but worth noting.

  2. Line 437: When extracting right, the offset split_at + 2 assumes -> is always 2 bytes, which is correct for ASCII but the comment/documentation should clarify this.

  3. Lines 457-461: The match arm for other expression types (other => vec![other]) is a catch-all that may accept invalid func_type syntax silently.

💡 Optional: Add validation for unexpected expression types
     let argtypes: Vec<ast::Expr> = match arg_expr {
         ast::Expr::Tuple(tup) => tup.elts,
         ast::Expr::Name(_) | ast::Expr::Subscript(_) | ast::Expr::Attribute(_) => vec![arg_expr],
-        other => vec![other],
+        // CPython accepts any expression as a type annotation
+        other => vec![other],
     };

472-503: Minor: redundant assignment on Line 487.

The line let tag = if tag.is_empty() { "" } else { tag }; is a no-op since tag already equals itself in both branches.

♻️ Simplify redundant conditional
         let tag = rest.trim_start();
-        let tag = if tag.is_empty() { "" } else { tag };
         let node = NodeAst
crates/vm/src/stdlib/ast/python.rs (1)

433-513: Redundant setup in module_exec.

The module_exec function (lines 440-477) repeats the same setup that extend_class already performs in NodeAst. This appears to be setting up the AST class that's registered in the module, which may be a different type object than the one defined via #[pyattr].

If this duplication is intentional (to handle both the pyattr-registered type and the module-registered type), consider adding a comment explaining why.

💡 Add clarifying comment
     pub(crate) fn module_exec(
         vm: &VirtualMachine,
         module: &Py<crate::builtins::PyModule>,
     ) -> PyResult<()> {
         __module_exec(vm, module);
         super::super::pyast::extend_module_nodes(vm, module);

+        // The AST class registered in extend_module_nodes needs its attributes
+        // and methods configured separately from the #[pyattr] NodeAst type.
         let ast_type = module
             .get_attr("AST", vm)?

Comment on lines 1209 to +1234
Instruction::LoadBuildClass => {
self.push_value(vm.builtins.get_attr(identifier!(vm, __build_class__), vm)?);
let build_class =
if let Some(builtins_dict) = self.builtins.downcast_ref::<PyDict>() {
builtins_dict
.get_item_opt(identifier!(vm, __build_class__), vm)?
.ok_or_else(|| {
vm.new_name_error(
"__build_class__ not found".to_owned(),
identifier!(vm, __build_class__).to_owned(),
)
})?
} else {
self.builtins
.get_item(identifier!(vm, __build_class__), vm)
.map_err(|e| {
if e.fast_isinstance(vm.ctx.exceptions.key_error) {
vm.new_name_error(
"__build_class__ not found".to_owned(),
identifier!(vm, __build_class__).to_owned(),
)
} else {
e
}
})?
};
self.push_value(build_class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cd crates/vm && wc -l src/frame.rs

Repository: RustPython/RustPython

Length of output: 82


🏁 Script executed:

cd crates/vm && sed -n '1209,1234p' src/frame.rs

Repository: RustPython/RustPython

Length of output: 1413


🏁 Script executed:

cd crates/vm && sed -n '2151,2169p' src/frame.rs

Repository: RustPython/RustPython

Length of output: 927


🏁 Script executed:

cd crates/vm && rg "get_attr" --type rust -B2 -A2 | head -50

Repository: RustPython/RustPython

Length of output: 2773


🏁 Script executed:

cd crates/vm && rg "module_type" --type rust | head -20

Repository: RustPython/RustPython

Length of output: 680


🏁 Script executed:

cd crates/vm && rg "\.class\(\)" --type rust -B1 -A1 | head -30

Repository: RustPython/RustPython

Length of output: 1642


🏁 Script executed:

cd crates/vm && rg "builtins.*module" --type rust -B2 -A2 | head -40

Repository: RustPython/RustPython

Length of output: 2296


🏁 Script executed:

cd crates/vm && rg "get_item" src/frame.rs -B3 -A3 | head -60

Repository: RustPython/RustPython

Length of output: 2874


🏁 Script executed:

cd crates/vm && grep -n "attribute_error\|key_error" src/frame.rs | head -20

Repository: RustPython/RustPython

Length of output: 1161


🏁 Script executed:

cd crates && find . -name "*.rs" -type f -exec grep -l "PyModule" {} \; | grep -E "builtins|module" | head -5

Repository: RustPython/RustPython

Length of output: 205


🏁 Script executed:

cd crates/vm && sed -n '1,100p' src/builtins/module.rs

Repository: RustPython/RustPython

Length of output: 3283


🏁 Script executed:

cd crates/vm && rg "__getitem__" src/builtins/module.rs -B3 -A5

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

cd crates/vm && rg "get_item\|get_attr" src/builtins/module.rs -B2 -A2 | head -40

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

cd crates/vm && sed -n '100,300p' src/builtins/module.rs

Repository: RustPython/RustPython

Length of output: 7033


🏁 Script executed:

cd crates/vm && rg "GetAttr\|Initializer" src/builtins/module.rs -B2 -A8 | head -50

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

cd crates/vm && python3 << 'EOF'
import sys
# In Python, modules don't support __getitem__
# They support __getattr__ via attribute access
# Check the behavior:
mod = sys
try:
    mod["path"]  # This will fail
except TypeError as e:
    print(f"__getitem__ on module raises: {type(e).__name__}: {e}")

try:
    getattr(mod, "path")  # This works
    print("__getattr__ on module works")
except AttributeError as e:
    print(f"__getattr__ raises: {e}")

try:
    getattr(mod, "nonexistent_attr")
except AttributeError as e:
    print(f"__getattr__ for missing attr raises AttributeError: {type(e).__name__}")
EOF

Repository: RustPython/RustPython

Length of output: 237


🏁 Script executed:

cd crates/vm && rg "fn get_item" src/builtins/*.rs -A5 | head -60

Repository: RustPython/RustPython

Length of output: 1137


🏁 Script executed:

cd crates/vm && rg "object.get_item\|obj.get_item" src/ --type rust -B2 -A3 | head -50

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

cd crates && find . -name "*.rs" -type f -exec grep -l "fn get_item" {} \; | head -10

Repository: RustPython/RustPython

Length of output: 205


🏁 Script executed:

cd crates/vm && sed -n '1,100p' src/protocol/object.rs

Repository: RustPython/RustPython

Length of output: 3552


🏁 Script executed:

cd crates/vm && rg "pub fn get_item" src/protocol/object.rs -A10 | head -40

Repository: RustPython/RustPython

Length of output: 530


🏁 Script executed:

cd crates/vm && grep -n "get_item" src/protocol/object.rs | head -20

Repository: RustPython/RustPython

Length of output: 294


🏁 Script executed:

cd crates/vm && sed -n '686,720p' src/protocol/object.rs

Repository: RustPython/RustPython

Length of output: 1504


🏁 Script executed:

cd crates/vm && rg "try_mapping\|try_sequence" src/protocol/object.rs -B2 -A8 | head -50

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

cd crates/vm && rg "\.class\(\)\.is\(vm\.ctx\.types\." src/ --type rust -B1 -A1 | head -40

Repository: RustPython/RustPython

Length of output: 2395


🏁 Script executed:

cd crates/vm && rg "get_attr.*vm\)" src/frame.rs -B2 -A2 | head -30

Repository: RustPython/RustPython

Length of output: 1579


🏁 Script executed:

cd crates/vm && cargo clippy --lib 2>&1 | grep -A5 "frame.rs" | head -30

Repository: RustPython/RustPython

Length of output: 151


🏁 Script executed:

cd crates/vm && cargo fmt --check -- src/frame.rs 2>&1 | head -20

Repository: RustPython/RustPython

Length of output: 604


Handle module-style __builtins__ lookup to avoid TypeError.

When __builtins__ is a module (the common case), calling get_item raises TypeError, which is not caught by the code expecting KeyError. This breaks LOAD_BUILD_CLASS and global/builtin name resolution.

For modules, use get_attr instead (which raises AttributeError on missing attributes, translatable to NameError). Keep get_item for dict/mapping objects.

This applies to both the LOAD_BUILD_CLASS instruction (lines 1209-1234) and the slow-path name lookup (lines 2151-2169).

Suggested fix
@@
-                } else {
-                    self.builtins
-                        .get_item(identifier!(vm, __build_class__), vm)
-                        .map_err(|e| {
-                            if e.fast_isinstance(vm.ctx.exceptions.key_error) {
-                                vm.new_name_error(
-                                    "__build_class__ not found".to_owned(),
-                                    identifier!(vm, __build_class__).to_owned(),
-                                )
-                            } else {
-                                e
-                            }
-                        })?
-                };
+                } else if self.builtins.class().is(vm.ctx.types.module_type) {
+                    self.builtins
+                        .get_attr(identifier!(vm, __build_class__), vm)
+                        .map_err(|e| {
+                            if e.fast_isinstance(vm.ctx.exceptions.attribute_error) {
+                                vm.new_name_error(
+                                    "__build_class__ not found".to_owned(),
+                                    identifier!(vm, __build_class__).to_owned(),
+                                )
+                            } else {
+                                e
+                            }
+                        })?
+                } else {
+                    self.builtins
+                        .get_item(identifier!(vm, __build_class__), vm)
+                        .map_err(|e| {
+                            if e.fast_isinstance(vm.ctx.exceptions.key_error) {
+                                vm.new_name_error(
+                                    "__build_class__ not found".to_owned(),
+                                    identifier!(vm, __build_class__).to_owned(),
+                                )
+                            } else {
+                                e
+                            }
+                        })?
+                };
@@
-        } else {
-            // Slow path: builtins is not a dict, use generic __getitem__
-            if let Some(value) = self.globals.get_item_opt(name, vm)? {
-                return Ok(value);
-            }
-            self.builtins.get_item(name, vm).map_err(|e| {
-                if e.fast_isinstance(vm.ctx.exceptions.key_error) {
-                    vm.new_name_error(format!("name '{name}' is not defined"), name.to_owned())
-                } else {
-                    e
-                }
-            })
-        }
+        } else {
+            // Slow path: builtins is not a dict
+            if let Some(value) = self.globals.get_item_opt(name, vm)? {
+                return Ok(value);
+            }
+            if self.builtins.class().is(vm.ctx.types.module_type) {
+                self.builtins.get_attr(name, vm).map_err(|e| {
+                    if e.fast_isinstance(vm.ctx.exceptions.attribute_error) {
+                        vm.new_name_error(format!("name '{name}' is not defined"), name.to_owned())
+                    } else {
+                        e
+                    }
+                })
+            } else {
+                self.builtins.get_item(name, vm).map_err(|e| {
+                    if e.fast_isinstance(vm.ctx.exceptions.key_error) {
+                        vm.new_name_error(format!("name '{name}' is not defined"), name.to_owned())
+                    } else {
+                        e
+                    }
+                })
+            }
+        }
🤖 Prompt for AI Agents
In `@crates/vm/src/frame.rs` around lines 1209 - 1234, The LOAD_BUILD_CLASS
handler currently assumes __builtins__ is a mapping and calls get_item, which
causes an uncaught TypeError when __builtins__ is a module; change the else
branch of the Instruction::LoadBuildClass handling (the branch after
self.builtins.downcast_ref::<PyDict>()) to detect module-style builtins
(downcast_ref::<PyModule>() or equivalent) and call get_attr(identifier!(vm,
__build_class__), vm) for modules, translating AttributeError into a NameError
the same way you handle KeyError for mappings; keep the existing get_item path
for dict/mapping objects. Apply the same fix to the slow-path global/builtin
name lookup routine (the name-lookup code referenced in the review) so it uses
get_attr for module builtins and preserves existing error translation logic for
mapping builtins.

Comment on lines +50 to +134
fn push_joined_str_literal(
output: &mut Vec<JoinedStrPart>,
pending: &mut Option<(String, StringLiteralPrefix, TextRange)>,
) {
if let Some((value, prefix, range)) = pending.take()
&& !value.is_empty()
{
output.push(JoinedStrPart::Constant(Constant::new_str(
value, prefix, range,
)));
}
}

fn normalize_joined_str_parts(values: Vec<JoinedStrPart>) -> Vec<JoinedStrPart> {
let mut output = Vec::with_capacity(values.len());
let mut pending: Option<(String, StringLiteralPrefix, TextRange)> = None;

for part in values {
match part {
JoinedStrPart::Constant(constant) => {
let ConstantLiteral::Str { value, prefix } = constant.value else {
push_joined_str_literal(&mut output, &mut pending);
output.push(JoinedStrPart::Constant(constant));
continue;
};
let value: String = value.into();
if let Some((pending_value, _, _)) = pending.as_mut() {
pending_value.push_str(&value);
} else {
pending = Some((value, prefix, constant.range));
}
}
JoinedStrPart::FormattedValue(value) => {
push_joined_str_literal(&mut output, &mut pending);
output.push(JoinedStrPart::FormattedValue(value));
}
}
}

push_joined_str_literal(&mut output, &mut pending);
output
}

fn push_template_str_literal(
output: &mut Vec<TemplateStrPart>,
pending: &mut Option<(String, StringLiteralPrefix, TextRange)>,
) {
if let Some((value, prefix, range)) = pending.take()
&& !value.is_empty()
{
output.push(TemplateStrPart::Constant(Constant::new_str(
value, prefix, range,
)));
}
}

fn normalize_template_str_parts(values: Vec<TemplateStrPart>) -> Vec<TemplateStrPart> {
let mut output = Vec::with_capacity(values.len());
let mut pending: Option<(String, StringLiteralPrefix, TextRange)> = None;

for part in values {
match part {
TemplateStrPart::Constant(constant) => {
let ConstantLiteral::Str { value, prefix } = constant.value else {
push_template_str_literal(&mut output, &mut pending);
output.push(TemplateStrPart::Constant(constant));
continue;
};
let value: String = value.into();
if let Some((pending_value, _, _)) = pending.as_mut() {
pending_value.push_str(&value);
} else {
pending = Some((value, prefix, constant.range));
}
}
TemplateStrPart::Interpolation(value) => {
push_template_str_literal(&mut output, &mut pending);
output.push(TemplateStrPart::Interpolation(value));
}
}
}

push_template_str_literal(&mut output, &mut pending);
output
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Merged literal ranges should extend to the last fragment.
When adjacent constants are collapsed, the resulting range keeps only the first span; this can skew location info for AST consumers. Consider extending the range while merging.

🛠️ Suggested fix
-                if let Some((pending_value, _, _)) = pending.as_mut() {
-                    pending_value.push_str(&value);
-                } else {
-                    pending = Some((value, prefix, constant.range));
-                }
+                if let Some((pending_value, _, pending_range)) = pending.as_mut() {
+                    pending_value.push_str(&value);
+                    *pending_range = TextRange::new(pending_range.start(), constant.range.end());
+                } else {
+                    pending = Some((value, prefix, constant.range));
+                }
@@
-                if let Some((pending_value, _, _)) = pending.as_mut() {
-                    pending_value.push_str(&value);
-                } else {
-                    pending = Some((value, prefix, constant.range));
-                }
+                if let Some((pending_value, _, pending_range)) = pending.as_mut() {
+                    pending_value.push_str(&value);
+                    *pending_range = TextRange::new(pending_range.start(), constant.range.end());
+                } else {
+                    pending = Some((value, prefix, constant.range));
+                }
🤖 Prompt for AI Agents
In `@crates/vm/src/stdlib/ast/string.rs` around lines 50 - 134, The merge logic
currently keeps the first constant's TextRange when collapsing adjacent string
literals, so location info becomes the first-span-only; update
normalize_joined_str_parts and normalize_template_str_parts so that when you
append a new constant (in the branches handling JoinedStrPart::Constant and
TemplateStrPart::Constant) you also extend the pending range to cover both spans
(e.g., set pending.2 = TextRange::new(pending.start(), constant.range.end()) or
use the project's helper like TextRange::cover/join to set the pending range to
span from the original pending start to constant.range.end), and ensure
push_joined_str_literal and push_template_str_literal use that extended range
when building Constant::new_str; apply the identical change for both joined and
template paths so the merged literal range extends to the last fragment.

Comment on lines +349 to +354
ast::Expr::Named(named) => {
if !matches!(&*named.target, ast::Expr::Name(_)) {
return Err(vm.new_type_error("NamedExpr target must be a Name".to_owned()));
}
validate_expr(vm, &named.value, ast::ExprContext::Load)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Python AST NamedExpr target Store context validation requirement

💡 Result:

In CPython’s ast, a NamedExpr (the walrus operator) is treated like an assignment target:

  • NamedExpr.target must be an ast.Name node (not Attribute, Subscript, etc.). The compiler’s AST validator explicitly errors otherwise: “NamedExpr target must be a Name”. [3]
  • That ast.Name must be in Store context (because it’s being assigned to). This is what the parser produces, and what you must build if you construct the AST yourself: Name(id='x', ctx=Store()). [1]
  • More generally, assignment targets use Store context (vs Load for reading a value, Del for del). [2]

So, when creating a walrus AST manually, do:

ast.NamedExpr(
    target=ast.Name(id="x", ctx=ast.Store()),
    value=...
)

Validate NamedExpr targets in Store context.

Python's AST validator requires that NamedExpr.target (the walrus operator) be a Name node in Store context, since it represents an assignment target. The current check only enforces the target is a Name, but does not validate the context. Add context validation to reject invalid ASTs:

Proposed fix
         ast::Expr::Named(named) => {
             if !matches!(&*named.target, ast::Expr::Name(_)) {
                 return Err(vm.new_type_error("NamedExpr target must be a Name".to_owned()));
             }
+            validate_expr(vm, &named.target, ast::ExprContext::Store)?;
             validate_expr(vm, &named.value, ast::ExprContext::Load)
         }
🤖 Prompt for AI Agents
In `@crates/vm/src/stdlib/ast/validate.rs` around lines 349 - 354, The NamedExpr
handler only checks that named.target is an ast::Expr::Name but doesn't validate
it in Store context; after the existing type check for ast::Expr::Named (and the
matches!(&*named.target, ast::Expr::Name(_)) check), call validate_expr(vm,
&named.target, ast::ExprContext::Store) to validate the target as an assignment
target and propagate any error, then validate the value with validate_expr(vm,
&named.value, ast::ExprContext::Load) as already done; keep the existing type
error message if the target isn't a Name.

@youknowone youknowone changed the title ast Implement more ast features Feb 5, 2026
@youknowone youknowone merged commit 684e880 into RustPython:main Feb 5, 2026
14 checks passed
@youknowone youknowone deleted the ast branch February 5, 2026 07:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant