Fix test_import: import machinery, circular imports, and script shadowing#7034
Fix test_import: import machinery, circular imports, and script shadowing#7034youknowone merged 2 commits intoRustPython:mainfrom
Conversation
📝 WalkthroughWalkthroughRefactors import machinery and dotted-import codegen: adds importlib wiring and helpers for origin/spec, stdlib-shadow detection, richer ImportError/AttributeError messages, enhanced frozen-module aliasing and frozen-code handling, plus codegen changes to expand dotted imports with aliases. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant Builtins as "__import__ (builtins)"
participant Import as "import_module_level"
participant Importlib as "vm.importlib"
participant Sys as "sys.modules / _find_and_load"
participant Frozen as "frozen lookup"
Caller->>Builtins: __import__(name, globals, locals, fromlist, level)
Builtins->>Import: import_module_level(name, globals, fromlist, level)
Import->>Importlib: consult importlib / __spec__ (origin)
Import->>Sys: check sys.modules for existing module
alt found in sys.modules
Sys-->>Import: module (may be initializing)
Import->>Caller: return module or raise partial-init ImportError (with origin/path)
else not found
Import->>Frozen: check frozen modules & aliases
Frozen-->>Import: frozen module or not found
alt module found
Import->>Importlib: load/execute module, set __spec__/__loader__
Import->>Caller: return module
else not found
Import->>Caller: raise ImportError (with origin/path metadata)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Code has been automatically formatted The code in this PR has been formatted using:
git pull origin test_import |
61d81a1 to
68aa709
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/frame.rs (1)
386-393:⚠️ Potential issue | 🟡 Minor
__dict__fallback callsrequire_strwhich raisesTypeErroron non-string keys — CPython only issues aDeprecationWarningand skips.In CPython's
import_all_from, when iterating__dict__(no__all__), non-string keys are skipped with aDeprecationWarning, not a hardTypeError. This code would raise where CPython would continue. If strict CPython compatibility is desired fortest_import, consider filtering/skipping non-string keys instead.
🤖 Fix all issues with AI agents
In `@crates/vm/src/import.rs`:
- Around line 347-471: The has_from truthiness check in import_module_level
currently swallows errors from fl.clone().try_to_bool(vm) via .ok(), so any
exception from __bool__ is ignored; change the logic around the fromlist
handling (the has_from computation) to call try_to_bool(vm) directly and
propagate any Err returned (instead of mapping to Option via .ok()), so that
errors from __bool__ bubble up as PyErrs; locate the has_from block around the
fromlist variable and replace the .and_then(|fl|
fl.clone().try_to_bool(vm).ok()) pattern with code that handles try_to_bool(vm)
-> Result<bool, PyErr> and returns Err when it fails.
In `@crates/vm/src/stdlib/imp.rs`:
- Around line 286-309: The unsafe cast in update_code_filenames that writes to
code.code.source_path must be replaced by safe interior mutability: change the
CodeObject field source_path to an interior-mutable type (e.g., Cell<&'static
PyStrInterned> or an AtomicPtr-backed wrapper) and update its initialization
sites accordingly, then modify update_code_filenames to call the safe setter
(e.g., code.code.source_path.set(new_name) or swap the pointer atomically)
instead of performing the raw pointer write; ensure recursive traversal (the
loop calling update_code_filenames on nested PyCode) and any synchronization via
existing VM locks (IMP_LOCK) remain correct.
🧹 Nitpick comments (4)
crates/vm/src/vm/interpreter.rs (1)
466-521: Consider extracting a small helper to reduce repetition in alias injection.The find-then-push pattern is repeated three times with minor variations. A small closure or helper could reduce the boilerplate while keeping intent clear.
♻️ Suggested refactor
// Collect and add frozen module aliases for test modules let mut entries: Vec<_> = iter.collect(); + + let find_code = |entries: &[(&'static str, FrozenModule)], name: &str| { + entries.iter().find(|(n, _)| *n == name).map(|(_, m)| m.code) + }; + - if let Some(hello_code) = entries - .iter() - .find(|(n, _)| *n == "__hello__") - .map(|(_, m)| m.code) - { + if let Some(hello_code) = find_code(&entries, "__hello__") { entries.push(("__hello_alias__", FrozenModule { code: hello_code, package: false })); entries.push(("__phello_alias__", FrozenModule { code: hello_code, package: true })); entries.push(("__phello_alias__.spam", FrozenModule { code: hello_code, package: false })); } - if let Some(code) = entries - .iter() - .find(|(n, _)| *n == "__phello__") - .map(|(_, m)| m.code) - { + if let Some(code) = find_code(&entries, "__phello__") { entries.push(("__phello__.__init__", FrozenModule { code, package: false })); } - if let Some(code) = entries - .iter() - .find(|(n, _)| *n == "__phello__.ham") - .map(|(_, m)| m.code) - { + if let Some(code) = find_code(&entries, "__phello__.ham") { entries.push(("__phello__.ham.__init__", FrozenModule { code, package: false })); }crates/vm/src/import.rs (1)
242-265: PreferOption<&PyObjectRef>over&Option<PyObjectRef>.Clippy flags
&Option<T>in function signatures (clippy::ref_option). UsingOption<&PyObjectRef>is more idiomatic—it avoids an unnecessary indirection and callers can simply passspec.as_ref().♻️ Proposed fix
-pub(crate) fn get_spec_file_origin( - spec: &Option<PyObjectRef>, - vm: &VirtualMachine, -) -> Option<String> { - let spec = spec.as_ref()?; +pub(crate) fn get_spec_file_origin( + spec: Option<&PyObjectRef>, + vm: &VirtualMachine, +) -> Option<String> { + let spec = spec?;As per coding guidelines, "Always run clippy to lint Rust code (
cargo clippy) before completing tasks and fix any warnings or lints that are introduced by your changes."crates/vm/src/frame.rs (1)
3298-3309: Unnecessary per-call string allocation — prefer&stror interned identifiers.
vm.ctx.new_str("__spec__")andvm.ctx.new_str("_initializing")allocate newPyStrobjects every timeis_module_initializingis called. Elsewhere in this file (e.g., line 278), raw&stris passed directly toget_attr. The same pattern works here.♻️ Proposed fix
fn is_module_initializing(module: &PyObject, vm: &VirtualMachine) -> bool { - let Ok(spec) = module.get_attr(&vm.ctx.new_str("__spec__"), vm) else { + let Ok(spec) = module.get_attr("__spec__", vm) else { return false; }; if vm.is_none(&spec) { return false; } - let Ok(initializing_attr) = spec.get_attr(&vm.ctx.new_str("_initializing"), vm) else { + let Ok(initializing_attr) = spec.get_attr("_initializing", vm) else { return false; }; initializing_attr.try_to_bool(vm).unwrap_or(false) }crates/vm/src/stdlib/imp.rs (1)
254-254: Return type readability: consider a type alias for this complex tuple.
Option<(Option<PyRef<PyMemoryView>>, bool, Option<PyStrRef>)>is hard to parse at a glance. A named struct (even file-local) would improve readability, though this is non-blocking.
| fn update_code_filenames( | ||
| code: &PyCode, | ||
| old_name: &'static PyStrInterned, | ||
| new_name: &'static PyStrInterned, | ||
| ) { | ||
| if !core::ptr::eq(code.code.source_path, old_name) | ||
| && code.code.source_path.as_str() != old_name.as_str() | ||
| { | ||
| return; | ||
| } | ||
| // SAFETY: called during import before the code object is shared. | ||
| // Mutates co_filename in place. | ||
| #[allow(invalid_reference_casting)] | ||
| unsafe { | ||
| let source_path_ptr = &code.code.source_path as *const _ as *mut &'static PyStrInterned; | ||
| core::ptr::write_volatile(source_path_ptr, new_name); | ||
| } | ||
| for constant in code.code.constants.iter() { | ||
| let obj: &crate::PyObject = constant.borrow(); | ||
| if let Some(inner_code) = obj.downcast_ref::<PyCode>() { | ||
| update_code_filenames(inner_code, old_name, new_name); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's find the CodeObject struct definition to see how source_path is declared
rg -n "struct CodeObject" -A20 --type=rsRepository: RustPython/RustPython
Length of output: 91
🏁 Script executed:
#!/bin/bash
# Check how source_path is declared in CodeObject
ast-grep --pattern 'pub source_path: $_'Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
#!/bin/bash
# Check all references to source_path to understand usage patterns
rg -n "source_path" --type=rs | head -50Repository: RustPython/RustPython
Length of output: 91
🏁 Script executed:
#!/bin/bash
# Look for threading or cross-thread usage patterns
rg -n "thread|Mutex|Arc|RwLock" crates/vm/src/stdlib/imp.rs -B2 -A2Repository: RustPython/RustPython
Length of output: 416
🏁 Script executed:
#!/bin/bash
# Check if there are existing interior mutability patterns in the codebase
rg -n "Cell<|AtomicPtr|Mutex<|RwLock<" --type=rs crates/vm/src | head -30Repository: RustPython/RustPython
Length of output: 91
🏁 Script executed:
#!/bin/bash
# Find the CodeObject struct definition
rg -n "struct CodeObject" --max-count=5 -A15Repository: RustPython/RustPython
Length of output: 1341
🏁 Script executed:
#!/bin/bash
# Search for source_path field declaration more broadly
rg -n "source_path:" | head -20Repository: RustPython/RustPython
Length of output: 1826
🏁 Script executed:
#!/bin/bash
# Look at the imp.rs file specifically around the problematic function
head -n 320 crates/vm/src/stdlib/imp.rs | tail -n 60Repository: RustPython/RustPython
Length of output: 2131
🏁 Script executed:
#!/bin/bash
# Find where CodeObject is defined
find . -name "*.rs" | xargs grep -l "struct CodeObject" | head -5Repository: RustPython/RustPython
Length of output: 103
🏁 Script executed:
#!/bin/bash
# Check for interior mutability patterns already used in the codebase
rg "Cell<|AtomicPtr|Mutex|RwLock" | grep -i "source\|code" | head -10Repository: RustPython/RustPython
Length of output: 929
Refactor source_path mutation to use interior mutability instead of unsafe casting.
The unsafe code that casts &code.code.source_path to *mut and writes through it violates Rust's aliasing rules. Casting away const-ness and dereferencing is undefined behavior, regardless of the timing assumption that "the code object hasn't been shared yet." The comment in update_code_filenames relies on an untestable guarantee about when the function is called.
Since PyCode exists in a reference-counted VM context where multiple references can exist, and since the codebase already uses interior mutability patterns (e.g., PyMutex, PyRwLock for similar fields), refactor source_path to use Cell<&'static PyStrInterned> or AtomicPtr to safely mutate through the existing shared reference. The existing IMP_LOCK synchronization can protect updates if needed.
This requires changes to the CodeObject struct in crates/compiler-core/src/bytecode.rs, but is necessary to eliminate undefined behavior.
🤖 Prompt for AI Agents
In `@crates/vm/src/stdlib/imp.rs` around lines 286 - 309, The unsafe cast in
update_code_filenames that writes to code.code.source_path must be replaced by
safe interior mutability: change the CodeObject field source_path to an
interior-mutable type (e.g., Cell<&'static PyStrInterned> or an AtomicPtr-backed
wrapper) and update its initialization sites accordingly, then modify
update_code_filenames to call the safe setter (e.g.,
code.code.source_path.set(new_name) or swap the pointer atomically) instead of
performing the raw pointer write; ensure recursive traversal (the loop calling
update_code_filenames on nested PyCode) and any synchronization via existing VM
locks (IMP_LOCK) remain correct.
- Emit IMPORT_FROM instead of LOAD_ATTR for `import a.b.c as m` - Add "partially initialized module" error for circular imports - Add "cannot access submodule" error for initializing submodules - Implement script shadowing detection with "consider renaming" hint - Detect user scripts shadowing stdlib/third-party modules - Compute original sys.path[0] from sys.argv[0] - Check sys.stdlib_module_names for stdlib detection - Respect safe_path setting - Implement _imp._fix_co_filename for code source_path rewriting - Add data parameter to _imp.get_frozen_object with marshal deser - Fix import_from: check __spec__.has_location before using origin - Set ImportError.path attribute on import failures - Fix import_star error messages for non-str __all__/__dict__ items - Always call builtins.__import__ in import_inner
68aa709 to
0fc6522
Compare
Previously, try_to_bool errors were silently swallowed via .ok(), causing fromlist with broken __bool__ to default to false.
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: (module 'hello_only' not found) [x] test: cpython/Lib/test/test_frozen.py dependencies: dependent tests: (no tests depend on frozen) [ ] test: cpython/Lib/test/test_import (TODO: 3) dependencies: dependent tests: (no tests depend on import) [x] lib: cpython/Lib/importlib dependencies:
dependent tests: (67 tests)
Legend:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/vm/src/import.rs`:
- Around line 288-325: Compute and store the original sys.path[0] once at VM
initialization (in init_importlib_base or VM::new) instead of recomputing it
inside import.rs; add a new VM field (e.g., cached_sys_path_0: Option<String> or
sys_path_0) and initialize it by reading sys_module.argv[0] once using the
existing logic, then update the code in import.rs to read vm.cached_sys_path_0
(falling back to previous behavior or returning false if None) rather than
re-deriving from vm.sys_module.get_attr("argv") on every call; ensure the field
name you add matches uses in import-related functions (e.g., replace local
sys_path_0 computation with vm.cached_sys_path_0) and handle empty-string and
current_dir cases the same as before.
🧹 Nitpick comments (1)
crates/vm/src/import.rs (1)
504-542: Silently swallowingrich_compare_boolerror on line 519 is acceptable for a warning-only path, but worth a brief comment.The
unwrap_or(false)on therich_compare_boolresult means a broken__ne__on a package string silently suppresses the deprecation warning rather than propagating the error. This is intentional (we're inside a best-effort warning block), but a one-line// Ignore comparison errors; this is only advisory.would clarify intent for future readers.
| // Compute original sys.path[0] from sys.argv[0] (the script path). | ||
| // See: config->sys_path_0, which is set once | ||
| // at initialization and never changes even if sys.path is modified. | ||
| let sys_path_0 = (|| -> Option<String> { | ||
| let argv = vm.sys_module.get_attr("argv", vm).ok()?; | ||
| let argv0 = argv.get_item(&0usize, vm).ok()?; | ||
| let argv0_str = argv0.downcast_ref::<PyStr>()?; | ||
| let s = argv0_str.as_str(); | ||
|
|
||
| // For -c and REPL, original sys.path[0] is "" | ||
| if s == "-c" || s.is_empty() { | ||
| return Some(String::new()); | ||
| } | ||
| // For scripts, original sys.path[0] is dirname(argv[0]) | ||
| Some( | ||
| Path::new(s) | ||
| .parent() | ||
| .and_then(|p| p.to_str()) | ||
| .unwrap_or("") | ||
| .to_owned(), | ||
| ) | ||
| })(); | ||
|
|
||
| let sys_path_0 = match sys_path_0 { | ||
| Some(p) => p, | ||
| None => return false, | ||
| }; | ||
|
|
||
| let cmp_path = if sys_path_0.is_empty() { | ||
| match std::env::current_dir() { | ||
| Ok(d) => d.to_string_lossy().to_string(), | ||
| Err(_) => return false, | ||
| } | ||
| } else { | ||
| sys_path_0 | ||
| }; | ||
|
|
||
| root.to_str() == Some(cmp_path.as_str()) |
There was a problem hiding this comment.
sys_path_0 is re-derived from sys.argv[0] on every call instead of being cached at init time.
The comment on lines 289-290 correctly notes that CPython uses config->sys_path_0, which is "set once at initialization and never changes even if sys.path is modified." However, this implementation re-reads sys.argv[0] on each invocation. If user code mutates sys.argv, the shadowing heuristic will silently use the wrong base path.
Consider computing and caching the original sys_path_0 value once during init_importlib_base (or VM init) and storing it on the VM, rather than re-deriving it from sys.argv each time.
🤖 Prompt for AI Agents
In `@crates/vm/src/import.rs` around lines 288 - 325, Compute and store the
original sys.path[0] once at VM initialization (in init_importlib_base or
VM::new) instead of recomputing it inside import.rs; add a new VM field (e.g.,
cached_sys_path_0: Option<String> or sys_path_0) and initialize it by reading
sys_module.argv[0] once using the existing logic, then update the code in
import.rs to read vm.cached_sys_path_0 (falling back to previous behavior or
returning false if None) rather than re-deriving from
vm.sys_module.get_attr("argv") on every call; ensure the field name you add
matches uses in import-related functions (e.g., replace local sys_path_0
computation with vm.cached_sys_path_0) and handle empty-string and current_dir
cases the same as before.
import a.b.c as m(compile.rs)Summary by CodeRabbit
New Features
Bug Fixes