Skip to content

Fix test_import: import machinery, circular imports, and script shadowing#7034

Merged
youknowone merged 2 commits intoRustPython:mainfrom
youknowone:test_import
Feb 8, 2026
Merged

Fix test_import: import machinery, circular imports, and script shadowing#7034
youknowone merged 2 commits intoRustPython:mainfrom
youknowone:test_import

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Feb 7, 2026

  • Emit IMPORT_FROM instead of LOAD_ATTR for import a.b.c as m (compile.rs)
  • Add "partially initialized module" error in module getattr for circular imports
  • Add "cannot access submodule" error for initializing submodules
  • Implement "consider renaming" script shadowing detection (module.rs, frame.rs)
    • Detect when a user script shadows a stdlib or third-party module
    • Compute original sys.path[0] from sys.argv[0] (like CPython's config->sys_path_0)
    • Check sys.stdlib_module_names for stdlib module detection
    • Respect safe_path setting
  • Implement _imp._fix_co_filename to rewrite code object source_path in-place
  • Add data parameter support to _imp.get_frozen_object with marshal deserialization
  • Fix import_from error messages: check spec.has_location before using origin
  • Set ImportError.path attribute on import failures
  • Fix import_star error messages for non-str items in all and dict
  • Always call builtins.import (remove sys.modules cache bypass in import_inner)

Summary by CodeRabbit

  • New Features

    • Better support for importing dotted module paths with aliases.
    • Additional frozen-module aliases so more built-in modules are discoverable.
    • Import system improvements that make more encodings and submodules resolvable at startup.
  • Bug Fixes

    • Richer, more informative import error messages including module origin and stdlib shadowing hints.
    • Improved handling of partially initialized modules and circular-import diagnostics.
    • More robust handling of all during star imports and attribute export resolution.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 7, 2026

📝 Walkthrough

Walkthrough

Refactors import machinery and dotted-import codegen: adds importlib wiring and helpers for origin/spec, stdlib-shadow detection, richer ImportError/AttributeError messages, enhanced frozen-module aliasing and frozen-code handling, plus codegen changes to expand dotted imports with aliases.

Changes

Cohort / File(s) Summary
Import core & VM wiring
crates/vm/src/import.rs, crates/vm/src/vm/mod.rs, crates/vm/src/vm/thread.rs
Add import helpers (get_spec_file_origin, is_possibly_shadowing_path, is_stdlib_module_name, import_module_level, calc_package, etc.), add importlib field to VirtualMachine, propagate to thread VMs, and wire importlib into VM init and import flows.
Builtin import / stdlib import helpers
crates/vm/src/stdlib/builtins.rs, crates/vm/src/stdlib/imp.rs
Introduce typed ImportArgs for __import__ and call import_module_level directly; expand frozen-module lookup, add optional deserialization path for frozen code, and add code-filename updating helpers.
Error/messages & module attribute handling
crates/vm/src/frame.rs, crates/vm/src/builtins/module.rs
Compute module origin/spec, detect stdlib-shadowing and uninitialized submodules, attach path metadata to ImportError, and emit context-aware AttributeError/ImportError messages (including circular-import context).
Codegen dotted-import handling
crates/codegen/src/compile.rs
Change handling of dotted imports with aliases to emit per-segment ImportFrom and manage stack with Swap/PopTop so aliasing dotted names preserves evaluation order.
Frozen modules & interpreter aliasing
crates/vm/src/vm/interpreter.rs, crates/vm/src/vm/mod.rs
Add runtime frozen-module alias injections (e.g., __hello_alias__, __phello_alias__, __hello_only__), adjust encoding registration to use import machinery/sys.modules.
Library addition
crates/vm/Lib/python_builtins/__hello_only__.py
Add a small file that references the standard __hello_only__ implementation (acts as a pointer/alias).

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller
    participant Builtins as "__import__ (builtins)"
    participant Import as "import_module_level"
    participant Importlib as "vm.importlib"
    participant Sys as "sys.modules / _find_and_load"
    participant Frozen as "frozen lookup"

    Caller->>Builtins: __import__(name, globals, locals, fromlist, level)
    Builtins->>Import: import_module_level(name, globals, fromlist, level)
    Import->>Importlib: consult importlib / __spec__ (origin)
    Import->>Sys: check sys.modules for existing module
    alt found in sys.modules
        Sys-->>Import: module (may be initializing)
        Import->>Caller: return module or raise partial-init ImportError (with origin/path)
    else not found
        Import->>Frozen: check frozen modules & aliases
        Frozen-->>Import: frozen module or not found
        alt module found
            Import->>Importlib: load/execute module, set __spec__/__loader__
            Import->>Caller: return module
        else not found
            Import->>Caller: raise ImportError (with origin/path metadata)
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

skip:ci

Suggested reviewers

  • ShaharNaveh
  • arihant2math
  • fanninpm

Poem

🐰 I hopped through import paths, one by one,
Unwound dotted names till the job was done.
I chased each spec, traced origin and shade,
Tied aliases neat where tangled names were made.
Now modules greet me — the rabbit's work is fun! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly addresses the main changes: import machinery fixes, circular import handling, and script shadowing detection across multiple subsystems.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 7, 2026

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all
    Please pull the latest changes before pushing again:
git pull origin test_import

@youknowone youknowone force-pushed the test_import branch 10 times, most recently from 61d81a1 to 68aa709 Compare February 8, 2026 14:45
@youknowone youknowone marked this pull request as ready for review February 8, 2026 14:45
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/frame.rs (1)

386-393: ⚠️ Potential issue | 🟡 Minor

__dict__ fallback calls require_str which raises TypeError on non-string keys — CPython only issues a DeprecationWarning and skips.

In CPython's import_all_from, when iterating __dict__ (no __all__), non-string keys are skipped with a DeprecationWarning, not a hard TypeError. This code would raise where CPython would continue. If strict CPython compatibility is desired for test_import, consider filtering/skipping non-string keys instead.

🤖 Fix all issues with AI agents
In `@crates/vm/src/import.rs`:
- Around line 347-471: The has_from truthiness check in import_module_level
currently swallows errors from fl.clone().try_to_bool(vm) via .ok(), so any
exception from __bool__ is ignored; change the logic around the fromlist
handling (the has_from computation) to call try_to_bool(vm) directly and
propagate any Err returned (instead of mapping to Option via .ok()), so that
errors from __bool__ bubble up as PyErrs; locate the has_from block around the
fromlist variable and replace the .and_then(|fl|
fl.clone().try_to_bool(vm).ok()) pattern with code that handles try_to_bool(vm)
-> Result<bool, PyErr> and returns Err when it fails.

In `@crates/vm/src/stdlib/imp.rs`:
- Around line 286-309: The unsafe cast in update_code_filenames that writes to
code.code.source_path must be replaced by safe interior mutability: change the
CodeObject field source_path to an interior-mutable type (e.g., Cell<&'static
PyStrInterned> or an AtomicPtr-backed wrapper) and update its initialization
sites accordingly, then modify update_code_filenames to call the safe setter
(e.g., code.code.source_path.set(new_name) or swap the pointer atomically)
instead of performing the raw pointer write; ensure recursive traversal (the
loop calling update_code_filenames on nested PyCode) and any synchronization via
existing VM locks (IMP_LOCK) remain correct.
🧹 Nitpick comments (4)
crates/vm/src/vm/interpreter.rs (1)

466-521: Consider extracting a small helper to reduce repetition in alias injection.

The find-then-push pattern is repeated three times with minor variations. A small closure or helper could reduce the boilerplate while keeping intent clear.

♻️ Suggested refactor
     // Collect and add frozen module aliases for test modules
     let mut entries: Vec<_> = iter.collect();
+
+    let find_code = |entries: &[(&'static str, FrozenModule)], name: &str| {
+        entries.iter().find(|(n, _)| *n == name).map(|(_, m)| m.code)
+    };
+
-    if let Some(hello_code) = entries
-        .iter()
-        .find(|(n, _)| *n == "__hello__")
-        .map(|(_, m)| m.code)
-    {
+    if let Some(hello_code) = find_code(&entries, "__hello__") {
         entries.push(("__hello_alias__", FrozenModule { code: hello_code, package: false }));
         entries.push(("__phello_alias__", FrozenModule { code: hello_code, package: true }));
         entries.push(("__phello_alias__.spam", FrozenModule { code: hello_code, package: false }));
     }
-    if let Some(code) = entries
-        .iter()
-        .find(|(n, _)| *n == "__phello__")
-        .map(|(_, m)| m.code)
-    {
+    if let Some(code) = find_code(&entries, "__phello__") {
         entries.push(("__phello__.__init__", FrozenModule { code, package: false }));
     }
-    if let Some(code) = entries
-        .iter()
-        .find(|(n, _)| *n == "__phello__.ham")
-        .map(|(_, m)| m.code)
-    {
+    if let Some(code) = find_code(&entries, "__phello__.ham") {
         entries.push(("__phello__.ham.__init__", FrozenModule { code, package: false }));
     }
crates/vm/src/import.rs (1)

242-265: Prefer Option<&PyObjectRef> over &Option<PyObjectRef>.

Clippy flags &Option<T> in function signatures (clippy::ref_option). Using Option<&PyObjectRef> is more idiomatic—it avoids an unnecessary indirection and callers can simply pass spec.as_ref().

♻️ Proposed fix
-pub(crate) fn get_spec_file_origin(
-    spec: &Option<PyObjectRef>,
-    vm: &VirtualMachine,
-) -> Option<String> {
-    let spec = spec.as_ref()?;
+pub(crate) fn get_spec_file_origin(
+    spec: Option<&PyObjectRef>,
+    vm: &VirtualMachine,
+) -> Option<String> {
+    let spec = spec?;

As per coding guidelines, "Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints that are introduced by your changes."

crates/vm/src/frame.rs (1)

3298-3309: Unnecessary per-call string allocation — prefer &str or interned identifiers.

vm.ctx.new_str("__spec__") and vm.ctx.new_str("_initializing") allocate new PyStr objects every time is_module_initializing is called. Elsewhere in this file (e.g., line 278), raw &str is passed directly to get_attr. The same pattern works here.

♻️ Proposed fix
 fn is_module_initializing(module: &PyObject, vm: &VirtualMachine) -> bool {
-    let Ok(spec) = module.get_attr(&vm.ctx.new_str("__spec__"), vm) else {
+    let Ok(spec) = module.get_attr("__spec__", vm) else {
         return false;
     };
     if vm.is_none(&spec) {
         return false;
     }
-    let Ok(initializing_attr) = spec.get_attr(&vm.ctx.new_str("_initializing"), vm) else {
+    let Ok(initializing_attr) = spec.get_attr("_initializing", vm) else {
         return false;
     };
     initializing_attr.try_to_bool(vm).unwrap_or(false)
 }
crates/vm/src/stdlib/imp.rs (1)

254-254: Return type readability: consider a type alias for this complex tuple.

Option<(Option<PyRef<PyMemoryView>>, bool, Option<PyStrRef>)> is hard to parse at a glance. A named struct (even file-local) would improve readability, though this is non-blocking.

Comment on lines +286 to +309
fn update_code_filenames(
code: &PyCode,
old_name: &'static PyStrInterned,
new_name: &'static PyStrInterned,
) {
if !core::ptr::eq(code.code.source_path, old_name)
&& code.code.source_path.as_str() != old_name.as_str()
{
return;
}
// SAFETY: called during import before the code object is shared.
// Mutates co_filename in place.
#[allow(invalid_reference_casting)]
unsafe {
let source_path_ptr = &code.code.source_path as *const _ as *mut &'static PyStrInterned;
core::ptr::write_volatile(source_path_ptr, new_name);
}
for constant in code.code.constants.iter() {
let obj: &crate::PyObject = constant.borrow();
if let Some(inner_code) = obj.downcast_ref::<PyCode>() {
update_code_filenames(inner_code, old_name, new_name);
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's find the CodeObject struct definition to see how source_path is declared
rg -n "struct CodeObject" -A20 --type=rs

Repository: RustPython/RustPython

Length of output: 91


🏁 Script executed:

#!/bin/bash
# Check how source_path is declared in CodeObject
ast-grep --pattern 'pub source_path: $_'

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Check all references to source_path to understand usage patterns
rg -n "source_path" --type=rs | head -50

Repository: RustPython/RustPython

Length of output: 91


🏁 Script executed:

#!/bin/bash
# Look for threading or cross-thread usage patterns
rg -n "thread|Mutex|Arc|RwLock" crates/vm/src/stdlib/imp.rs -B2 -A2

Repository: RustPython/RustPython

Length of output: 416


🏁 Script executed:

#!/bin/bash
# Check if there are existing interior mutability patterns in the codebase
rg -n "Cell<|AtomicPtr|Mutex<|RwLock<" --type=rs crates/vm/src | head -30

Repository: RustPython/RustPython

Length of output: 91


🏁 Script executed:

#!/bin/bash
# Find the CodeObject struct definition
rg -n "struct CodeObject" --max-count=5 -A15

Repository: RustPython/RustPython

Length of output: 1341


🏁 Script executed:

#!/bin/bash
# Search for source_path field declaration more broadly
rg -n "source_path:" | head -20

Repository: RustPython/RustPython

Length of output: 1826


🏁 Script executed:

#!/bin/bash
# Look at the imp.rs file specifically around the problematic function
head -n 320 crates/vm/src/stdlib/imp.rs | tail -n 60

Repository: RustPython/RustPython

Length of output: 2131


🏁 Script executed:

#!/bin/bash
# Find where CodeObject is defined
find . -name "*.rs" | xargs grep -l "struct CodeObject" | head -5

Repository: RustPython/RustPython

Length of output: 103


🏁 Script executed:

#!/bin/bash
# Check for interior mutability patterns already used in the codebase
rg "Cell<|AtomicPtr|Mutex|RwLock" | grep -i "source\|code" | head -10

Repository: RustPython/RustPython

Length of output: 929


Refactor source_path mutation to use interior mutability instead of unsafe casting.

The unsafe code that casts &code.code.source_path to *mut and writes through it violates Rust's aliasing rules. Casting away const-ness and dereferencing is undefined behavior, regardless of the timing assumption that "the code object hasn't been shared yet." The comment in update_code_filenames relies on an untestable guarantee about when the function is called.

Since PyCode exists in a reference-counted VM context where multiple references can exist, and since the codebase already uses interior mutability patterns (e.g., PyMutex, PyRwLock for similar fields), refactor source_path to use Cell<&'static PyStrInterned> or AtomicPtr to safely mutate through the existing shared reference. The existing IMP_LOCK synchronization can protect updates if needed.

This requires changes to the CodeObject struct in crates/compiler-core/src/bytecode.rs, but is necessary to eliminate undefined behavior.

🤖 Prompt for AI Agents
In `@crates/vm/src/stdlib/imp.rs` around lines 286 - 309, The unsafe cast in
update_code_filenames that writes to code.code.source_path must be replaced by
safe interior mutability: change the CodeObject field source_path to an
interior-mutable type (e.g., Cell<&'static PyStrInterned> or an AtomicPtr-backed
wrapper) and update its initialization sites accordingly, then modify
update_code_filenames to call the safe setter (e.g.,
code.code.source_path.set(new_name) or swap the pointer atomically) instead of
performing the raw pointer write; ensure recursive traversal (the loop calling
update_code_filenames on nested PyCode) and any synchronization via existing VM
locks (IMP_LOCK) remain correct.

- Emit IMPORT_FROM instead of LOAD_ATTR for `import a.b.c as m`
- Add "partially initialized module" error for circular imports
- Add "cannot access submodule" error for initializing submodules
- Implement script shadowing detection with "consider renaming" hint
  - Detect user scripts shadowing stdlib/third-party modules
  - Compute original sys.path[0] from sys.argv[0]
  - Check sys.stdlib_module_names for stdlib detection
  - Respect safe_path setting
- Implement _imp._fix_co_filename for code source_path rewriting
- Add data parameter to _imp.get_frozen_object with marshal deser
- Fix import_from: check __spec__.has_location before using origin
- Set ImportError.path attribute on import failures
- Fix import_star error messages for non-str __all__/__dict__ items
- Always call builtins.__import__ in import_inner
Previously, try_to_bool errors were silently swallowed via .ok(),
causing fromlist with broken __bool__ to default to false.
@github-actions
Copy link
Contributor

github-actions bot commented Feb 8, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

(module 'hello_only' not found)

[x] test: cpython/Lib/test/test_frozen.py

dependencies:

dependent tests: (no tests depend on frozen)

[ ] test: cpython/Lib/test/test_import (TODO: 3)

dependencies:

dependent tests: (no tests depend on import)

[x] lib: cpython/Lib/importlib
[ ] test: cpython/Lib/test/test_importlib (TODO: 16)

dependencies:

  • importlib

dependent tests: (67 tests)

  • importlib: test_bdb test_cmd_line_script test_codecs test_compileall test_ctypes test_doctest test_frozen test_hashlib test_importlib test_inspect test_linecache test_multiprocessing_main_handling test_pkgutil test_py_compile test_reprlib test_runpy test_sundry test_support test_tomllib test_unittest test_zipfile test_zipimport test_zoneinfo
    • ensurepip: test_ensurepip test_venv
    • inspect: test_abc test_argparse test_asyncgen test_builtin test_code test_collections test_coroutines test_decimal test_enum test_functools test_generators test_grammar test_ntpath test_operator test_patma test_posixpath test_signal test_traceback test_types test_typing test_unittest test_yield_from
      • asyncio: test_asyncio test_contextlib_async test_logging test_os test_sys_settrace test_unittest
      • dataclasses: test__colorize test_genericalias test_pprint test_regrtest
      • rlcompleter: test_rlcompleter
      • trace: test_trace
      • xmlrpc.server: test_docxmlrpc test_xmlrpc
    • py_compile: test_importlib
    • pyclbr: test_pyclbr
    • zipfile: test_shutil test_zipapp test_zipfile test_zipfile64

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/vm/src/import.rs`:
- Around line 288-325: Compute and store the original sys.path[0] once at VM
initialization (in init_importlib_base or VM::new) instead of recomputing it
inside import.rs; add a new VM field (e.g., cached_sys_path_0: Option<String> or
sys_path_0) and initialize it by reading sys_module.argv[0] once using the
existing logic, then update the code in import.rs to read vm.cached_sys_path_0
(falling back to previous behavior or returning false if None) rather than
re-deriving from vm.sys_module.get_attr("argv") on every call; ensure the field
name you add matches uses in import-related functions (e.g., replace local
sys_path_0 computation with vm.cached_sys_path_0) and handle empty-string and
current_dir cases the same as before.
🧹 Nitpick comments (1)
crates/vm/src/import.rs (1)

504-542: Silently swallowing rich_compare_bool error on line 519 is acceptable for a warning-only path, but worth a brief comment.

The unwrap_or(false) on the rich_compare_bool result means a broken __ne__ on a package string silently suppresses the deprecation warning rather than propagating the error. This is intentional (we're inside a best-effort warning block), but a one-line // Ignore comparison errors; this is only advisory. would clarify intent for future readers.

Comment on lines +288 to +325
// Compute original sys.path[0] from sys.argv[0] (the script path).
// See: config->sys_path_0, which is set once
// at initialization and never changes even if sys.path is modified.
let sys_path_0 = (|| -> Option<String> {
let argv = vm.sys_module.get_attr("argv", vm).ok()?;
let argv0 = argv.get_item(&0usize, vm).ok()?;
let argv0_str = argv0.downcast_ref::<PyStr>()?;
let s = argv0_str.as_str();

// For -c and REPL, original sys.path[0] is ""
if s == "-c" || s.is_empty() {
return Some(String::new());
}
// For scripts, original sys.path[0] is dirname(argv[0])
Some(
Path::new(s)
.parent()
.and_then(|p| p.to_str())
.unwrap_or("")
.to_owned(),
)
})();

let sys_path_0 = match sys_path_0 {
Some(p) => p,
None => return false,
};

let cmp_path = if sys_path_0.is_empty() {
match std::env::current_dir() {
Ok(d) => d.to_string_lossy().to_string(),
Err(_) => return false,
}
} else {
sys_path_0
};

root.to_str() == Some(cmp_path.as_str())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

sys_path_0 is re-derived from sys.argv[0] on every call instead of being cached at init time.

The comment on lines 289-290 correctly notes that CPython uses config->sys_path_0, which is "set once at initialization and never changes even if sys.path is modified." However, this implementation re-reads sys.argv[0] on each invocation. If user code mutates sys.argv, the shadowing heuristic will silently use the wrong base path.

Consider computing and caching the original sys_path_0 value once during init_importlib_base (or VM init) and storing it on the VM, rather than re-deriving it from sys.argv each time.

🤖 Prompt for AI Agents
In `@crates/vm/src/import.rs` around lines 288 - 325, Compute and store the
original sys.path[0] once at VM initialization (in init_importlib_base or
VM::new) instead of recomputing it inside import.rs; add a new VM field (e.g.,
cached_sys_path_0: Option<String> or sys_path_0) and initialize it by reading
sys_module.argv[0] once using the existing logic, then update the code in
import.rs to read vm.cached_sys_path_0 (falling back to previous behavior or
returning false if None) rather than re-deriving from
vm.sys_module.get_attr("argv") on every call; ensure the field name you add
matches uses in import-related functions (e.g., replace local sys_path_0
computation with vm.cached_sys_path_0) and handle empty-string and current_dir
cases the same as before.

@youknowone youknowone merged commit 470bd59 into RustPython:main Feb 8, 2026
14 checks passed
@youknowone youknowone deleted the test_import branch February 8, 2026 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant