Skip to content

Fix socket.getaddrinfo and upgrade socket to 3.13.11#6564

Merged
youknowone merged 7 commits intoRustPython:mainfrom
youknowone:socket
Dec 29, 2025
Merged

Fix socket.getaddrinfo and upgrade socket to 3.13.11#6564
youknowone merged 7 commits intoRustPython:mainfrom
youknowone:socket

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Dec 28, 2025

Summary by CodeRabbit

  • New Features

    • Added support for CAN (Controller Area Network) and AF_ALG (cryptography) socket families on Linux
    • Extended Windows socket capabilities with socket duplication features
    • Enhanced socket message receiving with improved ancillary data extraction
  • Improvements

    • Added resource warnings for unclosed sockets
    • Improved timeout handling for non-blocking socket operations
    • Enhanced DNS resolution with better hostname and port encoding

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 28, 2025

📝 Walkthrough

Walkthrough

This pull request extends socket functionality by adding Windows WinSock API support, introducing Linux AF_CAN and AF_ALG address family handling, and implementing new socket methods including ioctl, share, sendmsg_afalg, and recvmsg with ancillary data extraction capabilities.

Changes

Cohort / File(s) Summary
Windows WinSock API additions
crates/stdlib/src/socket.rs
Added FROM_PROTOCOL_INFO constant; enhanced ioctl() method to support SIO_RCVALL, SIO_LOOPBACK_FAST_PATH, SIO_KEEPALIVE_VALS with WSAIoctl integration; implemented share() method for socket duplication via WSADuplicateSocketW.
Linux AF\_CAN and AF\_ALG support
crates/stdlib/src/socket.rs
Introduced 18 CAN BCM opcode/flag constants; added SendmsgAfalgArgs struct for AF\_ALG control messages (OP, IV, AEAD\_AssocLen); extended address parsing/construction in multiple code paths (extract_address(), get_addr(), get_addr_tuple()) for sockaddr_can and sockaddr_alg; implemented sendmsg_afalg() method.
Socket lifecycle and data handling
crates/stdlib/src/socket.rs
Added __del__() destructor to emit ResourceWarning on unclosed sockets; implemented recvmsg() with ancillary data extraction; added parse_ancillary_data() helper; adjusted pack_cmsgs_to_send() for correct CMSG\_LEN sizing; masked SOCK_NONBLOCK and SOCK_CLOEXEC during socket type storage to normalize cross-platform behavior.
Address encoding and initialization
crates/stdlib/src/socket.rs
Enhanced getaddrinfo() with IDNA hostname encoding and UTF-8 port encoding; refined socket initialization timeout logic influenced by SOCK_NONBLOCK flag.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • RustPython#6408: Directly modifies PySocket in the same file, adding or changing socket methods and signatures.

Poem

🐰 With sockets spun and protocols deep,
CANbus dreams and AF\_ALG to keep,
Windows APIs dance, Linux takes stage,
Ancillary whispers turn a new page—
Hop along, network friends, the buffers now sing! 🌐

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Title check ⚠️ Warning The title mentions 'Fix socket.getaddrinfo' which aligns with the getaddrinfo enhancements in the PR, but significantly understates the scope. The PR includes major additions for AF_CAN, AF_ALG, Windows socket APIs, ancillary data handling, and more—far beyond just fixing getaddrinfo. Revise the title to better reflect the primary scope, such as 'Add AF_CAN and AF_ALG socket support with enhanced address handling' or 'Expand socket module with Linux CAN/ALG and Windows socket enhancements.'
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone force-pushed the socket branch 4 times, most recently from 1b3b15b to 02ae8cc Compare December 28, 2025 13:35
@youknowone youknowone marked this pull request as ready for review December 29, 2025 05:03
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
crates/stdlib/src/socket.rs (2)

146-220: Consider consolidating the cfg attributes.

The repeated #[cfg(target_os = "linux")] on each constant is verbose. Consider grouping them under a single cfg block for maintainability:

#[cfg(target_os = "linux")]
mod can_bcm_constants {
    pub const CAN_BCM_TX_SETUP: i32 = 1;
    // ...
}
#[cfg(target_os = "linux")]
#[pyattr]
use can_bcm_constants::*;

However, if the current style matches the rest of the codebase or is required by the pyattr macro, this is fine as-is.


1891-1906: Potential UB: casting const pointer to mut for msg_iov.

Line 1901 casts iovecs.as_ptr() (const pointer) to *mut _ for msg_iov. While this pattern is common and the kernel won't actually mutate the buffers for sendmsg, it's technically undefined behavior in Rust to cast away constness if the memory is accessed.

Consider using as_mut_ptr() with a mutable binding:

🔎 Suggested fix
-            let iovecs: Vec<libc::iovec> = buffers
+            let mut iovecs: Vec<libc::iovec> = buffers
                 .iter()
                 .map(|buf| libc::iovec {
                     iov_base: buf.as_ptr() as *mut _,
                     iov_len: buf.len(),
                 })
                 .collect();

             // Set up msghdr
             let mut msghdr: libc::msghdr = unsafe { std::mem::zeroed() };
-            msghdr.msg_iov = iovecs.as_ptr() as *mut _;
+            msghdr.msg_iov = iovecs.as_mut_ptr();
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d1ff2cd and ed62c8e.

⛔ Files ignored due to path filters (2)
  • Lib/socket.py is excluded by !Lib/**
  • Lib/test/test_socket.py is excluded by !Lib/**
📒 Files selected for processing (1)
  • crates/stdlib/src/socket.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/socket.rs
🧠 Learnings (2)
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/stdlib/src/socket.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/stdlib/src/socket.rs
🔇 Additional comments (12)
crates/stdlib/src/socket.rs (12)

66-89: LGTM!

The Windows WinSock API additions are correctly imported, and FROM_PROTOCOL_INFO = -1 matches the documented WinSock constant value.


991-1054: LGTM!

The logic correctly masks out SOCK_NONBLOCK and SOCK_CLOEXEC flags from the stored socket type, ensuring consistent cross-platform behavior. The timeout initialization properly handles non-blocking sockets by setting timeout to 0.0.


1252-1314: LGTM!

The AF_ALG address handling correctly validates string lengths against the sockaddr_alg field sizes and properly initializes the zeroed structure before copying the type and name strings.


1402-1478: LGTM!

The Windows socket reconstruction from share() bytes is correctly implemented with proper size validation, WSASocketW usage, and error handling. The handle inheritance is properly disabled after creation.


1921-2009: LGTM!

The recvmsg implementation correctly handles buffer allocation, validates input parameters, and properly parses the received data and ancillary data. The return format matches Python's socket.recvmsg() specification.


2011-2045: LGTM!

The parse_ancillary_data function correctly handles control message iteration with proper truncation handling when MSG_CTRUNC occurs. The data length calculation respects both the cmsg_len field and the actual available buffer space.


2088-2093: LGTM!

Correctly using CMSG_LEN for the cmsg_len field (actual message length) while CMSG_SPACE is used for buffer allocation (includes padding). This matches the proper CMSG construction pattern.


2111-2141: LGTM with a note on close() error handling.

The __del__ implementation correctly emits ResourceWarning for unclosed sockets, matching Python's behavior. The close() error is appropriately ignored with let _ = since destructors shouldn't propagate errors.


2305-2408: LGTM!

The Windows ioctl implementation correctly handles the supported IOCTL commands with proper input validation and error handling. The TcpKeepalive struct is correctly marked #[repr(C)] for FFI compatibility.


2410-2433: LGTM!

The Windows share method correctly uses WSADuplicateSocketW to create a shareable socket descriptor and returns the protocol info as bytes, which can be used by another process to recreate the socket.


2801-2825: LGTM!

The getaddrinfo function correctly applies IDNA encoding to hostnames for internationalized domain name support, and properly validates port strings as UTF-8. This matches Python's behavior for hostname encoding.


234-275: These values are correctly defined per RFC 3542 standard and are the appropriate solution for macOS, where they are not available in the libc crate.

Comment on lines +1192 to +1251
#[cfg(target_os = "linux")]
c::AF_CAN => {
let tuple: PyTupleRef = addr.downcast().map_err(|obj| {
vm.new_type_error(format!(
"{}(): AF_CAN address must be tuple, not {}",
caller,
obj.class().name()
))
})?;
if tuple.is_empty() || tuple.len() > 2 {
return Err(vm
.new_type_error(
"AF_CAN address must be a tuple (interface,) or (interface, addr)",
)
.into());
}
let interface: PyStrRef = tuple[0].clone().downcast().map_err(|obj| {
vm.new_type_error(format!(
"{}(): AF_CAN interface must be str, not {}",
caller,
obj.class().name()
))
})?;
let ifname = interface.as_str();

// Get interface index
let ifindex = if ifname.is_empty() {
0 // Bind to all CAN interfaces
} else {
// Check interface name length (IFNAMSIZ is typically 16)
if ifname.len() >= 16 {
return Err(vm
.new_os_error("interface name too long".to_owned())
.into());
}
let cstr = std::ffi::CString::new(ifname)
.map_err(|_| vm.new_os_error("invalid interface name".to_owned()))?;
let idx = unsafe { libc::if_nametoindex(cstr.as_ptr()) };
if idx == 0 {
return Err(io::Error::last_os_error().into());
}
idx as i32
};

// Create sockaddr_can
let mut storage: libc::sockaddr_storage = unsafe { std::mem::zeroed() };
let can_addr =
&mut storage as *mut libc::sockaddr_storage as *mut libc::sockaddr_can;
unsafe {
(*can_addr).can_family = libc::AF_CAN as libc::sa_family_t;
(*can_addr).can_ifindex = ifindex;
}
let storage: socket2::SockAddrStorage = unsafe { std::mem::transmute(storage) };
Ok(unsafe {
socket2::SockAddr::new(
storage,
std::mem::size_of::<libc::sockaddr_can>() as libc::socklen_t,
)
})
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

AF_CAN: Use libc::IF_NAMESIZE and handle can_id from tuple.

  1. Line 1222: Use libc::IF_NAMESIZE instead of hardcoded 16 for consistency.
  2. The error message says tuple can be (interface, addr) but can_id from tuple[1] is never parsed or used.
🔎 Suggested fix for IF_NAMESIZE
-                    if ifname.len() >= 16 {
+                    if ifname.len() >= libc::IF_NAMESIZE {

Additionally, if the second tuple element (can_id) is intentionally ignored, consider updating the error message to reflect the actual accepted format, or implement proper can_id handling if it's needed for CAN_RAW sockets.

🤖 Prompt for AI Agents
In crates/stdlib/src/socket.rs around lines 1192-1251, replace the hardcoded
interface-name limit 16 with libc::IF_NAMESIZE and handle the optional second
tuple element (can_id) described in the error text: update the length check to
if ifname.len() >= libc::IF_NAMESIZE, and if the tuple has a second element
parse tuple[1] to an integer (map Python int errors to
vm.new_type_error/vm.new_value_error as appropriate), then assign that value
into the sockaddr_can (convert to the correct unsigned integer type and store it
in the can id field) before creating the SockAddr; alternatively, if you intend
to ignore can_id, change the earlier error message to only advertise
(interface,) as the accepted form instead of (interface, addr).

Comment on lines +2535 to +2560
#[cfg(target_os = "linux")]
{
let family = addr.family();
if family == libc::AF_CAN as libc::sa_family_t {
// AF_CAN address: (interface_name,) or (interface_name, can_id)
let can_addr = unsafe { &*(addr.as_ptr() as *const libc::sockaddr_can) };
let ifindex = can_addr.can_ifindex;
let ifname = if ifindex == 0 {
String::new()
} else {
let mut buf = [0u8; libc::IF_NAMESIZE];
let ret = unsafe {
libc::if_indextoname(
ifindex as libc::c_uint,
buf.as_mut_ptr() as *mut libc::c_char,
)
};
if ret.is_null() {
String::new()
} else {
let nul_pos = memchr::memchr(b'\0', &buf).unwrap_or(buf.len());
String::from_utf8_lossy(&buf[..nul_pos]).into_owned()
}
};
return vm.ctx.new_tuple(vec![vm.ctx.new_str(ifname).into()]).into();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's find the extract_address function and check its AF_CAN handling
rg -n "fn extract_address" crates/stdlib/src/socket.rs -A 50 | head -80

Repository: RustPython/RustPython

Length of output: 2683


🏁 Script executed:

#!/bin/bash
# Also search for AF_CAN handling in extract_address
rg -n "AF_CAN" crates/stdlib/src/socket.rs -B 3 -A 10

Repository: RustPython/RustPython

Length of output: 4019


🏁 Script executed:

#!/bin/bash
# Check the complete get_addr_tuple function around the mentioned lines
sed -n '2535,2580p' crates/stdlib/src/socket.rs

Repository: RustPython/RustPython

Length of output: 2317


🏁 Script executed:

#!/bin/bash
# Search for any usage of can_id or sockaddr_can fields in the code
rg -n "can_id|sockaddr_can" crates/stdlib/src/socket.rs -B 2 -A 2

Repository: RustPython/RustPython

Length of output: 1164


🏁 Script executed:

#!/bin/bash
# Check if there's any code after line 1215 that handles the second element
sed -n '1215,1240p' crates/stdlib/src/socket.rs

Repository: RustPython/RustPython

Length of output: 1366


🏁 Script executed:

#!/bin/bash
# Search the entire codebase for any handling of the second AF_CAN parameter
rg -n "AF_CAN.*\[1\]|\[1\].*interface" crates/

Repository: RustPython/RustPython

Length of output: 47


🌐 Web query:

CPython socket module AF_CAN address tuple format

💡 Result:

Short answer: use Python socket address tuples.

  • AF_CAN (generic): (interface,) — a 1-tuple with the interface name string (e.g. ("can0",)). Use "" to bind/receive from all CAN interfaces. [1][2]
  • CAN_ISOTP: (interface, rx_addr, tx_addr) — interface string plus two unsigned long CAN IDs. Example: ("can0", 0x123, 0x456). [1][2]
  • CAN_J1939: (interface, name, pgn, addr) — interface string, 64-bit ECU name, 32-bit PGN, 8-bit address. [1][2]

References: Python socket docs (AF_CAN / protocol-specific tuple formats) and Linux sockaddr_can layout. [1][2]


AF_CAN: Tuple validation accepts but silently ignores second element.

extract_address at line 1201 accepts 1 or 2 element tuples with the error message "AF_CAN address must be a tuple (interface,) or (interface, addr)", but only extracts and uses tuple[0] (interface), completely ignoring any second element. This causes silent data loss: users passing a 2-element tuple will have the second value discarded without error.

CPython's AF_CAN (generic/CAN_RAW) accepts only 1-tuple (interface,). Protocol-specific variants like CAN_ISOTP use different tuple formats entirely (3-tuple) rather than a simple (interface, addr) format. Either reject 2-element tuples explicitly or implement full protocol-specific tuple support; do not silently ignore input.

🤖 Prompt for AI Agents
In crates/stdlib/src/socket.rs around lines 2535-2560 (and update
extract_address at ~line 1201), the AF_CAN handling currently accepts 1- or
2-element tuples but only uses tuple[0], silently discarding the second element;
change the validation to accept only 1-element tuples for AF_CAN and return a
clear error when a 2-element tuple is provided (or implement full
protocol-specific tuple parsing if you intend to support other CAN families).
Concretely: in extract_address ensure that for AF_CAN you reject tuples whose
length != 1 with the message "AF_CAN address must be a tuple (interface,)" and
adjust callers/tests accordingly so any provided 2-element tuple raises an error
instead of being ignored.

@youknowone youknowone merged commit 45a1940 into RustPython:main Dec 29, 2025
13 checks passed
@youknowone youknowone deleted the socket branch December 29, 2025 05:26
@coderabbitai coderabbitai bot mentioned this pull request Dec 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant