Skip to content

Avoid selecting duplicate host ports for mappings to 0.0.0.0 and specific addresses #50054

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 4, 2025

Conversation

robmry
Copy link
Contributor

@robmry robmry commented May 22, 2025

- What I did

- How I did it

Two fixes, for related issues ...

Listen on mapped host ports before mapping more ports

Because we set SO_REUSEADDR on sockets for host ports, if there are port mappings for INADDR_ANY (the default) as well as for specific host ports - bind() cannot be used to detect clashes.

That means, for example, on daemon startup, if the port allocator returns the first port in its ephemeral range for a specific host address, and the next port mapping is for 0.0.0.0 - the same port is returned and both bind() calls succeed. Then, the container fails to start later when listen() spots the problem and it's too late to find another port.

So, bind and listen to each set of ports as they're allocated instead of just binding.

This fixes ...

# docker run --rm -ti -p 81/tcp -p 127.0.0.1::80/tcp alpine
docker: Error response from daemon: failed to set up container networking: driver failed programming external connectivity on endpoint wonderful_haibt (0a1a9fe57e84a7d1fd6329e631eab77a3a7f0f8574e2576c8c4b28d891c99ee7): failed to listen on TCP socket: address already in use

It doesn't matter whether the already-listening TCP socket belonged to docker, or some other process running on the host.

portallocator: always check for ports allocated for 0.0.0.0/::

We set SO_REUSEADDR on sockets used for host port mappings by docker-proxy - which means it's possible to bind the same port on a specific address as well as 0.0.0.0/::.

For TCP sockets, an error is raised when listen() is called on both sockets - and the port allocator will be called again to avoid the clash (if the port was allocated from a range, otherwise the container will just fail to start).

But, for UDP sockets, there's no listen() - so take more care to avoid the clash in the portallocator.

The port allocator keeps a set of allocated ports for each of the host IP addresses it's seen, including 0.0.0.0 and [::]. So, if a mapping to 0.0.0.0/:: is requested, find a port that's free in the range for each of the known IP addresses (but still only mark it as allocated against 0.0.0.0/::).

And, if a port is requested for specific host addresses, make sure it's also free in the corresponding 0.0.0.0/:: set (but only mark it as allocated against the specific addresses - because the same port can be allocated against a different specific address).

This fixes ...

# docker run --rm -ti -p 81/udp -p 127.0.0.1::80/udp alpine

# The container started, but the same host port was allocated for both container ports ...
# docker ps
CONTAINER ID   IMAGE     COMMAND     CREATED         STATUS         PORTS                                                                NAMES
191aa620f0d3   alpine    "/bin/sh"   7 seconds ago   Up 7 seconds   127.0.0.1:32768->80/udp, 0.0.0.0:32768->81/udp, [::]:32768->81/udp   zealous_tesla

In this case, if a host process is already using a port allocated by the daemon, the clash will not be detected. (So, we should update docs, adding a reason not to map ports to (the default) 0.0.0.0.)

- How to verify it

With the commands above (against a freshly started daemon).

New and updated unit/integration tests.

Without these changes, the integration test fails with ...

=== RUN   TestMixAnyWithSpecificHostAddrs
=== RUN   TestMixAnyWithSpecificHostAddrs/tcp
    port_mapping_linux_test.go:1297: assertion failed: error is not nil: Error response from daemon: failed to set up container networking: driver failed programming external connectivity on endpoint amazing_driscoll (c818a8d5a619b49d99572cec7e7e7accb0541d67b51314ecb19f7ae1f977ba92): failed to listen on TCP socket: address already in use
=== RUN   TestMixAnyWithSpecificHostAddrs/udp
    port_mapping_linux_test.go:1313: host port 32768 is mapped to different container ports: map[80/udp:[{127.0.0.1 32768}] 81/udp:[{0.0.0.0 32768} {:: 32768}] 82/udp:[{0.0.0.0 32769} {:: 32769}]]
--- FAIL: TestMixAnyWithSpecificHostAddrs (22.36s)
    --- FAIL: TestMixAnyWithSpecificHostAddrs/tcp (9.81s)
    --- FAIL: TestMixAnyWithSpecificHostAddrs/udp (12.55s)
FAIL

- Human readable description for the release notes

- Fix an issue that could cause container startup to fail, or lead to failed UDP port mappings, when some container ports are mapped to `0.0.0.0` and others are mapped to specific host addresses.

robmry added 2 commits May 28, 2025 11:38
Because we set SO_REUSEADDR on sockets for host ports, if there
are port mappings for INADDR_ANY (the default) as well as for
specific host ports - bind() cannot be used to detect clashes.

That means, for example, on daemon startup, if the port allocator
returns the first port in its ephemeral range for a specific host
adddress, and the next port mapping is for 0.0.0.0 - the same port
is returned and both bind() calls succeed. Then, the container
fails to start later when listen() spots the problem and it's too
late to find another port.

So, bind and listen to each set of ports as they're allocated
instead of just binding.

Signed-off-by: Rob Murray <[email protected]>
@robmry robmry force-pushed the fix_port_mapping branch 3 times, most recently from dc41518 to 7bace8c Compare May 28, 2025 12:50
@robmry robmry marked this pull request as ready for review May 28, 2025 12:59
We set SO_REUSEADDR on sockets used for host port mappings by
docker-proxy - which means it's possible to bind the same port
on a specific address as well as 0.0.0.0/::.

For TCP sockets, an error is raised when listen() is called on
both sockets - and the port allocator will be called again to
avoid the clash (if the port was allocated from a range, otherwise
the container will just fail to start).

But, for UDP sockets, there's no listen() - so take more care
to avoid the clash in the portallocator.

The port allocator keeps a set of allocated ports for each of
the host IP addresses it's seen, including 0.0.0.0/::. So, if a
mapping to 0.0.0.0/:: is requested, find a port that's free in
the range for each of the known IP addresses (but still only
mark it as allocated against 0.0.0.0/::). And, if a port is
requested for specific host addresses, make sure it's also
free in the corresponding 0.0.0.0/:: set (but only mark it as
allocated against the specific addresses - because the same
port can be allocated against a different specific address).

Signed-off-by: Rob Murray <[email protected]>
@robmry robmry force-pushed the fix_port_mapping branch from 7bace8c to d662091 Compare May 28, 2025 13:01
@robmry robmry modified the milestones: 29.0.0, 28.2.2 May 29, 2025
Comment on lines +199 to +200
} else if (unspecAddr.Is4() && ips4) || (unspecAddr.Is6() && ips6) {
ipToPortMapRef[unspecAddr] = &portMapRef{portMap: p.ipMap[unspecAddr][proto]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A socket listening on :: will receive both IPv4 and IPv6 packets addressed to the destination port unless the IPV6_V6ONLY sockopt is set. Do we follow the same semantics with port mappings, or does the user have to define two mappings for :: and 0.0.0.0 to publish a port dual-stack? Which address families are mapped when a port mapping is specified in the Engine API request with the HostIP field unset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A socket listening on :: will receive both IPv4 and IPv6 packets addressed to the destination port unless the IPV6_V6ONLY sockopt is set. Do we follow the same semantics with port mappings, or does the user have to define two mappings for :: and 0.0.0.0 to publish a port dual-stack? Which address families are mapped when a port mapping is specified in the Engine API request with the HostIP field unset?

When the host IP in a -p ip:port:port is present, if it's 0.0.0.0 or :: it's single-family.

If not present in the -p, the host address is the value supplied by com.docker.network.bridge.host_binding_ipv4 when the network was created, or --ip for the default bridge. If that's left unset, or it's set to 0.0.0.0, the port mapping is dual-stack. (If it's set to ::, it's IPv6 only.)

The portallocator is always single-stack... so requests to it don't correspond directly to the port mapping specified by the user - the portallocator's caller has to work out which address families it needs the reservation for.

In other words, it's a bit of a mess! But, from the portallocator's perspective it's simple.

As I was working on this, I noticed the daemon doesn't reserve its own port for IPv6, only for IPv4, when it's listening on both - maybe worth fixing separately, but it's not possible to start a container on that port, because the daemon is already listening on it. So, I think it could only cause a problem if a port mapping had a range that included the daemon's port, and the portallocator allocated it when other ports in the range were free ... but that's not going to be a common issue.

@vvoland vvoland modified the milestones: 28.2.2, 28.2.3 May 30, 2025
@robmry robmry merged commit 9663b36 into moby:master Jun 4, 2025
160 checks passed
@robmry robmry deleted the fix_port_mapping branch June 4, 2025 15:51
@vvoland vvoland modified the milestones: 28.2.3, 28.3.0 Jun 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] docker engine and desktop expose random ports differently
3 participants