Description
Hello, I found a possible goroutine leak in the function clusterLeave():
moby/libnetwork/networkdb/cluster.go
Line 222 in ceefb7d
func (nDB *NetworkDB) clusterLeave() error {
	mlist := nDB.memberlist
	if err := nDB.sendNodeEvent(NodeEventTypeLeave); err != nil {
		log.G(context.TODO()).Errorf("failed to send node leave: %v", err)
	}
	if err := mlist.Leave(time.Second); err != nil {
		return err
	}
	// cancel the context
	nDB.cancelCtx()
	for _, t := range nDB.tickers {
		t.Stop()
	}
	return mlist.Shutdown()
}
If mlist.Leave() returns an error, clusterLeave() returns early and the nDB.cancelCtx() call below it is never executed.
moby/libnetwork/networkdb/cluster.go
Lines 229 to 234 in ceefb7d
	if err := mlist.Leave(time.Second); err != nil {
		return err
	}
	// cancel the context
	nDB.cancelCtx()
As a result, the <-nDB.ctx.Done() case in triggerFunc() never fires, so the goroutines started here never exit and are leaked:
moby/libnetwork/networkdb/cluster.go
Line 176 in ceefb7d
	go nDB.triggerFunc(trigger.interval, t.C, trigger.fn)
Blocking position:
moby/libnetwork/networkdb/cluster.go
Lines 251 to 258 in ceefb7d
	for {
		select {
		case <-C:
			f()
		case <-nDB.ctx.Done():
			return
		}
	}
Reproduce
I reproduced the bug with goleak.
First, I changed the condition from err != nil to err == nil. I don't know how to make mlist.Leave() return a non-nil error, so this change is only meant to make the return err branch easy to reach. I'm not sure whether the change has other side effects.
Original:
	if err := mlist.Leave(time.Second); err != nil {
		return err
	}
After modification:
	if err := mlist.Leave(time.Second); err == nil {
		return err
	}
Then I ran goleak in the test functions that exercise this function, for example:
moby/libnetwork/networkdb/networkdb_test.go
Line 180 in ceefb7d
	func TestNetworkDBSimple(t *testing.T) {
Like this:

The result shows a leaked goroutine waiting at <-nDB.ctx.Done().

Expected behavior
No response
docker version
latest
docker info
latest
Additional Info
In short, I think the bug is caused by returning early without calling the cancel function (nDB.cancelCtx()). I have tried to describe it in detail above.