Etcd watch stream starvation under high read response load when sharing same connection and TLS is enabled #15402

serathius · 2023-03-03T12:49:59Z

What happened?

When etcd client is generating high read response load, it can result in watch response stream in the same connection being starved. For example a client with an open watch and running 10 concurrent Range requests each returning around 10MB. The watch might get starved and not get any response for tens of seconds.

Problem does not occur when TLS is not enabled nor when watch is created on separate client/connection.

This affects also K8s (any version) as Kubernetes has one client per resource. Problem will trigger when single K8s resource has a lot of data and there are 10+ concurrent LIST requests for the same resource send to apiserver. For example 10k pods. This can cause serious correctness issues in K8s, like controllers not doing any job as they depend on watch to get updates. For example scheduler not scheduling any pods.

We tested and confirmed that all v3.4+ versions are affected.

Issue affects any watch response type:

Event
Periodic watch progress notification
Manual watch progress notification

What did you expect to happen?

Watch stream should never get blocked no matter the load.

How can we reproduce it (as minimally and precisely as possible)?

Repro for progress notify:

Start etcd with TLS and progress notify, for example:

./bin/etcd   --cert-file ./tests/fixtures/server2.crt --key-file ./tests/fixtures/server2.key.insecure --listen-client-urls https://127.0.0.1:2379 --advertise-client-urls https://127.0.0.1:2379 --experimental-watch-progress-notify-interval=100ms

insert a lot of data into etcd, for example run command below couple of times.

go run ./tools/benchmark/main.go  put --total 10000 --val-size=10000 --key-space-size 10000 --key ./tests/fixtures/server.key.insecure --cacert ./tests/fixtures/ca.crt --cert ./tests/fixtures/server.crt

Run program below

package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	c, err := clientv3.New(clientv3.Config{
		Endpoints: []string{"127.0.0.1:2379"},
		TLS:       &tls.Config{InsecureSkipVerify: true},
	})
	if err != nil {
		panic(err)
	}
	defer c.Close()
	for i := 0; i < 10; i++ {
		go func() {
			for {
				start := time.Now()
				resp, err := c.Get(context.Background(), "", clientv3.WithPrefix())
				if err != nil {
					fmt.Printf("Error: %v\n", err)
				}
				fmt.Printf("Got %d, took %s\n", len(resp.Kvs), time.Since(start))
			}
		}()
	}
	start := time.Now()
	for resp := range c.Watch(context.Background(), "a", clientv3.WithProgressNotify()) {
		fmt.Printf("Got notify, rev: %d, since last: %s\n", resp.Header.Revision, time.Since(start))
		start = time.Now()
	}
}

Instead of getting logs like:

Got notify, rev: 26971, since last: 110.947114ms
Got notify, rev: 26971, since last: 109.55032ms
Got notify, rev: 26971, since last: 109.20894ms
Got notify, rev: 26971, since last: 110.621913ms
Got notify, rev: 26971, since last: 107.954765ms
Got notify, rev: 26971, since last: 110.530701ms
Got notify, rev: 26971, since last: 107.704987ms
Got notify, rev: 26971, since last: 111.342234ms
Got notify, rev: 26971, since last: 107.9304ms
Got notify, rev: 26971, since last: 111.170052ms
Got notify, rev: 26971, since last: 108.656011ms
Got notify, rev: 26971, since last: 107.874665ms
Got notify, rev: 26971, since last: 108.476916ms
Got notify, rev: 26971, since last: 111.891616ms
Got notify, rev: 26971, since last: 107.340694ms
Got notify, rev: 26971, since last: 109.147243ms
Got notify, rev: 26971, since last: 111.340591ms
Got notify, rev: 26971, since last: 106.833842ms
Got notify, rev: 26971, since last: 109.345355ms
Got notify, rev: 26971, since last: 112.012011ms
Got 3783, took 2.237987879s
Got notify, rev: 26971, since last: 109.694461ms
Got 3783, took 2.324323811s
Got notify, rev: 26971, since last: 105.763332ms
Got 3783, took 2.41313437s
Got 3783, took 2.462624343s
Got 3783, took 2.494911293s
Got 3783, took 2.499409447s
Got 3783, took 2.500700171s
Got 3783, took 2.508462692s
Got notify, rev: 26971, since last: 109.242693ms
Got 3783, took 2.515227452s
Got 3783, took 2.534498037s
Got notify, rev: 26971, since last: 121.741006ms
Got notify, rev: 26971, since last: 112.175327ms
Got notify, rev: 26971, since last: 107.779196ms
Got notify, rev: 26971, since last: 98.25659ms
Got notify, rev: 26971, since last: 107.626991ms
Got notify, rev: 26971, since last: 108.487155ms

We get logs like:

Got notify, rev: 26971, since last: 113.603618ms
Got notify, rev: 26971, since last: 107.494278ms
Got notify, rev: 26971, since last: 106.061371ms
Got 3783, took 1.425461398s
Got 3783, took 1.010333393s
Got 3783, took 1.040919851s
Got 3783, took 986.415707ms
Got 3783, took 4.690286601s
Got 3783, took 1.029349011s
Got 3783, took 983.650976ms
Got 3783, took 1.012731026s
Got 3783, took 1.054258714s
Got 3783, took 4.312497715s
Got 3783, took 1.068067416s
Got 3783, took 986.65717ms
Got 3783, took 1.055793164s
Got 3783, took 992.97395ms
Got 3783, took 1.066242731s
Got 3783, took 5.192239047s
Got 3783, took 1.065642174s
Got 3783, took 1.015330689s
Got 3783, took 1.040866735s
Got 3783, took 1.023127374s

And next notify response never comes.

Alternatively run ready e2e test:
5c26ded

Anything else we need to know?

With help of @mborsz I managed to find the root cause:

Problem occurs only when TLS is enabled. This is because grpc handler is served through http server, which doesn't occur otherwise.
http server is affected by net/http: Stream starvation in http2 priority write scheduler golang/go#58804

I have a fix in work in release-3.4...serathius:etcd:fix-watch-starving
Will send PR when ready

Etcd version (please run commands below)

All etcd versions

Etcd configuration (command line flags or environment variables)

Any config with TLS enabled

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

N/A

Relevant log output

No response

The text was updated successfully, but these errors were encountered:

Signed-off-by: Marek Siarkowicz <[email protected]>

serathius · 2023-03-08T12:09:15Z

Problem definition

There are two issues we want to address related to etcd endpoint setup:

Confirmed issue watch stream starvation. Severity: Critical
Potential issues related to running grpc under http server. Severity: Important.

Watch starvation

Caused by running grpc server under http server which is affected by golang/go#58804.

For the watch starvation we can:

Lessened it by change http server frame scheduler to random.
Fixed by changing to new frame scheduler round-robin algorithm.
Stop running grpc under http server.

Running grpc under http server

Serving grpc server under http it is an experimental API with different performance and features.
It should not be used in production.

To avoid running grpc under http server we can:

Use cmux to multiplex grpc and http server connection. Will result in breaking change.
Use customised connection multiplexer for etcd by @aojea. Not ready for production. Even when matured, there is not a 100% guarantee that will work for all clients.
Move http server to separate port by passing --listen-client-http-urls flag. Complicates configuration, but no worries about production use.

Requirements

No backport of breaking changes.
No backport of code not ready for production.

Proposal

Implement changes and execute backports in the order below:

[v3.4+] Change http server frame scheduler to random.
[v3.4+] Move http server to separate port by passing --listen-client-http-urls.
[v3.6+] Use customised connection multiplexer for etcd by @aojea.
[v3.4+] When ready for production use, change to new frame scheduler round-robin algorithm.

Motivation

Change http server frame scheduler to random with an option to Move http server to separate port is recommended for backporting as only solution that is ready for production use while not causing any breaking changes.
We cannot use cmux to multiplex grpc and http server connection as it will cause a breaking change as proven by etcd connection multiplexing tests.
As confirmed with @aojea we should not backport use customised connection multiplexer for etcd as it is not ready for production.
Using frame scheduler round-robin algorithm should wait until it is verified by Go community for production.

Background

List of etcd http paths:

/debug/vars - Access to etcd config. Available only via --listen-client-urls
/config/local/log - Remote configuration of global log level. Removed in v3.5. Available only via --listen-client-urls
/version - Version endpoint. Available only via --listen-client-urls
/metrics - Metrics endpoint. Available via both --listen-client-urls and --listen-metrics-urls
/health - Health endpoint (same as etdctl endpoint health). Available only via --listen-client-urls.
/v2/* - etcd V2 API. Available only via --listen-client-urls (removed in v3.6)
/v3/* - grpc gateway for V3 API. Allows to call V3 API using http and Json. Available only via --listen-client-urls

EDIT 1: based on @ptabor feedback, I removed --client-no-http flag and proposed backporting --listen-client-http-urls to avoid deprecation.
EDIT 2: Updated to use @aojea custom connection multiplexer as a v3.6+ long term solution.

serathius · 2023-03-08T12:27:41Z

cc @ptabor @ahrtr @spzala @mitake

serathius · 2023-03-08T13:19:38Z

cc @wojtek-t @liggitt

fuweid · 2023-03-09T06:24:39Z

1 [v3.4+] Change the default write scheduler from Priority to Random, which in my testing improves resilience to starvation (not a full immunity).

HI @serathius is it possible to maintain a round-robin scheduler writer? I think it is good for that metrics, which the latency is more predicatable as there is max concurrent streams. The scheduler writer runs without lock. Maybe it could be easier~.

serathius · 2023-03-09T08:10:39Z

1 [v3.4+] Change the default write scheduler from Priority to Random, which in my testing improves resilience to starvation (not a full immunity).

HI @serathius is it possible to maintain a round-robin scheduler writer? I think it is good for that metrics, which the latency is more predicatable as there is max concurrent streams. The scheduler writer runs without lock. Maybe it could be easier~.

Prefer to avoid backporting untested code especially if it would be written by me :P. I would wait for grpc experts to implement it and test it.

ptabor · 2023-03-09T11:32:40Z

LGTM. I would consider also:

log WARNING in v3.4, v3.5 when user is accessing any HTTP (not grpc) service on the listener not defined by --listen-client-http-urls. This will drive proactively users to prepare for v3.6 version.

serathius · 2023-03-09T12:30:15Z

Discussed @fuweid idea with @mborsz. Implementing it based on Random scheduler should be much simpler than fixing golang/go#58804 which will need to modify priority scheduler.

I will try to implement round-robin scheduling. Thanks for suggestion @fuweid

ahrtr · 2023-03-09T21:27:10Z

[v3.4+] Add flag --listen-client-http-urls that exposes http server on separate endpoint and disables non-grpc handlers on --listen-client-urls. This way we expose pure grpc server, full mitigating the issue but requires a user action.

We already have some confusing user facing URLs (e.g. client/peer, listen/advertise, etc.), don't think it's a good idea to add more.
I tend to refactor (*serveCtx) serve to resolve this issue.

fuweid · 2023-03-09T21:35:16Z

@serathius thanks!！！

serathius · 2023-03-10T10:50:54Z

Quick look into implementation of random write scheduler shows that it's isn't so random :P. It prioritizes writing control frames. This also done by priority scheduler. Unfortunately method to check if frame is a control frame is not public https://go.googlesource.com/net/+/master/http2/writesched_random.go#48 (thanks Go).

Meaning that custom implementation of scheduler will already be subpar. I can implement scheduler that treats all frames the same, however I'm not expert in http2, so it's hard to guess what it could break.

EDIT: Or we can guess what it breaks. https://go.googlesource.com/net/+/04296fa82e83b85317bd93ad50dd00460d6d7940%5E%21/http2/writesched_random.go. Nice to see familiar people.

@aojea would love to get your thoughts on this issue.

aojea · 2023-03-10T11:22:53Z

IIUIC the problem here is with DATA frames , those are the ones that can cause starvation.

Control frames are not carrying much data, they are used for signaling, and per RFC they MUST not be considered for flow-control

https://www.rfc-editor.org/rfc/rfc7540

The frame type determines whether flow control applies to a
frame. Of the frames specified in this document, only DATA
frames are subject to flow control; all other frame types do not
consume space in the advertised flow-control window. This
ensures that important control frames are not blocked by flow
control.

One interesting thing is that if you were using the http2 implementation from golang/x/net instead of the one in the standaard library, is that you. probably were using the Random scheduler until last year https://go-review.googlesource.com/c/net/+/382015

Based on this comment from Brad Fitzpatrick https://go-review.googlesource.com/c/net/+/382015/comments/16b761d4_4c27bf6f and that the problem here are DATA frames, I'm +1 on the Random Scheduler if it demonstrates that it solves the problem

fuweid · 2023-03-11T16:19:01Z

It prioritizes writing control frames.

The most enqueued control frame is the connection-level windows update. It's filed by each request body read. But the x/net/http2 only files one connection-level windows update if the pending size >= 4k. Since the ETCD server is using x/net/http2, the number of control frame won't be too much.

Unfortunately method to check if frame is a control frame is not public

Yeah. It is painful. If the isControl can be exported, we can implement a FIFO scheduler.

In the #15446 , it is going to use grpc-go underlying http2. It seems that the grpc-go has some enhancements for the window update, like BDP grpc/grpc-go#2400.

Just my two cents is that if the FIFO scheduler (just like grpc-go http2) is better than random (we need test result), it's worth to make isControl exported.

liggitt · 2023-03-13T14:24:52Z

I'll defer to others with more context about whether splitting http and grpc listeners is necessary, and will just comment on the proposed mechanism for splitting

Making --listen-client-urls mean different things for different versions is very confusing. If I understand the proposal:

--listen-client-urls means http+grpc prior to 3.6 if no --listen-client-http-urls is specified
--listen-client-urls means grpc-only prior to 3.6 if --listen-client-http-urls is specified, or unconditionally in 3.6+

Do all existing clients currently expect to speak grpc over http/2? Will there be any client-visible change or impact of splitting grpc and http? Will existing clients need to change what they specify in their config to target both the http and grpc endpoints?

serathius · 2023-03-13T15:31:02Z

Do all existing clients currently expect to speak grpc over http/2?

Whole V3 etcd API is available on grpc. From pure API stand there is no impact.

Will there be any client-visible change or impact of splitting grpc and http?

As mentioned above /debug/vars, /config/local/log, /version, /metrics, /health, v2 API (removed in v3.6) and grpcgateway for v3 API (grpc over websocket&json).

Will existing clients need to change what they specify in their config to target both the http and grpc endpoints?

Depends on client:

etcd API user will not see any difference. They should not be using V2 API which has been depressed for years and will be removed in v3.6. We also don't expect any users of grpcgateway.
metric collectors will be impacted in v3.6 however exposing etcd API and metrics on the same endpoint is a bad practice as it means that monitoring has permission to access API. We expect most users to be using --listen-metrics-urls.
probers will be impacted in v3.6 as long they are using /health endpoint. However as mentioned we expect most users (including K8s) to be using etcdctl endpoint health.

Signed-off-by: Marek Siarkowicz <[email protected]>

serathius · 2023-03-20T13:59:36Z

Working with @aojea on replacing Cmux, will propose update to the proposal soon. Our main concern is backporting new untested code.

serathius · 2023-03-23T13:55:25Z

Updated the proposal in #15402 (comment)
PTAL @ptabor @ahrtr @liggitt @aojea

tldr; We will still backport running http and grpc on separate port, but instead making it default, we will use @aojea customized connection multiplexer for etcd thus avoiding a breaking change.

ahrtr · 2023-03-24T02:35:35Z

Clarification

If I understand it correctly, the following proposal

Implement changes and execute backports in the order below:
   1. [v3.4+] https://github.com/etcd-io/etcd/pull/15452.
   2. [v3.4+] https://github.com/etcd-io/etcd/pull/15446 by passing --listen-client-http-urls.
   3. [v3.6+] https://github.com/etcd-io/etcd/pull/15510 by @aojea.
   4. [v3.4+] When ready for production use, change to new [frame scheduler round-robin algorithm](https://go-review.git.corp.google.com/c/net/+/478735).

Should be updated as below to avoid confusion & misunderstanding,

Implement changes and execute backports in the order below:
   1. [v3.4+] https://github.com/etcd-io/etcd/pull/15452. 
   2. [Only v3.4 & v3.5] https://github.com/etcd-io/etcd/pull/15446 by passing --listen-client-http-urls.
   3. [v3.6+] https://github.com/etcd-io/etcd/pull/15510 by @aojea.
   4. [v3.6+] When ready for production use, change to new [frame scheduler round-robin algorithm](https://go-review.git.corp.google.com/c/net/+/478735).

It seems that the link frame scheduler round-robin algorithm isn't accessible outside google?

Comment

Please clearly state the impact on K8s. My understanding is that there is no impact if K8s uses etcdctl instead of /health to perform the health check in the Probes. cc @neolit123

I am not sure whether K8s uses other http endpoints. cc @liggitt

[v3.4+] When ready for production use, change to new frame scheduler round-robin algorithm.

Why do you update stable release 3.4 & 3.5 again in future? No matter which way we follow, If there is no any issue, let's keep it as it's.

serathius · 2023-03-24T10:28:12Z

Clarification

If I understand it correctly, the following proposal

Implement changes and execute backports in the order below:
   1. [v3.4+] https://github.com/etcd-io/etcd/pull/15452.
   2. [v3.4+] https://github.com/etcd-io/etcd/pull/15446 by passing --listen-client-http-urls.
   3. [v3.6+] https://github.com/etcd-io/etcd/pull/15510 by @aojea.
   4. [v3.4+] When ready for production use, change to new [frame scheduler round-robin algorithm](https://go-review.git.corp.google.com/c/net/+/478735).

Should be updated as below to avoid confusion & misunderstanding,

Implement changes and execute backports in the order below:
   1. [v3.4+] https://github.com/etcd-io/etcd/pull/15452. 
   2. [Only v3.4 & v3.5] https://github.com/etcd-io/etcd/pull/15446 by passing --listen-client-http-urls.
   3. [v3.6+] https://github.com/etcd-io/etcd/pull/15510 by @aojea.
   4. [v3.6+] When ready for production use, change to new [frame scheduler round-robin algorithm](https://go-review.git.corp.google.com/c/net/+/478735).

No, the proposal above is consistent with the plan.

It seems that the link frame scheduler round-robin algorithm isn't accessible outside google?

Sorry, assumed that it is public as bot linked it to the issue.
golang/go#58804 (comment)

Comment

Please clearly state the impact on K8s. My understanding is that there is no impact if K8s uses etcdctl instead of /health to perform the health check in the Probes. cc @neolit123

With current proposal, there is no impact.

[v3.4+] When ready for production use, change to new frame scheduler round-robin algorithm.

Why do you update stable release 3.4 & 3.5 again in future? No matter which way we follow, If there is no any issue, let's keep it as it's.

Random scheduler is not enough to fully address the issue, so clusters that will not use --listen-client-http-urls will be affected.

ahrtr · 2023-03-24T23:27:02Z

Are you planning to rollback #15446 on main branch when #15510 finishes? So I assume eventually #15446 will only be applied to 3.4 and 3.5.

Since it impacts all existing etcd users, unfortunately no any feedback yet. The good news is K8s itself isn't impacted; but it also affect collecting metrics.

serathius · 2023-03-27T10:18:17Z

Are you planning to rollback #15446 on main branch when #15510 finishes? So I assume eventually #15446 will only be applied to 3.4 and 3.5.

No plan for rollback. As discussed on #15510 (comment) there is no 100% guarantee that multiplexing grpc and http2 will always work for all clients. Whole idea is incorrect and for total safely we should recommend users to separate grpc and http.

Since it impacts all existing etcd users, unfortunately no any feedback yet. The good news is K8s itself isn't impacted; but it also affect collecting metrics.

There is no breaking change for K8s, for etcd users we will use #15479 to confirm that there is no breaking change. Would be good to also cover jetcd, however not sure if that will be easy to automate.

aojea · 2023-03-27T11:57:03Z

No plan for rollback. As discussed on #15510 (comment) there is no 100% guarantee that multiplexing grpc and http2 will always work for all clients. Whole idea is incorrect and for total safely we should recommend users to separate grpc and http.

+1 to this, since grpc uses http2 as the transport,and t also builds some custom transport behaviors on top of it, I think that the best solution is to completely separate them.

cmux or the PR I sent replacing cmux are not completely safe and can't guarantee future compatibility... and in the best of the cases will be always suboptimal since add another layer of buffering

serathius added type/bug priority/important Bugs/PRs about critical functionality labels Mar 3, 2023

serathius self-assigned this Mar 3, 2023

serathius pinned this issue Mar 3, 2023

serathius added release/v3.6 release/v3.5 release/v3.4 labels Mar 3, 2023

wojtek-t mentioned this issue Mar 3, 2023

Create and reuse a single etcd client kubernetes/kubernetes#114458

Open

serathius changed the title ~~Etcd watch stream starvation under high read response load when TLS is enabled~~ Etcd watch stream starvation under high read response load when sharing same connection and TLS is enabled Mar 3, 2023

serathius added a commit to serathius/etcd that referenced this issue Mar 3, 2023

Fix etcd-io#15402 by moving grpc server from under http server

ac98d67

serathius mentioned this issue Mar 3, 2023

Fix watch stream starvation #15403

Closed

serathius added a commit to serathius/etcd that referenced this issue Mar 3, 2023

Fix etcd-io#15402 by moving grpc server from under http server

eb82948

Signed-off-by: Marek Siarkowicz <[email protected]>

serathius added a commit to serathius/etcd that referenced this issue Mar 3, 2023

Fix etcd-io#15402 by moving grpc server from under http server

27f539e

Signed-off-by: Marek Siarkowicz <[email protected]>

serathius added a commit to serathius/etcd that referenced this issue Mar 3, 2023

Fix etcd-io#15402 by moving grpc server from under http server

e13aa13

Signed-off-by: Marek Siarkowicz <[email protected]>

serathius added a commit to serathius/etcd that referenced this issue Mar 3, 2023

Fix etcd-io#15402 by moving grpc server from under http server

d0c4ad6

Signed-off-by: Marek Siarkowicz <[email protected]>

serathius added a commit to serathius/etcd that referenced this issue Mar 3, 2023

Fix etcd-io#15402 by moving grpc server from under http server

a303dce

Signed-off-by: Marek Siarkowicz <[email protected]>

serathius added a commit to serathius/etcd that referenced this issue Mar 7, 2023

Fix etcd-io#15402 by moving grpc server from under http server

36d5f4b

Signed-off-by: Marek Siarkowicz <[email protected]>

serathius added a commit to serathius/etcd that referenced this issue Mar 7, 2023

Fix etcd-io#15402 by moving grpc server from under http server

17a73ea

Signed-off-by: Marek Siarkowicz <[email protected]>

serathius mentioned this issue Mar 8, 2023

Watch random scheduler #15431

Merged

This was referenced Mar 13, 2023

Allow user to separate http and grpc server #15446

Merged

etcdserver: create separate goroutine to accept grpc tls connection #15451

Closed

serathius mentioned this issue Mar 13, 2023

[3.5] Use random scheduler #15452

Merged

This was referenced Mar 14, 2023

Watch random scheduler 3.4 #15478

Merged

Test connection multiplexing. #15479

Merged

serathius added a commit to serathius/etcd that referenced this issue Mar 14, 2023

Fix etcd-io#15402 by moving grpc server from under http server

509f8b6

Signed-off-by: Marek Siarkowicz <[email protected]>

serathius added a commit to serathius/etcd that referenced this issue Mar 16, 2023

Fix etcd-io#15402 by moving grpc server from under http server

25a9b56

Signed-off-by: Marek Siarkowicz <[email protected]>

serathius mentioned this issue Mar 16, 2023

Connection multiplexing regression tests for v3.5 #15491

Merged

ahrtr mentioned this issue Mar 19, 2023

Replace Cmux #15510

Open

serathius mentioned this issue Mar 20, 2023

Test cmux 3.4 #15516

Closed

This was referenced Mar 27, 2023

[3.4] Test cmux #15560

Merged

Plans for v3.5.8 release #15585

Open

ahrtr mentioned this issue Apr 22, 2023

Warnings at startup when running with --client-cert-auth and --client-cert-allowed-hostname #15755

Closed

Etcd watch stream starvation under high read response load when sharing same connection and TLS is enabled #15402

Etcd watch stream starvation under high read response load when sharing same connection and TLS is enabled #15402

serathius commented Mar 3, 2023 •

edited

serathius commented Mar 8, 2023 •

edited

serathius commented Mar 8, 2023

serathius commented Mar 8, 2023

fuweid commented Mar 9, 2023

serathius commented Mar 9, 2023 •

edited

ptabor commented Mar 9, 2023 •

edited by ahrtr

serathius commented Mar 9, 2023

ahrtr commented Mar 9, 2023

fuweid commented Mar 9, 2023

serathius commented Mar 10, 2023 •

edited

aojea commented Mar 10, 2023 •

edited

fuweid commented Mar 11, 2023 •

edited

liggitt commented Mar 13, 2023

serathius commented Mar 13, 2023 •

edited

serathius commented Mar 20, 2023

serathius commented Mar 23, 2023 •

edited

ahrtr commented Mar 24, 2023

serathius commented Mar 24, 2023

Clarification

Comment

ahrtr commented Mar 24, 2023

serathius commented Mar 27, 2023

aojea commented Mar 27, 2023

Mar	APR	May
	24
2022	2023	2024

Etcd watch stream starvation under high read response load when sharing same connection and TLS is enabled #15402

Etcd watch stream starvation under high read response load when sharing same connection and TLS is enabled #15402

Comments

serathius commented Mar 3, 2023 • edited

What happened?

What did you expect to happen?

How can we reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

Etcd version (please run commands below)

Etcd configuration (command line flags or environment variables)

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

Relevant log output

serathius commented Mar 8, 2023 • edited

Problem definition

Watch starvation

Running grpc under http server

Requirements

Proposal

Motivation

Background

serathius commented Mar 8, 2023

serathius commented Mar 8, 2023

fuweid commented Mar 9, 2023

serathius commented Mar 9, 2023 • edited

ptabor commented Mar 9, 2023 • edited by ahrtr

serathius commented Mar 9, 2023

ahrtr commented Mar 9, 2023

fuweid commented Mar 9, 2023

serathius commented Mar 10, 2023 • edited

aojea commented Mar 10, 2023 • edited

fuweid commented Mar 11, 2023 • edited

liggitt commented Mar 13, 2023

serathius commented Mar 13, 2023 • edited

serathius commented Mar 20, 2023

serathius commented Mar 23, 2023 • edited

ahrtr commented Mar 24, 2023

Clarification

Comment

serathius commented Mar 24, 2023

Clarification

Comment

ahrtr commented Mar 24, 2023

serathius commented Mar 27, 2023

aojea commented Mar 27, 2023

serathius commented Mar 3, 2023 •

edited

serathius commented Mar 8, 2023 •

edited

serathius commented Mar 9, 2023 •

edited

ptabor commented Mar 9, 2023 •

edited by ahrtr

serathius commented Mar 10, 2023 •

edited

aojea commented Mar 10, 2023 •

edited

fuweid commented Mar 11, 2023 •

edited

serathius commented Mar 13, 2023 •

edited

serathius commented Mar 23, 2023 •

edited