
QueueInputStream reads all but the first byte without waiting. #748


Merged
merged 13 commits into apache:master from maxxedev:queueinputstream-bulk-read
May 22, 2025

Conversation

maxxedev
Contributor

QueueInputStream reads all but the first byte without waiting.

Fix so that bulk reads avoid getting stuck if a timeout is set and at least one byte is available.

Contributor

@ppkarwasz ppkarwasz left a comment


@maxxedev,

Thanks for the pull request — it adds valuable functionality and includes all the essential elements.

A couple of suggestions and minor nitpicks below.

Note: Have you considered using BlockingQueue.drainTo instead of repeated poll() calls? It locks the queue only once, which could improve performance.
The tradeoff would be representing a byte[] slice as a List<Integer>.
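For illustration, a minimal sketch of that idea (the helper name `drainBytes` and the `List<Integer>`-to-`byte[]` copy are mine, not the actual patch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DrainToSketch {

    // Hypothetical helper: drain up to 'length' queued byte values with a
    // single lock acquisition instead of polling them one at a time.
    static int drainBytes(final BlockingQueue<Integer> queue, final byte[] b,
            final int off, final int length) {
        final List<Integer> drained = new ArrayList<>(length);
        // drainTo locks the queue once and transfers up to 'length' elements.
        final int count = queue.drainTo(drained, length);
        // The tradeoff: a byte[] slice has to pass through a List<Integer>.
        for (int i = 0; i < count; i++) {
            b[off + i] = (byte) (drained.get(i) & 0xFF);
        }
        return count;
    }

    public static void main(final String[] args) {
        final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (final byte value : "hello".getBytes()) {
            queue.add(value & 0xFF);
        }
        final byte[] buffer = new byte[8];
        final int n = drainBytes(queue, buffer, 0, buffer.length);
        System.out.println(n + ":" + new String(buffer, 0, n)); // prints "5:hello"
    }
}
```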

maxxedev and others added 3 commits May 17, 2025 01:48
Adds a small benchmark to measure how much time it takes to transfer
1 MiB from a `QueueOutputStream` to a `QueueInputStream`.
@ppkarwasz
Contributor

I ran a small benchmark (also included in 35d1ce4) that measures the time it takes to read 1 MiB from the queue, while another thread is writing to it. The results look promising.

Before

Benchmark Mode Cnt Score Error Units
QueueStreamBenchmark.streams:input sample 639 78.261 ± 1.892 ms/op
QueueStreamBenchmark.streams:input:p0.00 sample 60.293 ms/op
QueueStreamBenchmark.streams:input:p0.50 sample 69.861 ms/op
QueueStreamBenchmark.streams:input:p0.90 sample 96.600 ms/op
QueueStreamBenchmark.streams:input:p0.95 sample 104.858 ms/op
QueueStreamBenchmark.streams:input:p0.99 sample 123.627 ms/op
QueueStreamBenchmark.streams:input:p0.999 sample 156.500 ms/op
QueueStreamBenchmark.streams:input:p0.9999 sample 156.500 ms/op
QueueStreamBenchmark.streams:input:p1.00 sample 156.500 ms/op

After

Note: The high values in some benchmark runs are due to a synchronization problem between the two threads, which led the QueueInputStream thread to time out.

Benchmark Mode Cnt Score Error Units
QueueStreamBenchmark.streams:input sample 9577 21.248 ± 5.848 ms/op
QueueStreamBenchmark.streams:input:p0.00 sample 3.650 ms/op
QueueStreamBenchmark.streams:input:p0.50 sample 12.222 ms/op
QueueStreamBenchmark.streams:input:p0.90 sample 12.386 ms/op
QueueStreamBenchmark.streams:input:p0.95 sample 12.468 ms/op
QueueStreamBenchmark.streams:input:p0.99 sample 33.787 ms/op
QueueStreamBenchmark.streams:input:p0.999 sample 2292.976 ms/op
QueueStreamBenchmark.streams:input:p0.9999 sample 8128.561 ms/op
QueueStreamBenchmark.streams:input:p1.00 sample 8128.561 ms/op

Contributor

@ppkarwasz ppkarwasz left a comment


Thanks for the changes — using BlockingQueue.drainTo significantly reduces lock contention and improves performance ~4x.

We might optimize further by reordering operations. For length > 1, we could:

  1. Try drainTo() first; if it returns elements, skip step 2.
  2. Otherwise, call read() for one byte, then drainTo() for the rest.
  3. Copy data from List<Integer> to the byte array.
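Roughly, in code (a sketch that assumes a `BlockingQueue<Integer>` backing field and a nanosecond timeout parameter; this is not the actual QueueInputStream implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ReorderedReadSketch {

    // Sketch of the suggested ordering; names are hypothetical.
    static int read(final BlockingQueue<Integer> queue, final long timeoutNanos,
            final byte[] b, final int off, final int len) throws InterruptedException {
        final List<Integer> drained = new ArrayList<>(len);
        // 1. Try drainTo() first; if data is already queued, no waiting is needed.
        if (queue.drainTo(drained, len) == 0) {
            // 2. Otherwise wait up to the timeout for a single byte...
            final Integer first = queue.poll(timeoutNanos, TimeUnit.NANOSECONDS);
            if (first == null) {
                return -1; // timed out with no data
            }
            drained.add(first);
            // ...then drain whatever else arrived meanwhile, without waiting.
            queue.drainTo(drained, len - 1);
        }
        // 3. Copy the List<Integer> into the byte array.
        for (int i = 0; i < drained.size(); i++) {
            b[off + i] = (byte) (drained.get(i) & 0xFF);
        }
        return drained.size();
    }

    public static void main(final String[] args) throws InterruptedException {
        final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        queue.add((int) 'h');
        queue.add((int) 'i');
        final byte[] buf = new byte[8];
        final int n = read(queue, TimeUnit.SECONDS.toNanos(1), buf, 0, buf.length);
        System.out.println(n + ":" + new String(buf, 0, n)); // prints "2:hi"
    }
}
```

This way the common case (data already queued) never touches the timed `poll` at all.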

Member

@garydgregory garydgregory left a comment


Hi All,

I've made a few minor changes; see the commit comments.

The remaining major issue is why the new read(...) method sometimes honors the configured timeout and sometimes ignores it.

From a user's POV this is quite confusing and random behavior: I construct an instance with a timeout, I pass it on to other APIs, and sometimes the timeout applies and sometimes it doesn't.

Isn't this bound to be a source of bug reports and confusion?

@ppkarwasz
Contributor

The remaining major issue is why the new read(...) method sometimes honors the configured timeout and sometimes ignores it.

From my perspective, the intended contract of the read(...) method is to attempt to read at least one byte from the queue, waiting up to the configured timeout for data to become available. If no data arrives within that period, it should return -1.
As far as I can tell, the current implementation adheres to this contract. It waits for the timeout at most once, not per byte, which ensures consistent and predictable behavior.

In contrast, the previous implementation, inherited from the superclass, effectively applied the timeout per byte requested: if, for example, the caller attempted to read into a buffer of size 1000, the method could block for up to 1000 times the configured timeout. This behavior could easily lead to unexpectedly long delays and violated the "at most the configured timeout" expectation.
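To make the multiplication concrete, here is a toy model (not Commons IO code) of the inherited java.io.InputStream bulk-read loop, which calls the single-byte read() once per byte. If every byte arrives just before the per-call timeout would expire, a read of len bytes can wait up to len times the timeout:

```java
import java.io.IOException;
import java.io.InputStream;

// Toy model of the old behavior: the inherited InputStream.read(byte[], int, int)
// loops over the single-byte read(), so each byte can pay the full timeout.
public class PerByteTimeoutDemo extends InputStream {

    long simulatedWaitMillis;       // total time a queue-backed read would block
    private final long timeoutMillis;
    private int remaining;          // bytes that trickle in, one per read() call

    PerByteTimeoutDemo(final long timeoutMillis, final int remaining) {
        this.timeoutMillis = timeoutMillis;
        this.remaining = remaining;
    }

    @Override
    public int read() throws IOException {
        // Worst case: each byte becomes available just before the timeout expires,
        // so every single-byte read() waits (nearly) the full timeout.
        simulatedWaitMillis += timeoutMillis;
        if (remaining > 0) {
            remaining--;
            return 0;
        }
        return -1;
    }

    public static void main(final String[] args) throws IOException {
        final PerByteTimeoutDemo demo = new PerByteTimeoutDemo(5, 1000);
        // InputStream.read(byte[], int, int) loops over read() 1000 times here.
        final int n = demo.read(new byte[1000], 0, 1000);
        System.out.println(n + " bytes, waited " + demo.simulatedWaitMillis + " ms");
        // prints "1000 bytes, waited 5000 ms": len * timeout, not timeout once
    }
}
```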

@maxxedev
Contributor Author

The remaining major issue is why the new read(...) method sometimes honors the configured timeout and sometimes ignores it.

In addition to what @ppkarwasz said above, see these tests, which compare the behavior of QueueInputStream against FileInputStream as a reference. The previous implementation could block for long durations even though some data was always available. The new implementation and the reference implementation do not block when data is available.

try {
    queueOutputStream.write(inputData.getBytes(StandardCharsets.UTF_8));
    afterWriteLatch.countDown();
} catch (final Exception e) {
    throw new RuntimeException(e);
}
Member

@garydgregory garydgregory May 18, 2025


Hello @maxxedev
Unless this exception is expected for the test to pass (if so, please add a // comment saying so), the test would be clearer calling JUnit's fail(...) method.

Contributor


This code is executed asynchronously, so any exception it throws will not be captured by the test.

Since, by the end of the test case, this code must execute, it would be useful to:

  1. Assign the return value of CompletableFuture.runAsync() to a variable (e.g. future).
  2. Call assertDoesNotThrow(future::get) at the very end of the test case.

Note: Since other parts of the test depend on the afterWriteLatch.countDown() call, if something goes wrong here, the test case will simply hang. I think we should add reasonable timeouts to the CountDownLatch.await() calls.
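Something like the following shape (plain java.util.concurrent here so the sketch stands alone; in the actual test the last step would be JUnit's assertDoesNotThrow(future::get)):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of the suggested test shape: keep the future, bound the latch wait,
// and surface any async exception at the end of the test.
public class AsyncWriteTestSketch {

    static String runScenario() throws Exception {
        final CountDownLatch afterWriteLatch = new CountDownLatch(1);
        // 1. Assign the result of runAsync() to a variable instead of discarding it.
        final CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
            try {
                // ... write to the queue here ...
                afterWriteLatch.countDown();
            } catch (final Exception e) {
                throw new RuntimeException(e);
            }
        });
        // 2. Bound the latch wait so a failed writer cannot hang the test forever.
        if (!afterWriteLatch.await(5, TimeUnit.SECONDS)) {
            throw new AssertionError("writer did not finish in time");
        }
        // 3. get() rethrows anything the async task threw; with JUnit this is
        //    assertDoesNotThrow(future::get).
        future.get(5, TimeUnit.SECONDS);
        return "ok";
    }

    public static void main(final String[] args) throws Exception {
        System.out.println(runScenario()); // prints "ok"
    }
}
```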

@maxxedev
Contributor Author

@garydgregory what needs to be done to get the PR merged?

@garydgregory
Member

I'll take a look again tonight or tomorrow morning.

@garydgregory garydgregory merged commit fd10fed into apache:master May 22, 2025
19 of 21 checks passed
@maxxedev maxxedev deleted the queueinputstream-bulk-read branch July 13, 2025 23:02