
Netty connector hangs after repeated buffer overflow errors when writing data #5753

Closed
@scottoaks17

Description

We have some code that repeatedly makes async JAX-RS requests using the Netty connector; the basic call is quite simple:

CompletableFuture<Response> cf = invokeResponse().toCompletableFuture()
        .whenComplete((rsp, t) -> {
            if (t != null) {
                System.out.println(Thread.currentThread() + " async complete. Caught exception " + t);
            }
        })
        .handle((rsp, t) -> {
            if (rsp != null) {
                rsp.readEntity(String.class);
            } else {
                System.out.println(Thread.currentThread().getName() + " response is null");
            }
            return rsp;
        })
        .exceptionally(t -> {
            System.out.println("async complete. completed exceptionally " + t);
            throw new RuntimeException(t);
        });
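
For reference, a plausible shape for invokeResponse() against a client built with the Netty connector is sketched below; the GET, the url variable, and the wrapper itself are assumptions, not the attached source:

import java.util.concurrent.CompletionStage;
import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.core.Response;
import org.glassfish.jersey.client.ClientConfig;
import org.glassfish.jersey.netty.connector.NettyConnectorProvider;

// Client backed by the Netty connector (jersey-netty-connector).
Client client = ClientBuilder.newBuilder()
        .withConfig(new ClientConfig().connectorProvider(new NettyConnectorProvider()))
        .build();

// Hypothetical wrapper: issues one async GET and returns its CompletionStage.
// 'url' here stands in for the first command-line argument.
CompletionStage<Response> invokeResponse() {
    return client.target(url).request().rx().get();
}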

The attached example program executes these calls within a loop, for a given number of threads. Under normal circumstances, this works fine: all the requests go through and get processed, and the test program exits. Sometimes, however, the test program will hang: it will simply cease processing the remaining results.
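
The loop itself follows the usual fixed-thread-pool pattern; a minimal sketch (numThreads, callsPerThread, and the latch handling are assumptions, not the attached source):

import java.util.concurrent.CompletionException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService pool = Executors.newFixedThreadPool(numThreads);
CountDownLatch done = new CountDownLatch(numThreads);
for (int t = 0; t < numThreads; t++) {
    pool.submit(() -> {
        try {
            for (int i = 0; i < callsPerThread; i++) {
                try {
                    // Each iteration issues one async call and waits for it.
                    invokeResponse().toCompletableFuture().join();
                } catch (CompletionException expected) {
                    // A buffer overflow in write() should surface here,
                    // and the loop should be able to continue.
                }
            }
        } finally {
            done.countDown();
        }
    });
}
done.await();   // the hang manifests here: the count never reaches zero
pool.shutdown();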

We have tracked this down to a problem when writes to the remote system are buffered within JerseyChunkedInput.write(), which puts the data on a queue of size 8 (by default). If that queue is full, the write() method throws an IOException. For a single request, this IOException is propagated up correctly; the CompletableFuture reports the exception and processing can continue. After some number of these exceptions, however, the program hangs: the Netty stack has somehow lost track of its callbacks/promises.

The easiest way to reproduce this is to modify the JerseyChunkedInput.write() method to periodically throw an IOException, something like this:

private static java.util.concurrent.atomic.AtomicInteger ai =
        new java.util.concurrent.atomic.AtomicInteger(0);

private void write(Provider<ByteBuffer> bufferSupplier) throws IOException {
    checkClosed();

    // Inject a periodic failure to simulate the queue being full.
    if ((ai.getAndIncrement() % 100) == 0) {
        throw new IOException("BOGUS BUFFER OVERFLOW");
    }
    try {
        boolean queued = queue.offer(bufferSupplier.get(), WRITE_TIMEOUT, TimeUnit.MILLISECONDS);
        if (!queued) {
            throw new IOException("Buffer overflow.");
        }
    } catch (InterruptedException e) {
        throw new IOException(e);
    }
}

With that in place, the attached test program will run for a while. After about 28 calls it becomes quite sluggish, and after about 34 calls it hangs altogether. (Those numbers will likely differ on other systems.)

To run the attached program, the pom dependency is:

        <dependency>
            <groupId>org.glassfish.jersey.connectors</groupId>
            <artifactId>jersey-netty-connector</artifactId>
            <version>2.45</version>
        </dependency>

Then it needs three arguments: the URL to call, the number of threads, and the number of times each thread should make the call, so something like:
java <cp> com.oracle.psr.nettybug.NettyBug http://100.105.9.29:7001/console/login/LoginForm.jsp 5 10

nettybug.tar.gz
