Jetty Threading Architecture
Writing a performant client or server is difficult, because it should:
- Scale well with the number of processors.
- Be efficient at using processor caches to avoid parallel slowdown.
- Support multiple network protocols that may have very different requirements; for example, multiplexed protocols such as HTTP/2 introduce new challenges that are not present in non-multiplexed protocols such as HTTP/1.1.
- Support different application threading models; for example, if a Jetty server invokes server-side application code that is allowed to call blocking APIs, then the Jetty server should not be affected by how long the blocking API call takes, and should be able to process other connections or other requests in a timely fashion.
Execution Strategies
The Jetty threading architecture can be modeled with a producer/consumer pattern, where produced tasks need to be consumed efficiently.
For example, Jetty produces (among others) these tasks:
- A task that wraps a NIO selection event, see the Jetty I/O architecture.
- A task that wraps the invocation of application code that may block (for example, the invocation of a Servlet to handle an HTTP request).
A task is typically a Runnable object that may implement org.eclipse.jetty.util.thread.Invocable to indicate the behavior of the task (in particular, whether the task may block or not).
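For example, a minimal sketch of a task that declares itself as non-blocking (the class name is illustrative):

import org.eclipse.jetty.util.thread.Invocable;

// A task that declares itself as non-blocking, so that an execution
// strategy may run it directly on the producer thread.
class NonBlockingTask implements Runnable, Invocable
{
    @Override
    public InvocationType getInvocationType()
    {
        return InvocationType.NON_BLOCKING;
    }

    @Override
    public void run()
    {
        // Only non-blocking work here: no blocking I/O, no contended locks.
    }
}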
Once a task has been produced, it may be consumed using these modes:
Produce-Consume
In the Produce-Consume mode, the producer thread loops to produce a task that is run directly by the Producer Thread.
If the task is a NIO selection event, then this mode is the thread-per-selector mode, which is very CPU core cache efficient, but suffers from head-of-line blocking: if one of the tasks blocks or runs slowly, then subsequent tasks cannot be produced (and therefore cannot be consumed either) and pay in latency the cost of running previous, possibly unrelated, tasks.
This mode should only be used if the produced task is known to never block, or if the system tolerates well (or does not care about) head-of-line blocking.
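As a sketch, assuming a produce() method that stands in for Jetty's ExecutionStrategy.Producer.produce(), the Produce-Consume loop looks like this:

// Produce-Consume sketch: the producer thread runs each task itself.
while (true)
{
    Runnable task = produce(); // Stand-in for ExecutionStrategy.Producer.produce().
    if (task == null)
        break; // Nothing more to produce.
    // Run on this thread: cache friendly, but subject to head-of-line blocking.
    task.run();
}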
Produce-Execute-Consume
In the Produce-Execute-Consume mode, the Producer Thread loops to produce tasks that are submitted to a java.util.concurrent.Executor to be run by Worker Threads different from the Producer Thread.
The Executor implementation typically adds the task to a queue, and dequeues the task when there is a worker thread available to run it.
This mode solves the head-of-line blocking discussed in the Produce-Consume section, but suffers from other issues:
- It is not CPU core cache efficient, as the data available to the producer thread will need to be accessed by another thread that likely is going to run on a CPU core that will not have that data in its caches.
- If the tasks take time to be run, the Executor queue may grow indefinitely.
- A small latency is added to every task: the time it waits in the Executor queue.
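A sketch of this mode, assuming the same hypothetical produce() method and an executor variable:

// Produce-Execute-Consume sketch: every produced task is handed to the
// Executor and run by a worker thread, never by the producer thread.
while (true)
{
    Runnable task = produce();
    if (task == null)
        break;
    // The task is queued, then dequeued and run by a worker thread.
    executor.execute(task);
}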
Execute-Produce-Consume
In the Execute-Produce-Consume mode, the producer thread Thread 1 loops to produce a task, then submits one internal task to an Executor to take over production on thread Thread 2, and then runs the task in Thread 1, and so on.
This mode may operate like Produce-Consume when the take-over-production task (run, for example, by thread Thread 3) takes time to be executed (for example, in a busy server): thread Thread 2 will produce one task and run it, then produce another task and run it, and so on; Thread 2 behaves exactly like the Produce-Consume mode.
By the time thread Thread 3 takes over task production from Thread 2, all the work might already be done.
This mode may also operate similarly to Produce-Execute-Consume when the take-over-production task always finds a free CPU core immediately (for example, in a mostly idle server): thread Thread 1 will produce a task, then yield production to Thread 2 while Thread 1 is running the task; Thread 2 will produce a task, then yield production to Thread 3 while Thread 2 is running the task, and so on.
Differently from Produce-Execute-Consume, production here happens on different threads, but the advantage is that the task is run by the same thread that produced it (which is CPU core cache efficient).
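A sketch of the hand-off, assuming the same hypothetical produce() method and a hypothetical produceLoop() method that re-enters this production logic:

// Execute-Produce-Consume sketch: before consuming the task, hand
// production over to another thread, then run the task on this thread.
Runnable task = produce();
if (task != null)
{
    // Another thread takes over production.
    executor.execute(() -> produceLoop());
    // Consume here: this thread's CPU caches are hot with the task data.
    task.run();
}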
Adaptive Execution Strategy
The modes of task consumption discussed above are captured by the org.eclipse.jetty.util.thread.ExecutionStrategy interface, with an additional implementation that also takes into account the behavior of the task when the task implements Invocable.
For example, a task that declares itself as non-blocking can be consumed using the Produce-Consume mode, since there is no risk of stopping production because the task will not block.
Conversely, a task that declares itself as blocking will stop production, and therefore must be consumed using either the Produce-Execute-Consume mode or the Execute-Produce-Consume mode.
Deciding between these two modes depends on whether there is a free thread immediately available to take over production, and this is captured by the org.eclipse.jetty.util.thread.TryExecutor interface.
An implementation of TryExecutor can be asked whether a thread can be immediately and exclusively allocated to run a task, as opposed to a normal Executor that can only queue the task in the expectation that a thread will be available in the near future to run it.
The concept of task consumption modes, combined with Invocable tasks that expose their own behavior and with a TryExecutor that guarantees whether production can be immediately taken over, is captured by the default Jetty execution strategy, org.eclipse.jetty.util.thread.AdaptiveExecutionStrategy.
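A much simplified sketch of this decision, using the real Invocable and TryExecutor APIs but hypothetical produce() and takeOverProduction() methods (the actual AdaptiveExecutionStrategy has more states and optimizations):

Runnable task = produce();
if (task != null)
{
    if (Invocable.getInvocationType(task) == Invocable.InvocationType.NON_BLOCKING)
    {
        // Produce-Consume: the task cannot block, run it directly.
        task.run();
    }
    else if (tryExecutor.tryExecute(() -> takeOverProduction()))
    {
        // Execute-Produce-Consume: a reserved thread took over production.
        task.run();
    }
    else
    {
        // Produce-Execute-Consume: no reserved thread available, queue the task.
        executor.execute(task);
    }
}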
Thread Pool
Jetty’s threading architecture requires a more sophisticated thread pool than the one offered by Java’s java.util.concurrent.ExecutorService.
Jetty’s default thread pool implementation is QueuedThreadPool.
QueuedThreadPool integrates with the Jetty component model, implements Executor, provides a TryExecutor implementation (discussed in the adaptive execution strategy section), and supports virtual threads (introduced as a preview feature in Java 19 and Java 20, and as an official feature since Java 21).
Thread Pool Queue
QueuedThreadPool uses a BlockingQueue to store tasks that will be executed as soon as a thread is available.
It is common, but too simplistic, to think that an upper bound on the thread pool queue is a good way to limit the number of concurrent HTTP requests.
In the case of asynchronous servers like Jetty, applications may have more than one thread handling a single request. Furthermore, the server implementation may produce a number of tasks that must be run by the thread pool, otherwise the server stops working properly.
Therefore, the "one-thread-per-request" model is too simplistic, and the real model that predicts the number of necessary threads is too complicated to produce an accurate value.
For example, a sudden large spike of requests arriving at the server may find the thread pool in an idle state where the number of threads has shrunk to the minimum. This causes many tasks to be queued up, well before an HTTP request is even read from the network. Add to this that there may be I/O failures while processing requests, whose handling may be submitted as new tasks to the thread pool. Furthermore, multiplexed protocols like HTTP/2 have a much more complex model (due to data flow control): the implementation must be able to write in order to make progress on reads (and must be able to read in order to make progress on writes), possibly causing more tasks to be submitted to the thread pool.
If any of the submitted tasks is rejected because the queue is bounded, the server may grind to a halt, because the task must be executed, sometimes necessarily in a different thread.
For these reasons, the thread pool queue must be unbounded.
There are better strategies to limit the number of concurrent requests, discussed in this section.
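For reference, QueuedThreadPool's default queue is already unbounded (a growable org.eclipse.jetty.util.BlockingArrayQueue). The sketch below, with illustrative values, makes this explicit: tune the initial capacity and the growth factor, but do not impose a maximum capacity.

// An unbounded, growable queue: 512 initial entries, growing by 512.
BlockingQueue<Runnable> queue = new BlockingArrayQueue<>(512, 512);
// maxThreads=200, minThreads=8, idleTimeout=60000 ms.
QueuedThreadPool threadPool = new QueuedThreadPool(200, 8, 60000, queue);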
QueuedThreadPool configuration
QueuedThreadPool can be configured with a maxThreads value.
However, some of the Jetty components (such as the selectors) permanently steal threads for their internal use, or rather QueuedThreadPool leases some threads to these components.
These threads are reported by QueuedThreadPool.leasedThreads and are not available to run application code.
QueuedThreadPool can be configured with a reservedThreads value.
This value represents the maximum number of threads that can be reserved and used by the TryExecutor implementation.
A negative value for QueuedThreadPool.reservedThreads means that the actual value will be heuristically derived from the number of CPU cores and QueuedThreadPool.maxThreads.
A value of zero for QueuedThreadPool.reservedThreads means that reserved threads are disabled, and therefore the Execute-Produce-Consume mode is never used; the Produce-Execute-Consume mode is always used instead.
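A sketch of this configuration, with illustrative values:

// maxThreads=256, minThreads=8.
QueuedThreadPool threadPool = new QueuedThreadPool(256, 8);
// A negative value (the default) derives the actual value heuristically.
threadPool.setReservedThreads(-1);
// Zero disables reserved threads and the Execute-Produce-Consume mode:
// threadPool.setReservedThreads(0);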
QueuedThreadPool always maintains the number of threads between QueuedThreadPool.minThreads and QueuedThreadPool.maxThreads; during load spikes the number of threads grows to meet the load demand, and when the load on the system diminishes or the system goes idle, the number of threads shrinks.
Shrinking QueuedThreadPool is particularly important in containerized environments, where you typically want to return the memory occupied by the threads to the operating system.
The shrinking of the QueuedThreadPool is controlled by two parameters: QueuedThreadPool.idleTimeout and QueuedThreadPool.maxEvictCount.
QueuedThreadPool.idleTimeout indicates how long a thread should stay around when it is idle, waiting for tasks to execute.
The longer threads stay around, the more ready they are for new load spikes on the system; however, they consume resources: a Java platform thread typically allocates 1 MiB of native memory.
QueuedThreadPool.maxEvictCount controls how many idle threads are evicted in one QueuedThreadPool.idleTimeout period.
The larger this value is, the quicker threads are evicted when the QueuedThreadPool is idle or lightly loaded, and the quicker their resources are returned to the operating system; however, large values may result in too much thread thrashing: the QueuedThreadPool shrinks too fast and must re-create many threads in case of a new load spike on the system.
A good balance between QueuedThreadPool.idleTimeout and QueuedThreadPool.maxEvictCount depends on the load profile of your system, and is often tuned via trial and error.
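For example, a sketch of the shrinking configuration, with illustrative values:

QueuedThreadPool threadPool = new QueuedThreadPool();
// Threads idle for more than 60 seconds become candidates for eviction.
threadPool.setIdleTimeout(60000);
// At most 3 idle threads are evicted per idleTimeout period.
threadPool.setMaxEvictCount(3);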
Virtual Threads
Virtual threads were introduced as a preview feature in Java 19 and Java 20, and became an official feature in Java 21.
In Java versions where virtual threads are a preview feature, remember to add --enable-preview to the JVM command line options to use virtual threads.
Virtual Threads Support with QueuedThreadPool
QueuedThreadPool can be configured to use virtual threads by specifying the virtual threads Executor:
QueuedThreadPool threadPool = new QueuedThreadPool();
// Simple, unlimited, virtual thread Executor.
threadPool.setVirtualThreadsExecutor(Executors.newVirtualThreadPerTaskExecutor());
// Configurable, bounded, virtual thread executor (preferred).
VirtualThreadPool virtualExecutor = new VirtualThreadPool();
virtualExecutor.setMaxThreads(128);
threadPool.setVirtualThreadsExecutor(virtualExecutor);
// For server-side usage.
Server server = new Server(threadPool);
// Simple client-side usage.
HttpClient client = new HttpClient();
client.setExecutor(threadPool);
// Client-side usage with explicit HttpClientTransport.
ClientConnector clientConnector = new ClientConnector();
clientConnector.setExecutor(threadPool);
HttpClient httpClient = new HttpClient(new HttpClientTransportOverHTTP(clientConnector));
Note that Jetty cannot enforce that the configured virtual threads Executor actually uses virtual threads: it is the application's responsibility to configure an Executor that does.
AdaptiveExecutionStrategy makes use of this setting when it determines that a task should be run with the Produce-Execute-Consume mode: rather than submitting the task to QueuedThreadPool to be run in a platform thread, it submits the task to the virtual threads Executor.
Enabling virtual threads in QueuedThreadPool defaults the number of reserved threads to zero, which ensures that the Produce-Execute-Consume mode is always used; this means that virtual threads will always be used for blocking tasks.
Virtual Threads Support with VirtualThreadPool
VirtualThreadPool is an alternative to QueuedThreadPool that creates only virtual threads (no platform threads).
VirtualThreadPool threadPool = new VirtualThreadPool();
// Limit the max number of concurrent virtual threads.
threadPool.setMaxThreads(200);
// Track virtual threads usage, and include details in component dumps.
threadPool.setTracking(true);
threadPool.setDetailedDump(true);
// For server-side usage.
Server server = new Server(threadPool);
// Simple client-side usage.
HttpClient client = new HttpClient();
client.setExecutor(threadPool);
// Client-side usage with explicit HttpClientTransport.
ClientConnector clientConnector = new ClientConnector();
clientConnector.setExecutor(threadPool);
HttpClient httpClient = new HttpClient(new HttpClientTransportOverHTTP(clientConnector));
Despite the name, VirtualThreadPool does not pool virtual threads, but allows you to impose a limit on the maximum number of concurrent virtual threads, using a Semaphore.
Limiting the number of concurrent virtual threads helps to limit resource usage in applications, especially in case of load spikes: when an unlimited number of virtual threads is allowed, the server might be brought down by resource (typically memory) exhaustion.
Furthermore, you can configure it to track virtual threads, so that a Jetty component dump will show all virtual threads currently in use, including those that are unmounted.
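For example, with the tracking configuration shown above, a component dump of the server from the same example will include the tracked virtual threads:

// Prints the component tree; with setTracking(true), it lists the
// virtual threads in use, including the unmounted ones.
System.err.println(server.dump());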
Virtual Threads Pinning
Even when using virtual threads, Jetty uses non-blocking I/O, and dedicates a thread to each java.nio.channels.Selector to perform the Selector.select() operation.
Currently (up to Java 22), calling Selector.select() from a virtual thread pins the carrier thread.
If you configure a server-side Connector, or Jetty’s HttpClient, with N selectors, then N carrier threads will be pinned by the virtual threads calling Selector.select().
If you have fewer than N CPU cores in your system, then by default all carriers will be pinned in the Selector.select() call, leaving no carrier free to execute virtual threads, and therefore completely locking up your system, which will become unresponsive.
If you have more than N CPU cores in your system, then by default your system may be less efficient, since carrier threads may be pinned in the Selector.select() call, and therefore not available to run virtual threads.
The number of CPU cores of your system determines, by default, the number of carrier threads.
The number of carrier threads may be explicitly configured by setting the system property jdk.virtualThreadScheduler.parallelism.
Selector threads used by Jetty pin carrier threads. Choose the number of selectors wisely when using virtual threads: the number of selectors must always be less than the number of carrier threads, to leave some of the carrier threads free to run virtual threads.
As an extreme example, if your system only has one CPU core, then a single selector would pin the only carrier thread, leaving no carrier free to run virtual threads.
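A sketch of a conservative selector configuration (the values are illustrative; threadPool and clientConnector are the variables from the earlier examples):

// Server-side: 1 acceptor and 1 selector, leaving the other carrier
// threads free to run virtual threads.
Server server = new Server(threadPool);
ServerConnector connector = new ServerConnector(server, 1, 1);
connector.setPort(8080);
server.addConnector(connector);

// Client-side: ClientConnector allows the same tuning.
clientConnector.setSelectors(1);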