Modern Java for Rust Engineers
Module 8 of 8 · Advanced · 45 min

Virtual Threads and Structured Concurrency

Prerequisites: functional-idioms-streams-lambdas-optional

This module is the capstone of the plan. It brings together GC awareness (Module 02), error handling (Module 06), and functional idioms (Module 07) to show how Java achieves high concurrency without async/await. By the end, you will see exactly where the Java model diverges from Rust's tokio model — and why that divergence is an intentional design choice, not an oversight.

What You'll Learn

  1. How virtual threads let ordinary blocking Java scale like async code, without function coloring.

  2. How StructuredTaskScope gives concurrent tasks scoped lifetimes: fork → join → read results.

  3. What pinning is, why synchronized causes it on Java 21–23, and how ReentrantLock avoids it.

  4. When virtual threads are the wrong tool (CPU-bound work) and what to use instead.

Why This Matters

High-concurrency I/O has been Java's most awkward topic for two decades. Before Java 21, writing a service that handled thousands of concurrent I/O operations meant choosing between painful options: a thread pool with a limited cap (the ExecutorService model), callback-driven CompletableFuture chains that are hard to read and even harder to debug, or a full reactive framework like Project Reactor or RxJava that requires rewriting your domain logic around streams of events. None of these felt natural.

Virtual threads change the equation. You write ordinary, blocking, synchronous Java — the same style you have been writing throughout this plan — and the JVM makes it concurrent automatically. For a Rust engineer evaluating Java, this is one of the most practically significant features of modern Java. It is not as memory-efficient as tokio for extreme workloads, but for the vast majority of server-side services, it is simpler to write, simpler to test, and simpler to debug than Rust async code. Knowing when to reach for it — and when not to — is the skill this module builds.

Core Concept

If you are coming from Rust, your mental model for high-concurrency I/O is probably Rust's async/await + tokio. You define async fns, .await on futures, and a tokio runtime schedules the resulting state machines on a thread pool. The key property of this model is function coloring: an async function is a different kind of thing from a sync function. You cannot call an async function from a sync context without a runtime; you cannot call sync blocking code from async code without blocking the thread (which stalls the tokio scheduler). Managing the boundary between colored and uncolored code is a constant source of friction.

Java's virtual thread model makes a different trade-off. There is no coloring. You write blocking code everywhere, exactly as you would in a simple single-threaded program. The JVM intercepts blocking operations — I/O calls, lock acquisitions, Thread.sleep() — and suspends the virtual thread at those points without blocking the underlying operating system (OS) thread.

Here is the mechanism. When a virtual thread blocks, it unmounts from its carrier thread (the platform thread it was executing on). The carrier thread is then free to pick up another virtual thread and execute it. When the blocking operation completes, the virtual thread is scheduled back onto a carrier thread and resumes. From your code's perspective, the thread just blocked and then continued. From the JVM's perspective, the carrier was never idle.

A platform thread is a traditional Java thread backed by one OS thread. Platform threads are expensive: each one reserves a 512 KB to 1 MB stack in OS-managed memory, and the OS scheduler decides when to run them. Creating more than a few thousand typically exhausts memory or hits OS thread limits.

A virtual thread is a JVM object on the GC heap. Its stack is stored on the heap as a stack chunk object and grows or shrinks as needed, starting at only a few hundred bytes. You can create millions of virtual threads in the same JVM process that could only accommodate a few thousand platform threads.

The carrier threads form a small pool, by default sized to the number of CPU cores. This pool is the only OS-level parallelism involved. Virtual threads are multiplexed across this pool cooperatively: a virtual thread runs until it blocks, then unmounts, freeing the carrier.
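
To make the scale claim concrete, here is a minimal sketch (the class name is illustrative): it submits 100,000 sleeping tasks to a virtual-thread-per-task executor, a load that would exhaust a platform-thread-per-task design long before finishing.

// Java — sketch: 100,000 concurrent blocking sleeps on virtual threads
import java.time.Duration;
import java.util.concurrent.Executors;

public class VirtualThreadScaleDemo {
    public static void main(String[] args) {
        // Each submitted task gets its own virtual thread. All 100,000
        // block in sleep() at the same time, yet only a core-count-sized
        // pool of carrier (OS) threads exists underneath.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 100_000; i++) {
                executor.submit(() -> {
                    Thread.sleep(Duration.ofSeconds(1)); // unmounts; frees the carrier
                    return null; // Callable, so sleep's InterruptedException may propagate
                });
            }
        } // close() waits for all submitted tasks to finish
    }
}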

Rust comparison: Tokio uses a similar architecture: a small pool of OS threads (workers) runs many lightweight tasks (futures). The difference is the programming model. Rust tasks must be async fns that explicitly .await to yield; Java virtual threads yield implicitly at any blocking call. Java's model requires no code changes to existing blocking libraries. Rust's model has lower per-task memory overhead and enables the compiler to reason about async safety, but imposes function coloring.

StructuredTaskScope: Concurrent Tasks with Structured Lifetimes

Creating virtual threads directly with Thread.ofVirtual().start(runnable) is possible, but it gives you no coordination — you have to join threads manually, handle exceptions manually, and cancel stray threads manually. This is exactly the problem that StructuredTaskScope solves.
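
For contrast, here is a minimal sketch of the unstructured style (process and payment are the running example's names); lifecycle, error propagation, and cancellation are all yours to manage:

// Java — unstructured: you own the whole lifecycle
Thread worker = Thread.ofVirtual()
        .name("payment-worker")
        .start(() -> process(payment));

worker.join(); // manual wait; throws InterruptedException
// Any exception thrown inside the runnable went to the thread's
// uncaught-exception handler: no return value, and no automatic
// cancellation of sibling work on failure.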

StructuredTaskScope is Java's structured concurrency scope (not to be confused with the JPMS module visibility scope from Module 01). It enforces one rule: all forked child tasks must complete before the scope exits. The try-with-resources block backs this rule: when the try block ends, the scope's close() method shuts down any subtasks still running, waits for them to terminate, and only then releases resources.

There are two built-in policies:

ShutdownOnFailure: if any forked task throws an exception, the scope sends a shutdown signal to all remaining tasks and then rethrows the exception to the parent. Use this when all tasks must succeed for the result to be valid.

ShutdownOnSuccess: the first task to return a result wins; the scope cancels the remaining tasks. Use this for "race" patterns where you want the fastest of several equivalent approaches.

Java 21+ only: StructuredTaskScope requires Java 21 or later and is a preview API (JEP 453 in Java 21, still in preview in subsequent releases). To enable it in Gradle, target Java 21 in the toolchain and pass --enable-preview to both compilation and execution:

// build.gradle.kts
java {
    toolchain {
        languageVersion = JavaLanguageVersion.of(21)
    }
}
tasks.withType<JavaCompile> {
    options.compilerArgs.addAll(listOf("--enable-preview", "--release", "21"))
}
tasks.withType<JavaExec> {
    jvmArgs("--enable-preview")
}

Pinning: The Main Gotcha

A virtual thread is "pinned" when it cannot unmount from its carrier thread during a blocking operation. Pinning does not make your program incorrect — it just means that a carrier thread is blocked while the virtual thread waits, which defeats the purpose of virtual threads for that duration.

Pinning happens in two situations:

  1. synchronized blocks or methods: the JVM's current monitor implementation ties a virtual thread to its carrier when it holds a synchronized monitor. If the virtual thread then blocks inside that synchronized block (e.g., makes an I/O call), the carrier blocks too.

  2. Native frames: if the call stack contains a native method (via JNI), the virtual thread cannot unmount.

The fix for synchronized pinning is to replace synchronized blocks with java.util.concurrent.locks.ReentrantLock. ReentrantLock is a cooperative lock that allows virtual threads to unmount while waiting:

// Causes pinning: virtual thread holds monitor and blocks
synchronized (lock) {
    String data = httpClient.get(url); // blocks here, pins carrier
    process(data);
}

// Does not cause pinning: virtual thread can unmount while awaiting lock
private final ReentrantLock lock = new ReentrantLock();

lock.lock();
try {
    String data = httpClient.get(url); // virtual thread unmounts here
    process(data);
} finally {
    lock.unlock();
}

Note: Java 24 (JEP 491) resolves the synchronized pinning limitation, making virtual threads able to unmount even inside synchronized blocks. If you are on Java 24+, synchronized is no longer a pinning risk. For Java 21–23, prefer ReentrantLock for any lock held across I/O.
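
On Java 21–23 you can locate pinning at runtime with the jdk.tracePinnedThreads system property, which prints a stack trace whenever a virtual thread blocks while pinned; JDK Flight Recorder also emits a jdk.VirtualThreadPinned event for the same condition. (The jar name below is a placeholder.)

# JVM flag (Java 21–23): report each pinned blocking operation with a full stack trace
java -Djdk.tracePinnedThreads=full -jar payment-service.jar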

Concrete Example

The running example for this module is the payment processor's processPayment method. To process a payment, you need two pieces of external data: the exchange rate for the payment's currency (fetched from an exchange-rate service), and a validation that the payer's account has sufficient balance (fetched from an account service). These two operations are independent, so you can run them in parallel.

Here is the sequential baseline — correct, but slow:

// Java — sequential (slow)
public PaymentResult processPaymentSequential(Payment payment) {
    try {
        double rate = fetchExchangeRate(payment.currency());   // 100ms
        boolean valid = validateAccountBalance(               // 80ms
            payment.fromAccount(), payment.amount());         // Total: ~180ms

        if (!valid) {
            return new PaymentFailure("Insufficient funds", ErrorCode.INSUFFICIENT_FUNDS);
        }
        String txId = dispatchPayment(payment, rate);
        return new PaymentSuccess(txId, payment.amount() * rate);

    } catch (IOException e) {
        return new PaymentFailure("Network error: " + e.getMessage(), ErrorCode.NETWORK_TIMEOUT);
    }
}

Now with StructuredTaskScope.ShutdownOnFailure to run both I/O calls in parallel:

// Java — parallel with StructuredTaskScope (fast)
import java.util.concurrent.ExecutionException;
import java.util.concurrent.StructuredTaskScope;

public PaymentResult processPayment(Payment payment) {
    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {

        // Fork both I/O tasks concurrently
        var rateTask = scope.fork(() -> fetchExchangeRate(payment.currency()));
        var validTask = scope.fork(() ->
            validateAccountBalance(payment.fromAccount(), payment.amount()));

        // Wait for both tasks to complete (or either to fail)
        scope.join().throwIfFailed();

        // At this point, both tasks succeeded
        double rate = rateTask.get();
        boolean valid = validTask.get();

        if (!valid) {
            return new PaymentFailure("Insufficient funds", ErrorCode.INSUFFICIENT_FUNDS);
        }
        String txId = dispatchPayment(payment, rate);
        return new PaymentSuccess(txId, payment.amount() * rate);

    } catch (ExecutionException e) {
        // A forked subtask failed; throwIfFailed() delivers the failure as the cause
        return new PaymentFailure("Network error: " + e.getCause().getMessage(), ErrorCode.NETWORK_TIMEOUT);
    } catch (InterruptedException | IOException e) {
        return new PaymentFailure("Network error: " + e.getMessage(), ErrorCode.NETWORK_TIMEOUT);
    } catch (Exception e) {
        return new PaymentFailure("Unexpected error", ErrorCode.NETWORK_TIMEOUT);
    }
}

The two I/O operations now run concurrently. If fetchExchangeRate takes 100ms and validateAccountBalance takes 80ms, the total wait is ~100ms, not ~180ms. If either task fails, ShutdownOnFailure shuts the scope down immediately, cancelling the other task, and scope.join().throwIfFailed() surfaces the failure as an ExecutionException whose cause is the original exception.

Notice what you did not have to write: no ExecutorService, no Future.get(), no CompletableFuture.allOf(), no explicit cancellation. The structured concurrency scope handles all of it.

Common pitfall: Calling scope.join() is mandatory before reading any subtask. Subtask.get() throws IllegalStateException if the scope has not joined or the subtask has not completed. The scope does not join for you at close(); in fact, close() itself throws IllegalStateException if the owner never called join(). The pattern is always: fork → join → read results.
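
A minimal sketch of the failure mode, reusing fetchExchangeRate from above with an illustrative currency code:

// Anti-pattern: reading a subtask before join()
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    var rateTask = scope.fork(() -> fetchExchangeRate("EUR"));
    double rate = rateTask.get(); // throws IllegalStateException: owner has not joined
    scope.join();                 // never reached
}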

Analogy

Structured concurrency is like a team huddle with a strict rule: the meeting does not end until everyone has reported in. You (the parent thread) call the huddle (open the scope), send team members off to gather information (fork tasks), and then wait at the door (join). Nobody leaves until everyone is back. If a team member calls in sick (a task fails), the meeting is cancelled and everyone returns immediately (shutdown on failure). If you just need any one person to confirm a fact (shutdown on success), the first person back dismisses the rest.

This is why structured concurrency is named after structured programming. Just as structured programming replaced goto with loops and functions that have clear entry and exit points, structured concurrency replaces raw thread spawning with scoped blocks that have clear lifetimes.

Going Deeper

Virtual Thread Memory and the GC

Virtual thread stacks live on the GC heap. A virtual thread that is blocked waiting for I/O holds a stack chunk object in memory. If you create a million virtual threads all waiting on I/O simultaneously, you have a million stack chunk objects on the heap. Each is small (typically a few hundred bytes to a few kilobytes depending on call depth), but a million of them is still meaningful.

Connecting back to Module 02: these stack objects are short-lived from the GC's perspective. A virtual thread that handles a single HTTP request is created, runs, blocks briefly for I/O, resumes, completes, and becomes garbage — all within the duration of that request. The generational GC will collect them efficiently in the young generation. Very long-lived virtual threads that accumulate deep call stacks will eventually promote to the old generation, contributing to major GC pressure. For most request-handling patterns, this is not a problem.

ShutdownOnSuccess for Racing

ShutdownOnSuccess inverts the logic: instead of requiring all tasks to succeed, you want the fastest result from any of them:

// Java — race pattern: first successful exchange rate wins
try (var scope = new StructuredTaskScope.ShutdownOnSuccess<Double>()) {
    scope.fork(() -> fetchExchangeRateFromProvider1(currency));
    scope.fork(() -> fetchExchangeRateFromProvider2(currency));
    scope.fork(() -> fetchExchangeRateFromProvider3(currency));

    scope.join();                 // waits for the first success, or for all to fail (throws InterruptedException)
    double rate = scope.result(); // fastest successful result; throws ExecutionException if all three failed
    // The two losing tasks were cancelled as soon as the winner completed
}

This is useful for hedged requests: send the same query to multiple redundant backends and take the fastest response.

Why Not Parallel Streams?

Module 07 introduced parallel streams (via .parallelStream()). Parallel streams use the fork-join common pool and are designed for CPU-bound data parallelism — splitting a large collection into chunks and processing them simultaneously across CPU cores. Using parallel streams for I/O-bound work is counterproductive: threads block waiting for I/O, occupying fork-join workers and starving other tasks.

Virtual threads are for I/O-bound concurrency. Parallel streams are for CPU-bound data parallelism. Use each for its intended purpose; do not substitute one for the other.
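
A sketch of that rule, reusing illustrative names from this module (httpClient, urls, payments, and transform are assumed helpers, not a fixed API):

// Java — choose the tool by workload type
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// I/O-bound fan-out: one virtual thread per blocking call
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    List<Future<String>> pages = urls.stream()
            .map(url -> executor.submit(() -> httpClient.get(url))) // blocks, unmounts
            .toList();
} // close() waits: every future is complete past this line

// CPU-bound data parallelism: parallel stream on the fork-join common pool
var results = payments.parallelStream()
        .map(this::transform) // pure computation; saturates the cores
        .toList();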

Scoped Values (A Glimpse)

Java 21 also introduced scoped values (JEP 446, preview), a way to pass context data down through a call stack to virtual threads without using thread-local variables. Thread-locals work with virtual threads but have known performance and correctness issues at scale (they are not structured — a thread-local value persists until explicitly removed, regardless of scope). Scoped values are immutable and automatically confined to the structured concurrency scope. A full treatment is beyond this module, but the sketch below shows the shape.
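
A minimal sketch, assuming Java 21's preview API (JEP 446; requires --enable-preview). REQUEST_ID and the surrounding methods are illustrative:

// Java — scoped value sketch (ScopedValue lives in java.lang; no import needed)
static final ScopedValue<String> REQUEST_ID = ScopedValue.newInstance();

void handle(Payment payment) {
    // The binding exists only while run() executes; subtasks forked inside a
    // StructuredTaskScope within run() inherit it automatically
    ScopedValue.where(REQUEST_ID, "req-" + payment.hashCode())
               .run(() -> process(payment));
}

void process(Payment payment) {
    String requestId = REQUEST_ID.get(); // readable anywhere below the binding
    // ...
}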

Common Misconceptions

1. "Virtual threads replace async/await — Java now has coroutines."

Virtual threads and async/await solve the same problem via different mechanisms. Async/await transforms functions into state machines (futures) that are polled by a scheduler. Virtual threads preserve a real call stack per concurrent task and suspend at blocking operations. The key difference: a virtual thread carries higher per-task memory than a bare future (it holds a stack chunk), but it requires no code changes — existing blocking code that uses the JDK's I/O and concurrency primitives scales automatically when run on virtual threads. Async/await in Rust requires every layer of the call stack to be async. Call them different solutions to the same problem, not the same thing.

2. "Virtual threads are free — I can create unlimited millions with zero overhead."

Virtual threads are cheap but not free. Each virtual thread has a small but nonzero stack on the heap. If all your virtual threads are actively computing (CPU-bound), they still compete for the carrier thread pool. Creating a million virtual threads for CPU-bound tasks is worse than creating CPU-count platform threads for CPU-bound tasks, because the scheduler overhead adds up and there is no I/O to unmount on. Virtual threads are designed specifically for I/O-bound workloads where the thread spends most of its time blocked.

3. "Structured concurrency means StructuredTaskScope — those are synonyms."

StructuredTaskScope is the API that enforces structured concurrency in Java 21+. Structured concurrency is a broader principle that applies in any language: concurrent tasks should have a clearly defined lifetime, rooted in their spawning scope. The principle predates StructuredTaskScope; Kotlin's coroutine scopes and Python's asyncio.TaskGroup implement the same principle differently. Future Java APIs will extend this pattern beyond StructuredTaskScope.

Check Your Understanding

  1. You have a method that calls three external services: one takes 200ms, one takes 150ms, and one takes 100ms. You use StructuredTaskScope.ShutdownOnFailure to fork all three. How long does the whole operation take if all succeed? What happens to the total time if the 150ms service fails after 50ms?

    Answer: If all succeed, the scope waits for the slowest task: ~200ms total. If the 150ms service fails after 50ms, ShutdownOnFailure cancels the 200ms and 100ms tasks and propagates the exception immediately. The whole scope completes at ~50ms (the point of failure), not 200ms.

  2. Why does synchronized (lock) { httpClient.get(url); } cause pinning in Java 21, and how do you fix it?

    Answer: In Java 21, the JVM's monitor implementation pins a virtual thread to its carrier thread while it holds a synchronized monitor. When httpClient.get(url) blocks, the carrier thread cannot pick up another virtual thread — it is stuck waiting. The fix is to replace synchronized with a ReentrantLock. ReentrantLock is built on java.util.concurrent primitives that support virtual thread unmounting during waits. (Note: Java 24's JEP 491 resolves this for synchronized itself.)

  3. You want to use virtual threads for a data processing pipeline that applies heavy mathematical transformations to a million payment records. Is this a good use of virtual threads? What would you use instead?

    Answer: No. Virtual threads are for I/O-bound concurrency. CPU-bound computation keeps the carrier thread occupied the entire time, so multiple virtual threads running computation compete for the same small pool of carriers. A better approach: use a thread pool sized to the CPU core count (Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors())), or use parallel streams (payments.parallelStream().map(this::transform)) which are specifically designed for CPU-bound data parallelism (Module 07).

  4. In Rust, you can write async fn fetch() -> Result<String, Error> and call it from another async function with .await. What is the equivalent structure in Java with virtual threads, and what happens at a blocking operation that in Rust would require .await?

    Answer: In Java, the method is just a normal blocking method: String fetch() throws IOException { return httpClient.get(url); }. No async annotation, no .await. When httpClient.get() blocks waiting for the network, the JVM automatically unmounts the virtual thread from its carrier thread. The call stack is preserved on the heap. When the response arrives, the virtual thread is rescheduled and execution resumes at exactly the same line, as if it had just been a slow function call. The programmer sees a blocking call; the JVM executes it concurrently.

  5. Why must you call scope.join() before calling rateTask.get()?

    Answer: Subtask.get() retrieves the result of a successfully completed subtask. If the task has not finished — or the scope owner never called join() — get() throws IllegalStateException; there is no result to return. scope.join() blocks the current thread (the parent virtual thread) until all forked tasks have completed. Only after join() is it safe to call get() on any subtask. The API is deliberately strict here: without the join() requirement, reading results would be a race condition — in a fast execution the task might have already completed, but in a slow execution it has not.

Key Takeaways

  1. Virtual threads let you write ordinary blocking Java that the JVM multiplexes over a small, core-count-sized pool of carrier threads. There is no function coloring.

  2. StructuredTaskScope gives forked subtasks a scoped lifetime — fork → join → read results — with ShutdownOnFailure when every task must succeed and ShutdownOnSuccess for races.

  3. Pinning (synchronized blocks and native frames) holds the carrier hostage during blocking calls. On Java 21–23, prefer ReentrantLock across I/O; Java 24 (JEP 491) removes the synchronized limitation.

  4. Virtual threads are for I/O-bound concurrency. CPU-bound work belongs on a core-count-sized thread pool or parallel streams (Module 07).

Synthesis: Eight Pillars of Modern Java Through a Rust Lens

You have now seen the eight pillars of modern Java as a Rust engineer, and how each one shifts your mental model.

What would a Rust engineer reach for first in a new Java project? Gradle (familiar dependency model), records and sealed types (familiar shape, different guarantees), virtual threads (different model than tokio, but simpler for most workloads), and streams (familiar, but watch for type erasure and boxing trade-offs).
