ZB Field Notes

Spring Batch 6, Hands-On: A New Engine, a Real Operator, and Stops That Cross JVMs

The setup

I spent a session putting Spring Batch 6 through its paces on Spring Boot 4.1 and Java 25, with job metadata living in Postgres (started by Docker Compose). Spring Batch 6 rides on Spring Framework 7, and the changes are not cosmetic — the execution engine, the fault-tolerance model, the command-line tooling, and the stop/recover semantics are all genuinely new. This is a code-forward tour of the parts that change how you actually build and operate jobs, with the BATCH_ metadata as evidence.

One framing rule first: in a Boot app you do not add @EnableBatchProcessing. Doing so switches Boot's batch auto-configuration off. You write plain @Configuration beans — reader, processor, writer, step, job — and Boot wires the repository and the runner. The whole "less ceremony" story depends on letting auto-config do its job.

§3 — Infrastructure you opt into

The headline default in SB6 is a resourceless JobRepository: out of the box, batch metadata is not stored in a database at all. The repository keeps only the current execution's state in memory and drops it when the JVM exits. No embedded H2, no schema, nothing persisted. You can see it instantly: a job with a RunIdIncrementer never advances past run.id=1 across restarts, and no BATCH_ tables exist. The execution history simply isn't remembered.

Boot 4 also modularized batch. The plain spring-boot-starter-batch gives you the resourceless core. JDBC persistence is a separate module:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch-jdbc</artifactId>
</dependency>

Add the JDBC store module, set spring.batch.jdbc.initialize-schema=always, and the picture flips. The six BATCH_ tables get created, history persists, and the incrementer advances on every launch:

SELECT job_execution_id, status FROM batch_job_execution;
 job_execution_id |  status
------------------+-----------
                6 | COMPLETED
                7 | COMPLETED
                8 | COMPLETED
                9 | COMPLETED

Worth internalizing: persisting your business data and persisting batch metadata are now two independent switches. The same job code is amnesiac or restartable depending purely on whether that module is on the classpath.
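The two-switch idea is easy to model without Spring at all. What follows is a plain-Java toy, not Spring Batch code: MetadataStoreSketch, launch, and the map standing in for the BATCH_ tables are all names I made up for illustration. It only shows why the same launch logic yields run.id=1 forever against a store that is recreated on every start, and an advancing run.id against one that survives.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy model (not Spring Batch API): the same "job" launched against a store
// that forgets everything versus one that survives between launches.
class MetadataStoreSketch {

    // Launches the job once and returns the run.id it used.
    // `store` stands in for whatever remembers the last run.id per job name.
    static long launch(Map<String, Long> store, String jobName) {
        long next = store.getOrDefault(jobName, 0L) + 1;
        store.put(jobName, next);
        return next;
    }

    // Resourceless flavour: every JVM start gets a brand-new, empty store.
    static long[] resourcelessRuns(int launches) {
        long[] ids = new long[launches];
        for (int i = 0; i < launches; i++) {
            ids[i] = launch(new HashMap<>(), "personJob"); // state is lost each time
        }
        return ids;
    }

    // Persistent flavour: one store outlives all launches, like a database.
    static long[] persistentRuns(int launches) {
        Map<String, Long> database = new HashMap<>();
        long[] ids = new long[launches];
        for (int i = 0; i < launches; i++) {
            ids[i] = launch(database, "personJob");
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(resourcelessRuns(3))); // [1, 1, 1]
        System.out.println(Arrays.toString(persistentRuns(3)));   // [1, 2, 3]
    }
}
```

The launch method is identical in both cases; only the lifetime of the store changes, which is the point of the two independent switches.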

SB6 also consolidates the API surface. JobRepository now extends JobExplorer, so a single injected bean both writes and reads history — no separate explorer bean:

JobInstance last = jobRepository.getLastJobInstance("personJob");
JobExecution exec = jobRepository.getLastJobExecution(last);
// JobRepository IS a JobExplorer now

Likewise JobOperator extends JobLauncher, and JobRegistry is optional with automatic registration. One nice discovery: MapJobRegistry is self-populating — it implements SmartInitializingSingleton and ApplicationContextAware, so simply declaring it registers every Job bean for you:

@Bean
public JobRegistry jobRegistry() {
    return new MapJobRegistry(); // auto-registers all Job beans
}

§4 — A new chunk engine

The chunk-oriented step has a new, stable implementation: ChunkOrientedStep, built with ChunkOrientedStepBuilder. It replaces the legacy chunk implementation that ran on top of TaskletStep. Chunk size moves into the constructor; the transaction manager is its own call:

return new ChunkOrientedStepBuilder<Person, Person>("personStep", jobRepository, 3)
        .reader(personReader)
        .processor(personProcessor)
        .writer(personWriter)
        .transactionManager(transactionManager)
        .faultTolerant()
        .retryPolicy(retryPolicy)
        .skipPolicy(skipPolicy)
        .build();

Fault tolerance is the bigger change. Retry is no longer the external spring-retry library — it's Spring Framework 7's org.springframework.core.retry.RetryPolicy, a spring-core type that batch simply consumes:

RetryPolicy retryPolicy = RetryPolicy.builder()
        .maxRetries(3)
        .includes(TransientProcessingException.class)
        .build();

SkipPolicy skipPolicy = new LimitCheckingExceptionHierarchySkipPolicy(
        Set.of(FlatFileParseException.class), 5);

To prove both paths, I fed the job a transient failure (an item that throws once, then succeeds) and a malformed CSV line. Retry and skip are different strategies — "try the same item again" versus "drop this permanently bad item" — and the step metadata records exactly which path each took:

SELECT status, read_count, write_count, read_skip_count, process_skip_count
FROM batch_step_execution ORDER BY step_execution_id DESC LIMIT 1;
  status   | read_count | write_count | read_skip_count | process_skip_count
-----------+------------+-------------+-----------------+--------------------
 COMPLETED |         39 |          39 |               1 |                  0

The malformed line shows up as read_skip_count=1; the transient item retried to success so it never became a skip. Bad input degrades gracefully instead of sinking the whole job, and every decision is auditable.
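To make the retry/skip distinction concrete, here is a minimal plain-Java sketch of the two strategies. It is not how ChunkOrientedStep is implemented; it collapses read and process into one hypothetical handle method, and every name in it (RetrySkipSketch, TransientException, MalformedException, the "flaky" and "bad" items) is invented for the example. The assumption is simply that a transient failure is retried up to a limit, and a permanent failure is counted as a skip up to a limit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of retry versus skip (plain Java, not the Spring Batch engine).
class RetrySkipSketch {

    static class TransientException extends RuntimeException {}
    static class MalformedException extends RuntimeException {}

    final List<String> written = new ArrayList<>();
    int retryCount = 0;
    int skipCount = 0;
    private final Map<String, Integer> attempts = new HashMap<>();

    // Hypothetical handler: "flaky" fails on its first attempt only,
    // "bad" fails on every attempt, everything else succeeds.
    String handle(String item) {
        int attempt = attempts.merge(item, 1, Integer::sum);
        if (item.equals("flaky") && attempt == 1) throw new TransientException();
        if (item.equals("bad")) throw new MalformedException();
        return item;
    }

    void run(List<String> items, int maxRetries, int skipLimit) {
        for (String item : items) {
            int retriesLeft = maxRetries;
            while (true) {
                try {
                    written.add(handle(item));
                    break;                                // success: next item
                } catch (TransientException e) {
                    if (retriesLeft-- == 0) throw e;      // retries exhausted
                    retryCount++;                         // retry: same item again
                } catch (MalformedException e) {
                    if (++skipCount > skipLimit) throw e; // skip limit exceeded
                    break;                                // skip: drop the item
                }
            }
        }
    }

    public static void main(String[] args) {
        RetrySkipSketch s = new RetrySkipSketch();
        s.run(List.of("a", "flaky", "bad", "b"), 3, 5);
        System.out.println(s.written + " retries=" + s.retryCount + " skips=" + s.skipCount);
    }
}
```

As in the metadata above, the retried item ends up written and never counts as a skip, while the permanently bad one is dropped and counted.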

§5 — A producer/consumer concurrency model

Making the step multi-threaded is one call:

.taskExecutor(batchTaskExecutor) // a pool of worker threads

But the model underneath is new. The old "parallel iteration" had every worker thread calling the reader, which forced you to wrap stateful readers like FlatFileItemReader in a synchronizer or risk corruption. SB6 uses a single producer that reads items into a bounded queue, and multiple consumer threads that process and write. The reader is touched by exactly one thread, so no synchronization wrapper is needed. The bounded queue is also where backpressure lives — if consumers fall behind, the producer blocks rather than slurping the whole file into memory.

With a four-thread pool over the 40-row file, the workers share the load and items complete out of order. That is real parallelism, visible in the logs (the per-worker counts sum to 39, matching the read_count from the skip run):

[worker-2] processing user02
[worker-1] processing user01
[worker-3] processing user06   # user06 before user04 — parallel
[worker-4] processing user04
# worker-1: 10 items, worker-2: 11, worker-3: 11, worker-4: 7
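The single-producer, bounded-queue shape can be sketched with nothing but java.util.concurrent. This is my own illustration of the pattern, assuming an ArrayBlockingQueue and a sentinel value to end the consumers; it is not the Spring Batch source, and BoundedQueueSketch, run, and END are hypothetical names.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model of one producer feeding many consumers through a bounded queue.
class BoundedQueueSketch {

    // Unique sentinel object, compared by identity so no real item can match it.
    private static final String END = new String("__END__");

    // Returns the set of items the consumers handled.
    static Set<String> run(List<String> source, int consumers, int queueCapacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
        Set<String> processed = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        try {
            for (int i = 0; i < consumers; i++) {
                pool.execute(() -> {
                    try {
                        for (String item = queue.take(); item != END; item = queue.take()) {
                            processed.add(item); // stands in for "process and write"
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Single producer: the only thread that ever touches the source.
            for (String item : source) {
                queue.put(item); // blocks while the queue is full: backpressure
            }
            for (int i = 0; i < consumers; i++) {
                queue.put(END); // one sentinel per consumer
            }
            pool.shutdown();
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("consumers did not finish");
            }
            return processed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // no-op if the pool already terminated
        }
    }
}
```

Because only the producer loop iterates the source, a stateful reader in its place would need no synchronization wrapper, and the queue capacity caps how far reading can run ahead of processing.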

One operational note for run-once batch CLIs: a warm thread pool keeps non-daemon threads alive after the job ends, which can stop the JVM from exiting. Mark the pool's threads daemon (or use an executor whose threads die with their task) so the process shuts down cleanly.
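One way to get daemon workers is a custom ThreadFactory. The sketch below uses only the JDK; DaemonPoolSketch and the "worker-" prefix are hypothetical names, and in a Spring application you would more likely configure the equivalent on the executor bean (ThreadPoolTaskExecutor exposes setDaemon for this).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Builds a fixed pool whose threads are daemon threads, so an idle pool
// does not keep a run-once batch process alive after main() returns.
class DaemonPoolSketch {

    static ThreadFactory daemonFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + counter.incrementAndGet());
            t.setDaemon(true); // daemon threads do not block JVM exit
            return t;
        };
    }

    static ExecutorService newDaemonPool(int size) {
        return Executors.newFixedThreadPool(size, daemonFactory("worker-"));
    }
}
```

The trade-off is that daemon threads are abandoned at JVM exit, so this is only safe when the job itself has finished before main returns.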

§6 — A command-line operator worth using

SB6 ships CommandLineJobOperator, the modern replacement for CommandLineJobRunner and a clean fit for Boot. It wraps the auto-configured JobOperator, the JobRepository, and a JobRegistry, and exposes the full lifecycle as exit-code-returning operations:

int start(String jobName, Properties params);
int startNextInstance(String jobName);
int restart(long executionId);
int stop(long executionId);
int recover(long executionId);

Turn off Boot's auto-launch with spring.batch.job.enabled=false and drive the job yourself from CLI arguments. Two nuances are worth flagging. First, startNextInstance is not "rerun with identical parameters" — it copies the last instance's parameters forward and advances the incrementer, producing a brand-new instance:

// RunIdIncrementer.getNext(previous):
new JobParametersBuilder(parameters)            // copy ALL previous params
    .addJobParameter(new JobParameter<>("run.id", id, ...)) // run.id = old + 1
    .toJobParameters();

Second, if a job defines an incrementer, start(name, props) logs "Additional parameters will be ignored" and launches via the incrementer. An incrementer and ad-hoc identifying parameters pull in different directions, so pick one model per job: incrementing reruns (startNextInstance) or parameter-driven instances (start with params, no incrementer).
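The copy-forward behaviour is worth seeing in isolation. This is a plain-Java approximation using a Map in place of JobParameters; RunIdSketch, next, and the inputFile parameter are made up for the example, and the only thing it asserts is the semantics described above: every previous parameter survives, and run.id becomes the old value plus one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of "copy all previous parameters, then bump run.id".
class RunIdSketch {

    static Map<String, Object> next(Map<String, Object> previous) {
        Map<String, Object> params = new LinkedHashMap<>(previous); // copy ALL previous params
        Object prev = previous.get("run.id");
        long old = (prev instanceof Long) ? (Long) prev : 0L;       // no run.id yet: start from 0
        params.put("run.id", old + 1);                              // run.id = old + 1
        return params;
    }

    public static void main(String[] args) {
        Map<String, Object> first = next(Map.of("inputFile", "people.csv"));
        Map<String, Object> second = next(first);
        System.out.println(first);  // {inputFile=people.csv, run.id=1}
        System.out.println(second); // {inputFile=people.csv, run.id=2}
    }
}
```

Since run.id is identifying, each result is a different parameter set and therefore a new instance, even though everything else was carried over unchanged.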

§7 — Recovering crashed executions

Here is the scenario every batch operator dreads. A JVM is killed mid-run — power loss, OOM, kill -9. Spring Batch marks an execution STARTED before the work and FAILED/COMPLETED after. A normal exception flows through the "after" path. A hard kill skips it entirely, freezing the row at STARTED forever:

 job_execution_id | status  | exit_code | end_time
------------------+---------+-----------+----------
               10 | STARTED | UNKNOWN   |  (null)

That zombie blocks restart — as far as the metadata knows, it's still running, so restart rightly refuses it. Pre-SB6 you fixed this with manual UPDATE BATCH_JOB_EXECUTION SET status='FAILED' surgery. SB6 adds JobOperator#recover(), which reconciles the zombie consistently across any store:

# recover 10  ->  STARTED becomes FAILED, end_time set
# restart 10  ->  "Resuming job execution: id=10 ... status=FAILED"  ->  new execution COMPLETED
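The recover-then-restart sequence boils down to a small state transition. The sketch below is my own model of it, not the JobOperator implementation: RecoverSketch, Execution, canRestart, and recover are hypothetical, and it only tracks three statuses. It assumes recovery means "a STARTED row with no end time is declared FAILED and stamped with an end time".

```java
import java.time.LocalDateTime;

// Toy model of reconciling an execution left at STARTED by a hard kill.
class RecoverSketch {

    enum Status { STARTED, FAILED, COMPLETED }

    static class Execution {
        final long id;
        Status status = Status.STARTED; // written before the work begins
        LocalDateTime endTime;          // stays null if the JVM dies mid-run
        Execution(long id) { this.id = id; }
    }

    // Restart is refused while the metadata still says "running".
    static boolean canRestart(Execution e) {
        return e.status == Status.FAILED;
    }

    // Declare the crashed run FAILED so it becomes restartable.
    // The model cannot tell a crashed run from a live one; that judgment
    // belongs to the operator who invokes recovery.
    static void recover(Execution e) {
        if (e.status == Status.STARTED && e.endTime == null) {
            e.status = Status.FAILED;
            e.endTime = LocalDateTime.now();
        }
    }

    public static void main(String[] args) {
        Execution zombie = new Execution(10);
        System.out.println(canRestart(zombie)); // false
        recover(zombie);
        System.out.println(canRestart(zombie)); // true
    }
}
```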

After recover then restart, the instance shows the original execution FAILED and a fresh one COMPLETED. One honest caveat the crash makes unavoidable: batch guarantees the work completes, not that it runs exactly once. The pre-crash chunk had committed three rows; the restart reprocessed everything. If duplicates matter, make the writer idempotent (an upsert).
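Here is the duplicate problem and the upsert fix in miniature. It is a plain-Java stand-in, with a List playing an insert-only table and a Map keyed by id playing a table with a primary key; IdempotentWriterSketch and Person are hypothetical names. In Postgres the real equivalent would be an INSERT ... ON CONFLICT ... DO UPDATE statement in the writer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model: replaying already-committed items with insert versus upsert.
class IdempotentWriterSketch {

    static final class Person {
        final int id;
        final String name;
        Person(int id, String name) { this.id = id; this.name = name; }
    }

    // Insert-only writer: a replayed item becomes a second row.
    static void insert(List<Person> table, List<Person> chunk) {
        table.addAll(chunk);
    }

    // Upsert writer: a replayed item overwrites the row with the same key.
    static void upsert(Map<Integer, Person> table, List<Person> chunk) {
        for (Person p : chunk) {
            table.put(p.id, p);
        }
    }

    public static void main(String[] args) {
        List<Person> all = new ArrayList<>();
        for (int i = 1; i <= 5; i++) all.add(new Person(i, "user0" + i));
        List<Person> firstChunk = all.subList(0, 3); // committed before the crash

        List<Person> insertTable = new ArrayList<>();
        insert(insertTable, firstChunk);
        insert(insertTable, all);                    // restart reprocesses everything
        System.out.println(insertTable.size());      // 8

        Map<Integer, Person> upsertTable = new LinkedHashMap<>();
        upsert(upsertTable, firstChunk);
        upsert(upsertTable, all);
        System.out.println(upsertTable.size());      // 5
    }
}
```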

§8 / §9 — A stop that crosses JVMs

The most surprising result of the session: you can stop a job running in one JVM from a different JVM — and there is no signal, socket, or RPC involved. The shared JobRepository is the message bus.

The stopping process does almost nothing. It loads the execution, sets the status, and writes one row:

jobExecution.setStatus(BatchStatus.STOPPING);
jobRepository.update(jobExecution); // -> UPDATE BATCH_JOB_EXECUTION SET status='STOPPING'

That UPDATE is the entire "signal." The running job picks it up because, after committing each chunk, the step calls jobRepository.update(stepExecution) — and that call re-reads the persisted job status and flips an in-memory flag:

// SimpleJobRepository.update(StepExecution)
this.jobExecutionDao.synchronizeStatus(jobExecution); // re-read status FROM the DB
if (jobExecution.getStatus() == BatchStatus.STOPPING) {
    stepExecution.setTerminateOnly();                 // flip in-memory flag
}

The chunk loop checks that flag at each boundary and bails out:

while (chunkTracker.moreItems() && !interrupted(stepExecution)) { ... }
// interrupted() is true once isTerminateOnly() -> throws JobInterruptedException -> STOPPED

So the flow is: push a status to the database, the worker pulls it on its next chunk, the worker stops itself. The result is a clean halt on a chunk boundary — in my run the step stopped at exactly write_count=18 with commit_count=6 (six full chunks of three), status STOPPED, fully restartable. A later restart resumed the same instance to COMPLETED.
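The push/pull handshake can be simulated deterministically in a few lines. This is a single-threaded toy of my own, not the framework's code: CooperativeStopSketch, Result, and the stopDuringChunk parameter are hypothetical, and an AtomicReference stands in for the status column that both processes can see. The "other JVM" writes STOPPING while a given chunk is in flight, and the worker only looks at the shared status after it commits.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of a stop request that is only honored at chunk boundaries.
class CooperativeStopSketch {

    enum Status { STARTED, STOPPING, STOPPED, COMPLETED }

    static final class Result {
        final Status status;
        final int writeCount;
        final int commitCount;
        Result(Status status, int writeCount, int commitCount) {
            this.status = status;
            this.writeCount = writeCount;
            this.commitCount = commitCount;
        }
    }

    // stopDuringChunk: 1-based chunk during which the stop request arrives; 0 = never.
    static Result run(int totalItems, int chunkSize, int stopDuringChunk) {
        AtomicReference<Status> sharedStatus = new AtomicReference<>(Status.STARTED);
        int written = 0;
        int commits = 0;
        boolean terminateOnly = false; // the worker's in-memory flag

        while (written < totalItems && !terminateOnly) {
            int size = Math.min(chunkSize, totalItems - written);
            if (commits + 1 == stopDuringChunk) {
                sharedStatus.set(Status.STOPPING); // the stopper writes its one row mid-chunk
            }
            written += size;                       // the in-flight chunk still completes
            commits++;
            if (sharedStatus.get() == Status.STOPPING) {
                terminateOnly = true;              // re-read after commit, flip the flag
            }
        }
        return new Result(terminateOnly ? Status.STOPPED : Status.COMPLETED, written, commits);
    }

    public static void main(String[] args) {
        Result r = run(39, 3, 6);
        System.out.println(r.status + " " + r.writeCount + " " + r.commitCount); // STOPPED 18 6
    }
}
```

With 39 items, chunks of three, and a stop arriving during the sixth chunk, the model lands on the same numbers as my run: 18 written, 6 commits, STOPPED.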

Two things fall out of this design for free. It's cluster-safe: the stopper can be any process pointed at the same repository (another CLI, a REST endpoint, a worker on a different machine), with no service discovery. And the stop is only honored at chunk boundaries, because that's the only moment the worker re-reads status; granularity follows your chunk size. Contrast with the kill -9 from §7: that row froze at STARTED because the process died outright, before it could reach another boundary or write a final status. Stop is cooperative; the worker has to be alive and looping to honor it.

Taking stock

The throughline across all of these is the same: Spring Batch 6 realigned onto the Spring Framework 7 platform and absorbed responsibilities it used to outsource. Retry came in-house from spring-retry. JobRepository swallowed JobExplorer; JobOperator swallowed JobLauncher. The chunk engine, the concurrency model, and the stop/recover semantics were rebuilt rather than patched. The defaults ask you to opt into persistence and parallelism, which makes the framework smaller and the behavior more explicit.

For anyone running batch in anger, the operational trio is the real prize: recover turns a manual UPDATE into one call, stop is database-driven and cluster-safe, and the metadata tables tell you precisely what happened. I still have JFR-based observability and the scaling options (local chunking, remote steps, SEDA channels) to explore — but even at this point, SB6 feels like a sharper tool for the kind of regulated, has-to-be-correct batch I build.