Ask Zakaria: I bolted a grounded AI chat onto my CV site
Why put a chatbot on a CV?
A recruiter lands on my site, skims for thirty seconds, and leaves with whatever their eyes happened to catch. A CV is a one-way broadcast — they can't ask it "how deep is his Kafka experience, really?" or "has he shipped anything in regulated banking?" So I gave the site a mouth: Ask Zakaria, a small AI concierge that answers questions about my career, grounded in my own dossier. This is the engineering behind it — the decisions, the one genuinely nasty race, and the bug that had nothing to do with code.
Grounded, not free-range
The failure mode I cared about most wasn't downtime — it was a confident hallucination about my experience in front of a hiring manager. So the model never free-associates. Every answer is grounded in two things: a dossier.md file on the classpath (the canonical version of my career, which I keep in sync by hand), plus the five latest published posts from this very blog. The prompt is assembled from those, the question rides on top, and the model is told, in so many words, to answer from the dossier or admit it doesn't know.
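A minimal sketch of that assembly follows, in plain Java so it runs on its own. The Post record, the GroundedPrompt class and the exact instruction wording are illustrative assumptions, not the app's real code. In Spring AI the resulting string would typically travel as a SystemMessage, with the visitor's question as a UserMessage on top.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical blog-post shape; the app's real entity is not shown in the post. */
record Post(String title, String body, LocalDate publishedOn, boolean published) {}

final class GroundedPrompt {
    static final int LATEST_POSTS = 5;

    /** Builds the grounding prompt from the dossier plus the five most recent published posts. */
    static String system(String dossier, List<Post> posts) {
        String recent = posts.stream()
                .filter(Post::published)                                    // drafts never reach the model
                .sorted(Comparator.comparing(Post::publishedOn).reversed()) // newest first
                .limit(LATEST_POSTS)
                .map(p -> "## " + p.title() + "\n" + p.body())
                .collect(Collectors.joining("\n\n"));
        // Illustrative wording: answer from the sources, or admit not knowing.
        return """
                You answer questions about Zakaria's career.
                Answer ONLY from the dossier and the posts below.
                If they do not contain the answer, say you don't know.

                # Dossier
                """ + dossier + "\n\n# Recent posts\n" + recent;
    }
}
```

Keeping the grounding text in one deterministic function makes it easy to eyeball exactly what the model was allowed to see.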
It runs on DeepSeek's deepseek-chat model via Spring AI, at a deliberately middle-of-the-road temperature of 0.5: factual about the dossier, but not robotic:
DeepSeekChatOptions options = DeepSeekChatOptions.builder()
        .model(concierge.getModel())
        .maxTokens(concierge.getMaxAnswerTokens())
        // Middle ground: factual about the dossier, but not robotic.
        .temperature(0.5)
        .build();
The bean that might not exist
I have a rule for this app: it must boot cleanly with zero secrets. No LinkedIn keys, no DeepSeek key — it still starts, just with those features quietly reporting themselves as disabled. That keeps local dev frictionless and means a missing key is a degraded feature, never a crash loop.
So I skipped the Spring AI autoconfigure starter and hand-wired the chat model behind a condition. No key, no bean — and nothing ever tries to build a DeepSeekApi with a null key:
@Bean
@Conditional(DeepSeekApiKeyPresent.class)
ChatModel conciergeChatModel(CvNextProperties properties) { ... }

static class DeepSeekApiKeyPresent implements Condition {
    @Override
    public boolean matches(ConditionContext ctx, AnnotatedTypeMetadata md) {
        return StringUtils.hasText(ctx.getEnvironment().getProperty("cvnext.concierge.api-key"));
    }
}
The service that uses it takes the model as an ObjectProvider<ChatModel> rather than a hard dependency, and checks getIfAvailable() before doing anything. Same graceful-degradation pattern I already use for LinkedIn OAuth and the optional mail sender — once you've made one optional dependency behave, you copy the shape everywhere.
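Here is the shape of that pattern as a dependency-free sketch. Provider is a hypothetical stand-in for Spring's ObjectProvider (whose getIfAvailable() returns null when no matching bean exists), Model stands in for ChatModel, and the disabled message is invented for illustration.

```java
/** Stand-in for Spring AI's ChatModel; only what this sketch needs. */
interface Model {
    String call(String prompt);
}

/** Hypothetical stand-in for ObjectProvider<T>: null means "no bean was registered". */
interface Provider<T> {
    T getIfAvailable();
}

final class ConciergeService {
    private final Provider<Model> model;

    ConciergeService(Provider<Model> model) {
        this.model = model;
    }

    /** What a status probe would report: enabled only if the conditional bean exists. */
    boolean enabled() {
        return model.getIfAvailable() != null;
    }

    String ask(String question) {
        Model m = model.getIfAvailable();
        if (m == null) {
            // Degraded, not crashed: the feature simply reports itself as off.
            return "The concierge is currently disabled.";
        }
        return m.call(question);
    }
}
```

Because the check happens at call time, the same service class works whether or not the @Conditional bean was created, which is what lets the app boot with zero secrets.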
Streaming without hogging a thread
Answers stream token-by-token over Server-Sent Events, so the recruiter sees words appear instead of staring at a spinner for eight seconds. The important part is what happens on the server: the servlet thread hands the work to a reactive Flux and returns immediately, and each chunk is forwarded to the SseEmitter on the threads of the reactive HTTP client that talks to DeepSeek, not on a request thread. A slow model tying up a request thread is exactly the kind of thing that looks fine in a demo and falls over under load, so I designed it out from the start.
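A runnable approximation of that hand-off is sketched below. It deliberately swaps Reactor's Flux for the JDK's built-in java.util.concurrent.SubmissionPublisher so it needs no dependencies, and a plain Consumer stands in for SseEmitter; the onNext, onError and onComplete callbacks correspond roughly to SseEmitter's send, completeWithError and complete. StreamingSketch and its parameters are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.function.Consumer;

final class StreamingSketch {
    /**
     * Starts streaming and returns immediately. Each chunk reaches `sink` on the
     * publisher's executor thread, never on the caller's (request) thread.
     */
    static CompletableFuture<Void> stream(List<String> tokens, Consumer<String> sink) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        // Default constructor delivers on the JDK's shared async pool.
        SubmissionPublisher<String> publisher = new SubmissionPublisher<>();
        publisher.subscribe(new Flow.Subscriber<String>() {
            @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
            @Override public void onNext(String chunk) { sink.accept(chunk); }           // ~ emitter.send(chunk)
            @Override public void onError(Throwable t) { done.completeExceptionally(t); } // ~ emitter.completeWithError(t)
            @Override public void onComplete() { done.complete(null); }                 // ~ emitter.complete()
        });
        // Stand-in for the model producing tokens asynchronously.
        CompletableFuture.runAsync(() -> {
            tokens.forEach(publisher::submit);
            publisher.close(); // triggers onComplete once buffered chunks are delivered
        });
        return done; // the caller's thread is free as soon as this returns
    }
}
```

The point to notice is the return value: the caller gets a future back immediately and never blocks on the model; anything that needs to know when the stream ends can attach to that future.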
A quota that counts the right things
The model costs money per token, so signed-in visitors get five questions each, keyed by their OAuth email (as the owner, I'm exempt). Two subtleties made this more interesting than a counter++.
Failed answers are free. Only successfully-completed answers count against your five. A question that dies mid-stream is persisted as an ERROR row, which the quota simply ignores — you shouldn't lose a question because DeepSeek hiccuped.
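A minimal sketch of that counting rule, assuming each persisted row carries the asker's email and how its stream ended. AnswerRow, AnswerStatus, QuotaCount and the owner check are illustrative names, not the app's schema.

```java
import java.util.List;

/** Hypothetical terminal state of a persisted answer. */
enum AnswerStatus { OK, ERROR }

/** Hypothetical persisted row: who asked and how the stream ended. */
record AnswerRow(String email, AnswerStatus status) {}

final class QuotaCount {
    static final int QUESTIONS_PER_USER = 5;

    /** Remaining questions, counting only completed (OK) answers; the owner is unlimited. */
    static int remainingFor(String email, String ownerEmail, List<AnswerRow> rows) {
        if (email.equalsIgnoreCase(ownerEmail)) {
            return Integer.MAX_VALUE;
        }
        long used = rows.stream()
                .filter(r -> r.email().equals(email) && r.status() == AnswerStatus.OK) // ERROR rows are free
                .count();
        return (int) Math.max(0, QUESTIONS_PER_USER - used);
    }
}
```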
The race I didn't see coming. The database row for an answer only lands when its stream finishes. So a user with one question left could fire ten parallel requests inside the model's latency window — every one of them reads "one slot left" and proceeds. The fix is to reserve the slot in memory before buying any tokens, atomically per user:
private void reserveSlot(String email) {
    inFlight.compute(email, (key, active) -> {
        int streaming = active == null ? 0 : active;
        if (remainingFor(key) - streaming <= 0) {
            throw new ConciergeRejectedException(HttpStatus.TOO_MANY_REQUESTS,
                    "You have used all your questions — feel free to reach out to Zakaria directly.");
        }
        return streaming + 1;
    });
}
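A user's effective remaining count is the five-question allowance minus persisted OK rows minus in-flight asks, and each slot is released exactly once when its stream terminates. Concurrency bugs love the gap between "I checked" and "I acted"; compute runs the check and the increment as one atomic step per user, which closes it.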
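To see the reservation hold up under concurrency, here is a self-contained version of the reserve and release pair. It is a sketch under assumptions: inFlight is taken to be a ConcurrentHashMap, remainingFor is injected as a function (in the app it reads persisted OK rows), rejection returns an empty Optional instead of throwing ConciergeRejectedException, and the once-only release guard is one illustration of "released exactly once", not necessarily how the app does it.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.ToIntFunction;

final class QuotaGate {
    // email -> number of answers currently streaming for that user
    private final ConcurrentHashMap<String, Integer> inFlight = new ConcurrentHashMap<>();
    // Hypothetical: persisted allowance (five minus completed OK answers)
    private final ToIntFunction<String> remainingFor;

    QuotaGate(ToIntFunction<String> remainingFor) {
        this.remainingFor = remainingFor;
    }

    /** Checks and reserves a slot in one atomic step; empty means the user is out of questions. */
    Optional<Runnable> tryReserve(String email) {
        AtomicBoolean granted = new AtomicBoolean(false);
        inFlight.compute(email, (key, active) -> {
            int streaming = active == null ? 0 : active;
            if (remainingFor.applyAsInt(key) - streaming <= 0) {
                return active; // rejected: mapping unchanged (null keeps the key absent)
            }
            granted.set(true);
            return streaming + 1;
        });
        if (!granted.get()) {
            return Optional.empty();
        }
        AtomicBoolean released = new AtomicBoolean(false);
        Runnable release = () -> {
            // Guard: even if the stream signals both error and completion, free the slot once.
            if (released.compareAndSet(false, true)) {
                inFlight.computeIfPresent(email, (key, n) -> n <= 1 ? null : n - 1);
            }
        };
        return Optional.of(release);
    }

    int streamingCount(String email) {
        return inFlight.getOrDefault(email, 0);
    }
}
```

One caveat of this shape: ConcurrentHashMap documents that a compute function should be short and simple, because other updates to the map may block while it runs. If remainingFor hits the database, every check briefly holds up those updates, which seems a fair trade at CV-site traffic but is worth knowing.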
The best bug wasn't in the code
I shipped all of this, and then almost nobody used it. The reason was embarrassing and instructive: the chat widget only rendered after you signed in. But signing in was optional, and the whole point of the feature was to be a reason to sign in. My strongest incentive was hidden behind the exact wall it was supposed to pull people through.
The endpoint is auth-gated, so a logged-out visitor's status probe just 401s and the widget stays invisible — correct, and completely self-defeating. So I added a greyed-out twin of the launcher for logged-out visitors: same shape and position, a "Sign in to use" label, and a small sparkling NEW badge with an arrow bobbing toward it. Sign in and the button doesn't jump — it just comes alive. I designed that little attention-grabber in Claude Design first, then mapped it onto the site's real brand tokens so it matches the rest of the page exactly.
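The render decision itself is small enough to sketch. The real widget is front-end code the post doesn't show, so this Java version is only an illustration: Launcher and LauncherState are hypothetical names, and it assumes the status probe reports whether the concierge is enabled when the visitor is signed in.

```java
/** Which launcher the page should render. */
enum Launcher { LIVE, SIGN_IN_TEASER, HIDDEN }

final class LauncherState {
    /** Decides from the status probe's HTTP status and (when signed in) whether the feature is on. */
    static Launcher from(int statusCode, boolean conciergeEnabled) {
        if (statusCode == 401) {
            return Launcher.SIGN_IN_TEASER; // logged out: show the greyed-out twin instead of nothing
        }
        if (statusCode == 200 && conciergeEnabled) {
            return Launcher.LIVE;
        }
        return Launcher.HIDDEN; // no API key configured, or the probe failed
    }
}
```

The original bug amounts to collapsing the 401 case into HIDDEN; giving it its own state is the whole fix.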
What I'd tell past me
The AI part was the easy 20%. The interesting 80% was everything around it: booting without the key, not hogging threads, counting quota correctly under concurrency, and — most of all — remembering that a feature nobody can find is a feature that doesn't exist. It's live on the site now. If you're reading this on my blog, the concierge is one click away in the corner. Go ahead and interrogate my career.