Local LLM in network monitoring: Ollama + MCP + Spring Boot in three days
How I plugged a local LLM into an OLT monitoring system through MCP. Architecture, an in-process dispatcher instead of HTTP, 16 tools, SSE to the browser.
GetOLT, the PON-network monitoring service I'm developing (origin story), tracks 350+ OLTs and 100K+ ONUs. When the tech team needs to answer "how many ONUs in Almaty have a bad signal on CDATA?", nobody writes SQL. Until this week that took five clicks through filters; now it takes one question in a chat.
The bot answers in natural language and hits the real database, not a cache. From first commit to a run on three live OLTs — three days.
Why a local LLM
First question — why not GPT/Claude through an API. The answer is product-driven, not technical:
- Customer data — billing IDs, equipment IPs, contract logins — must not leave for third-party clouds
- Russian 152-FZ data law: user consent for processing personal data is in place; cross-border transfer consent isn't, and getting it for the convenience of a bot is pointless
- gpt-oss:20b on Apple Silicon does 30-40 tokens/sec — acceptable for an interactive chat
- Zero rubles per request, the limit is hardware muscle, not budget
If I were building an assistant for an open-source tool, the choice would be different. For a commercial product in B2B telecom, a local LLM is the only honest path.
Architecture: three boxes
[Browser] ←SSE─ [LlmChatController] → [LlmChatService]
                                              │ tool-loop
                                              ▼
                               ┌──────────────┬──────────────────┐
                           [Ollama]                  [InProcessMcpDispatcher]
                          gpt-oss:20b                            │
                                                                 ▼
                                                          [16 MCP tools]
                                                                 │
                                                                 ▼
                                                   [OltService, OnuService, ...]
                                                                 │
                                                                 ▼
                                                              [MySQL]
The key decision is the in-process MCP dispatcher. Standard MCP assumes a separate server process reached over stdio or HTTP. In my setup the LLM loop and the MCP server live inside the same Spring Boot application, so I dropped the transport layer and its serialization round-trip. A tool call is just a Java method looked up by name and invoked with JSON arguments, with no HTTP bridge.
Downside: you can't reuse the MCP server from other clients (Claude Desktop, for example). Upside: less code, less latency, single Security context and single audit trail. For an embedded assistant inside a product, that's the right tradeoff.
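To make the idea concrete, here is a minimal sketch of an in-process dispatcher. It is not the actual InProcessMcpDispatcher: the class name, the register method, and the use of Map<String, Object> for already-parsed JSON arguments are assumptions for illustration, and the sample data is made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: tools are plain Java functions registered under a name. */
class InProcessDispatcherSketch {

    // Tool name -> handler. Arguments arrive as an already-parsed JSON object.
    private final Map<String, Function<Map<String, Object>, Object>> tools = new LinkedHashMap<>();

    void register(String name, Function<Map<String, Object>, Object> handler) {
        tools.put(name, handler);
    }

    /** Looks the tool up by name and calls it directly: no HTTP, no stdio. */
    Object invoke(String name, Map<String, Object> args) {
        Function<Map<String, Object>, Object> handler = tools.get(name);
        if (handler == null) {
            // Return the error as data so the model can correct itself on the next turn.
            return Map.of("error", "unknown tool: " + name);
        }
        return handler.apply(args);
    }

    public static void main(String[] argv) {
        InProcessDispatcherSketch dispatcher = new InProcessDispatcherSketch();
        // Stand-in for a real service call; the returned data is invented.
        dispatcher.register("olt.list", args -> List.of("olt-" + args.get("city")));
        System.out.println(dispatcher.invoke("olt.list", Map.of("city", "almaty")));
        System.out.println(dispatcher.invoke("olt.reboot", Map.of()));
    }
}
```

Returning an unknown-tool error as a normal result, rather than throwing, is one possible design choice: the tool loop can feed it back to the model like any other tool output.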
LlmChatService.run is a plain tool loop:
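// Keep asking the model until it answers in text or the turn budget runs out.
while (turn < MAX_TURNS) {
    var response = ollama.chat(messages, toolSchemas);
    if (response.toolCalls().isEmpty()) {
        // No tool requested: this is the final answer.
        emit(response.content());
        break;
    }
    for (var call : response.toolCalls()) {
        // Direct in-process call; the result goes back into the conversation.
        var result = mcpDispatcher.invoke(call.name(), call.args());
        messages.add(toolResultMessage(call.id(), result));
        // Stream the call and its result to the browser over SSE.
        emit(new ToolEvent(call, result));
    }
    turn++;
}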
SSE streaming sends the user not only the final text but every tool call with arguments and result. You can see how the bot builds its answer — which filter it applied, what the DB returned. That's both a UX feature and a debugging tool.
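On the wire, such a stream could be a sequence of named SSE events. The sketch below shows only the framing, and the event names (tool_call, tool_result, message) and payloads are hypothetical, not taken from the real service. In a Spring application a class such as SseEmitter would normally do this formatting for you.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of Server-Sent Events framing for chat and tool events. */
class SseFrameSketch {

    /** Formats one SSE frame: an event name plus a data payload. */
    static String frame(String eventName, String data) {
        StringBuilder sb = new StringBuilder();
        sb.append("event: ").append(eventName).append('\n');
        // A payload with line breaks must be sent as one "data:" line per line.
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        // A blank line terminates the frame.
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        List<String> stream = new ArrayList<>();
        // Invented payloads: one tool call, its result, then the final text.
        stream.add(frame("tool_call", "{\"name\":\"olt.list\",\"args\":{\"city\":\"Almaty\"}}"));
        stream.add(frame("tool_result", "{\"count\":3}"));
        stream.add(frame("message", "Found 3 OLTs in Almaty."));
        stream.forEach(System.out::print);
    }
}
```

With separate event names the browser can render tool calls and the final answer differently while reading a single stream.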
What I got end-to-end
- 16 MCP tools on top of existing services: olt.list, onu.stats_by_olt, olt.distinct_values, data.export, feedback.send, and others. Each one narrow, with an explicit JSON schema.
- Full append-only audit: chat_session + chat_message + mcp_chat_log. Any conversation can be reconstructed byte-for-byte.
- Ephemeral file attachments: the model can dump an aggregate to CSV/JSON, the file lives 5 minutes, the link is one-shot.
- Read-only SQL escape hatch: if a task isn't covered by tools, the model can ask for SQL, but only through a separate tool with SELECT-only validation. This cuts hallucinations and lets tool coverage grow gradually.
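A SELECT-only check for the SQL escape hatch might look roughly like the sketch below. This is an assumption about the general approach, not the validation GetOLT actually uses: the class name, the keyword list, and the table and column names in the example are all hypothetical, and the check is deliberately conservative.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical sketch of a conservative SELECT-only guard. */
class ReadOnlySqlGuardSketch {

    // Whole-word keywords that should not appear in a read-only query.
    // "into" blocks SELECT ... INTO OUTFILE. The list is illustrative, not exhaustive.
    private static final Pattern FORBIDDEN = Pattern.compile(
            "\\b(insert|update|delete|drop|alter|create|truncate|grant|into)\\b");
    private static final Pattern STARTS_WITH_SELECT = Pattern.compile("^select\\b");

    static boolean isAllowed(String sql) {
        if (sql == null) return false;
        String s = sql.strip().toLowerCase(Locale.ROOT);
        // Allow one trailing semicolon, reject any other (stacked statements).
        if (s.endsWith(";")) s = s.substring(0, s.length() - 1).strip();
        if (s.isEmpty() || s.contains(";")) return false;
        // Reject comment markers, which can hide part of a statement.
        if (s.contains("--") || s.contains("/*") || s.contains("#")) return false;
        if (!STARTS_WITH_SELECT.matcher(s).find()) return false;
        // May also reject harmless queries that mention these words in string literals.
        return !FORBIDDEN.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("SELECT count(*) FROM onu WHERE rx_power < -27"));
        System.out.println(isAllowed("select 1; drop table onu"));
    }
}
```

A string check like this is only a first filter. Running the tool's queries under a database account that has SELECT privileges only is the stronger guarantee.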
A run in production, three live OLTs, real data:
- "show ONUs with signal worse than -27 dBm on CDATA in Almaty" — answers in 4-6 seconds
- "how many ports are over 75% full per operator?" — computes the aggregate, draws a table
- "export the list of problem ONUs for the week" — generates CSV, hands back a link
What was hard
Three categories, each worth its own post:
- SSE + Spring Security + async-dispatch. Three separate traps that made SSE die with AccessDenied at finalization. @PreAuthorize broke async dispatch, SecurityContext didn't propagate to the worker thread, SecurityFilterChain hit DispatcherType.ASYNC. Breakdown in the next post.
- Prompt engineering as an engineering discipline. The model didn't call SQL for aggregates, made up nonexistent cities and IPs, leaked internal db ids. The fix wasn't a model upgrade; it was tool design and the system prompt as code. Detailed breakdown in the prompt-as-code post.
- Browser prefetch ate file attachments. A classic trap: the browser fired HEAD/GET on the link to preview, and the one-shot token burned before the user clicked. Fixed by Cache-Control: no-store + a Sec-Purpose: prefetch filter on the server. Worth its own short note.
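The prefetch trap and its fix can be sketched as a one-shot token store that refuses speculative requests before the token is consumed. This is an illustration under assumptions, not the real implementation: the class and method names are hypothetical, and passing the HTTP method and the Sec-Purpose header value in as plain strings stands in for whatever servlet filter the service actually uses.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: one-shot download tokens that survive browser prefetch. */
class OneShotLinkSketch {

    private static final class Entry {
        final String filePath;
        final Instant expiresAt;
        Entry(String filePath, Instant expiresAt) {
            this.filePath = filePath;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> tokens = new ConcurrentHashMap<>();
    private final Duration ttl;

    OneShotLinkSketch(Duration ttl) {
        this.ttl = ttl;
    }

    void issue(String token, String filePath, Instant now) {
        tokens.put(token, new Entry(filePath, now.plus(ttl)));
    }

    /** Returns the file path once; HEAD and prefetch requests never consume the token. */
    Optional<String> redeem(String token, String method, String secPurpose, Instant now) {
        boolean speculative = !"GET".equals(method)
                || (secPurpose != null && secPurpose.toLowerCase(Locale.ROOT).contains("prefetch"));
        if (speculative) return Optional.empty();
        Entry entry = tokens.remove(token); // atomic: only one request can win
        if (entry == null || now.isAfter(entry.expiresAt)) return Optional.empty();
        return Optional.of(entry.filePath);
    }

    public static void main(String[] args) {
        OneShotLinkSketch links = new OneShotLinkSketch(Duration.ofMinutes(5));
        Instant now = Instant.now();
        links.issue("abc", "/tmp/report.csv", now); // made-up token and path
        System.out.println(links.redeem("abc", "GET", "prefetch", now)); // refused, token kept
        System.out.println(links.redeem("abc", "GET", null, now));       // served once
        System.out.println(links.redeem("abc", "GET", null, now));       // already used
    }
}
```

The response that serves the file would still carry Cache-Control: no-store, as described above, so the browser does not keep a copy of a link that is meant to work once.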
What I learned
- In-process MCP > HTTP MCP when the LLM loop and the server live in the same process. Serialization for serialization's sake is dead code.
- Tool design > model size. Answer accuracy went up more from adding the distinct_values tool than from any attempts to pick a beefier model.
- Prompt is code. Every "weird" behavior is a commit to the system prompt, not "AI is dumb."
- A local LLM is a policy question, not a capability question. gpt-oss:20b with tool calling closes 90% of an embedded-assistant's tasks without leaving the server.
An AI assistant inside a product isn't the model. It's a hundred lines of tool loop, a set of narrow MCP tools on top of existing services, and three pages of system prompt. The model you pick is the least interesting part of the stack.