Local LLM in network monitoring: Ollama + MCP + Spring Boot in three days

How I plugged a local LLM into an OLT monitoring system through MCP. Architecture, an in-process dispatcher instead of HTTP, 16 tools, SSE to the browser.

4 May 2026 · #llm #mcp #spring-boot #ollama #case

GetOLT, the PON-network monitoring service I'm developing (origin story), tracks 350+ OLTs and 100K+ ONUs. When the tech team needs to answer "how many ONUs in Almaty have a bad signal on CDATA?", nobody writes SQL. Until this week that took five clicks through filters; now it takes one question in a chat.

The bot answers in natural language and hits the real database, not a cache. Getting from the first commit to a run on three live OLTs took three days.

Why a local LLM

The first question is why not GPT or Claude through an API. The answer is product-driven, not technical:

Customer data — billing IDs, equipment IPs, contract logins — must not leave for third-party clouds
Russian 152-FZ data law: user consent for processing personal data is in place; cross-border transfer consent isn't, and getting it for the convenience of a bot is pointless
gpt-oss:20b on Apple Silicon does 30-40 tokens/sec — acceptable for an interactive chat
Zero rubles per request; the limit is hardware muscle, not budget

If I were building an assistant for an open-source tool, the choice would be different. For a commercial product in B2B telecom, a local LLM is the only honest path.

Architecture: three boxes

[Browser]  ←SSE—  [LlmChatController]  →  [LlmChatService]
                                              │ tool-loop
                                              ▼
                              ┌──────────────┬──────────────────┐
                          [Ollama]     [InProcessMcpDispatcher]
                       gpt-oss:20b              │
                                                ▼
                                          [16 MCP tools]
                                                │
                                                ▼
                                  [OltService, OnuService, ...]
                                                │
                                                ▼
                                            [MySQL]

The key decision is the in-process MCP dispatcher. Standard MCP assumes a separate server process reached over an HTTP or stdio transport. In my setup the LLM loop and the MCP server live inside the same Spring Boot application, so I threw out the transport and its serialization round-trip. A tool call is just a Java method looked up by name and invoked with JSON arguments, with no HTTP bridge.

Downside: you can't reuse the MCP server from other clients (Claude Desktop, for example). Upside: less code, less latency, a single security context, and a single audit trail. For an embedded assistant inside a product, that's the right tradeoff.
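To make the idea concrete, here is a minimal sketch of what an in-process dispatcher can look like. It is not the actual InProcessMcpDispatcher: the class name, the handler signature, and the demo data are assumptions for illustration, and the arguments are taken as an already-parsed map rather than raw JSON.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical in-process tool registry: tool name -> plain Java handler. */
class InProcessDispatcherSketch {
    // Each handler receives already-parsed JSON arguments and returns a result object.
    private final Map<String, Function<Map<String, Object>, Object>> tools = new LinkedHashMap<>();

    void register(String name, Function<Map<String, Object>, Object> handler) {
        tools.put(name, handler);
    }

    /** Looks the tool up by name and calls it directly: no HTTP, no stdio. */
    Object invoke(String name, Map<String, Object> args) {
        Function<Map<String, Object>, Object> handler = tools.get(name);
        if (handler == null) {
            // Hand the error back to the model instead of throwing, so it can retry.
            return Map.of("error", "unknown tool: " + name);
        }
        return handler.apply(args);
    }

    public static void main(String[] argv) {
        InProcessDispatcherSketch dispatcher = new InProcessDispatcherSketch();
        // Stand-in for a real service call; the returned names are made up.
        dispatcher.register("olt.list", args -> List.of("olt-demo-1", "olt-demo-2"));
        System.out.println(dispatcher.invoke("olt.list", Map.of("city", "Almaty")));
        System.out.println(dispatcher.invoke("olt.reboot", Map.of()));
    }
}
```

Because the handler runs on the caller's thread inside the same application, whatever security context and audit hooks wrap the chat request can also wrap the tool call, which is the property the tradeoff above relies on.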

LlmChatService.run is a plain tool loop:

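while (turn < MAX_TURNS) {
    // Ask the model for the next step, advertising the available tool schemas.
    var response = ollama.chat(messages, toolSchemas);
    if (response.toolCalls().isEmpty()) {
        // No tool requested: this is the final answer, so stream it and stop.
        emit(response.content());
        break;
    }
    for (var call : response.toolCalls()) {
        // Run the tool in-process and feed its result back into the conversation.
        var result = mcpDispatcher.invoke(call.name(), call.args());
        messages.add(toolResultMessage(call.id(), result));
        // Stream the call and its result to the browser as well.
        emit(new ToolEvent(call, result));
    }
    turn++;
}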

SSE streaming sends the user not only the final text but every tool call with arguments and result. You can see how the bot builds its answer — which filter it applied, what the DB returned. That's both a UX feature and a debugging tool.
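On the wire, each of those updates is one Server-Sent Events frame. The sketch below only shows the standard text/event-stream framing in plain Java; the event names tool_call and answer and the JSON payload shape are assumptions, and in a Spring application the framing would normally be left to SseEmitter rather than written by hand.

```java
/** Formats one Server-Sent Events frame; the event names used in main are hypothetical. */
class SseFrameSketch {
    static String frame(String event, String data) {
        StringBuilder sb = new StringBuilder();
        sb.append("event: ").append(event).append('\n');
        // A payload containing line breaks must be sent as one "data:" field per line.
        for (String line : data.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        // An empty line terminates the event.
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(frame("tool_call", "{\"name\":\"olt.list\",\"args\":{\"city\":\"Almaty\"}}"));
        System.out.print(frame("answer", "Done."));
    }
}
```

Using a distinct event name per kind of update lets the browser attach separate listeners for tool calls and for the final text.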

What I got end-to-end

16 MCP tools on top of existing services: olt.list, onu.stats_by_olt, olt.distinct_values, data.export, feedback.send, and others. Each one narrow, with an explicit JSON schema.
Full append-only audit: chat_session + chat_message + mcp_chat_log. Any conversation can be reconstructed byte-for-byte.
Ephemeral file attachments: the model can dump an aggregate to CSV/JSON, the file lives 5 minutes, the link is one-shot.
Read-only SQL escape hatch: if a task isn't covered by tools, the model can ask for SQL, but only through a separate tool with SELECT-only validation. This cuts hallucinations and leaves room to extend tool coverage gradually.
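The post doesn't show how the SELECT-only validation is implemented, so the following is only a sketch of one conservative approach: accept a single statement that starts with SELECT and reject comments, stacked statements, and any write keyword. The class name, the keyword list, and the table and column names in the examples are assumptions. A string check like this is not a security boundary on its own and would normally sit in front of a database user that has read-only grants.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical guard for a read-only SQL tool: deliberately strict, false rejections are fine. */
class ReadOnlySqlGuardSketch {
    private static final Pattern STARTS_WITH_SELECT = Pattern.compile("^select\\b");
    // Keywords that should not appear anywhere in a read-only query.
    private static final Pattern FORBIDDEN = Pattern.compile(
        "\\b(insert|update|delete|drop|alter|create|truncate|grant|revoke|replace|call|into)\\b");

    static boolean isAllowed(String sql) {
        if (sql == null) {
            return false;
        }
        String s = sql.trim().toLowerCase(Locale.ROOT);
        // Tolerate one trailing semicolon, nothing more.
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        if (!STARTS_WITH_SELECT.matcher(s).find()) {
            return false;
        }
        // No stacked statements and no comments that could hide a second one.
        if (s.contains(";") || s.contains("--") || s.contains("/*") || s.contains("#")) {
            return false;
        }
        return !FORBIDDEN.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("SELECT id FROM onu WHERE rx_power < -27"));
        System.out.println(isAllowed("select 1; drop table onu"));
    }
}
```

The check is strict on purpose: a query that merely mentions a word such as "update" inside a string literal is rejected too, which is an acceptable cost for an escape hatch.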

A run in production, three live OLTs, real data:

"show ONUs with signal worse than -27 dBm on CDATA in Almaty" — answers in 4-6 seconds
"how many ports are over 75% full per operator?" — computes the aggregate, draws a table
"export the list of problem ONUs for the week" — generates CSV, hands back a link

What was hard

Three categories, each worth its own post:

SSE + Spring Security + async-dispatch. Three separate traps made SSE die with AccessDenied at finalization: @PreAuthorize broke async dispatch, the SecurityContext didn't propagate to the worker thread, and the SecurityFilterChain hit DispatcherType.ASYNC. The breakdown is in the next post.
Prompt engineering as an engineering discipline. The model didn't call SQL for aggregates, made up nonexistent cities and IPs, leaked internal db ids. The fix wasn't a model upgrade — it was tool design and the system prompt as code. Detailed breakdown in the prompt-as-code post.
Browser prefetch ate file attachments. A classic trap: the browser fired HEAD/GET on the link to preview, and the one-shot token burned before the user clicked. Fixed by Cache-Control: no-store + a Sec-Purpose: prefetch filter on the server. Worth its own short note.
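For the attachment trap, the server-side half can be sketched as a token store that refuses to burn the token on speculative requests. This is a sketch under assumptions, not the production code: the class and method names are invented, and ignoring non-GET requests is my reading of the HEAD/GET remark. The Sec-Purpose request header with a value starting with prefetch is how browsers mark speculative loads.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical one-shot download links with a TTL and a prefetch guard. */
class OneShotLinkSketch {
    private static final class Entry {
        final String path;
        final Instant expiresAt;
        Entry(String path, Instant expiresAt) {
            this.path = path;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> tokens = new ConcurrentHashMap<>();
    private final Duration ttl;

    OneShotLinkSketch(Duration ttl) {
        this.ttl = ttl;
    }

    void issue(String token, String path, Instant now) {
        tokens.put(token, new Entry(path, now.plus(ttl)));
    }

    /** True when the Sec-Purpose header marks the request as a speculative load. */
    static boolean isPrefetch(String secPurposeHeader) {
        return secPurposeHeader != null
            && secPurposeHeader.trim().toLowerCase(Locale.ROOT).startsWith("prefetch");
    }

    /** Returns the file path at most once; prefetches and non-GET probes leave the token intact. */
    Optional<String> redeem(String token, String method, String secPurposeHeader, Instant now) {
        if (!"GET".equals(method) || isPrefetch(secPurposeHeader)) {
            return Optional.empty();
        }
        Entry entry = tokens.remove(token); // atomic removal: only one caller can win
        if (entry == null || now.isAfter(entry.expiresAt)) {
            return Optional.empty();
        }
        return Optional.of(entry.path);
    }
}
```

With a five-minute TTL, a preview probe gets an empty result and the later real click still succeeds; a second click, or a click after expiry, does not.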

What I learned

In-process MCP > HTTP MCP when the LLM loop and the server live in the same process. Serialization for serialization's sake is dead code.
Tool design > model size. Answer accuracy went up more from adding the distinct_values tool than from any attempts to pick a beefier model.
Prompt is code. Every "weird" behavior is a commit to the system prompt, not "AI is dumb."
A local LLM is a policy question, not a capability question. gpt-oss:20b with tool calling covers 90% of an embedded assistant's tasks without leaving the server.
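The point about distinct_values is easier to see in code: a tool that returns the values that actually exist gives the model something to check its filters against before it queries. The sketch below works on in-memory rows purely for illustration; the class name, the allowed field list, and the sample rows are assumptions, and a real tool would presumably run a DISTINCT query through the existing services.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical distinct-values tool over in-memory rows. */
class DistinctValuesSketch {
    // Only whitelisted fields may be enumerated; these names are illustrative.
    private static final Set<String> ALLOWED_FIELDS = Set.of("city", "vendor");

    static List<String> distinctValues(List<Map<String, String>> rows, String field) {
        if (!ALLOWED_FIELDS.contains(field)) {
            throw new IllegalArgumentException("field not allowed: " + field);
        }
        Set<String> values = new TreeSet<>(); // de-duplicated and sorted
        for (Map<String, String> row : rows) {
            String value = row.get(field);
            if (value != null) {
                values.add(value);
            }
        }
        return List.copyOf(values);
    }

    public static void main(String[] args) {
        // Made-up sample rows, not real inventory.
        List<Map<String, String>> rows = List.of(
            Map.of("city", "Almaty", "vendor", "CDATA"),
            Map.of("city", "Astana", "vendor", "CDATA"),
            Map.of("city", "Almaty", "vendor", "Huawei"));
        System.out.println(distinctValues(rows, "city"));
    }
}
```

The whitelist matters as much as the lookup: the model can enumerate cities and vendors, but it cannot use the same tool to list internal ids.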

An AI assistant inside a product isn't the model. It's a hundred lines of tool loop, a set of narrow MCP tools on top of existing services, and three pages of system prompt. The model you pick is the least interesting part of the stack.
