GetOLT: how a pile of telnet scripts grew into a PON-network monitoring service

Origin story of an OLT/ONU monitoring service in a telecom. From scattered scripts to a production tool that has been catching signal degradation and closing the 'subscriber → OLT/port/ONU' link for over a year.

2 May 2026 · #telecom #monitoring #pon #case #evolution

The telecom company where I worked ran about a dozen disconnected systems — ERP, five billings, BI with reports, a ticket tracker, a couple more — and not one of them answered a simple question: "on which OLT, which PON port, and through which ONU is subscriber Petrov connected?"

Sounds like a small thing. But that small thing cost the company hundreds of support hours a month.

It now lives as a product — getolt.online.

The blind spot between ERP and the hardware

When a subscriber calls support saying "internet is down," the first-line operator has one minute to figure out: is this a personal problem (bad signal, bad cable, unpaid bill) or a network-wide outage taking the whole apartment block with it?

In our infrastructure each system knew its own slice:

ERP: contract, name, plan, address
Five billings (historically different regions, different contractors, migrations "for later"): balances, charges, payment history
BI: revenue and churn reports for management
Hardware (OLT): telnet/SSH, vendor-specific CLI, no outward integration

The link "contract # → OLT, PON port, subscriber's ONU" was not stored anywhere. The operator opened ERP, searched by name, looked up the address, and called the field crew: "go drive over and take a look." Every second visit turned out to be empty: the problem wasn't on the subscriber's end.

First attempt: telnet scripts

The logical place to start was scripts. Telnet session to OLT, parse the show onu output, map ONU serial → login → contract. First bash, then Python, on cron. The output went to a text file, then to a shared spreadsheet.
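As an illustration of that first step, here is a minimal Python sketch of parsing ONU rows and attaching contracts. The layout in SAMPLE_OUTPUT, the serial numbers, and the two lookup dictionaries are invented for the example. Real show onu output differs per vendor and firmware, so the regular expression is an assumption, not any vendor's actual format.

```python
import re
from typing import Dict, List, Optional

# Hypothetical CLI output; real vendor formats differ and change with firmware.
SAMPLE_OUTPUT = """\
PON   ONU  Serial         Status   Rx(dBm)
0/1   1    ABCD1A2B3C4D   online   -21.4
0/1   2    ABCD5E6F7A8B   offline  -
0/2   1    WXYZ00112233   online   -27.9
"""

# One data row: port, ONU id, 12-character serial, status, Rx level or "-".
ROW = re.compile(
    r"^(?P<port>\d+/\d+)\s+(?P<onu_id>\d+)\s+(?P<serial>[A-Z0-9]{12})\s+"
    r"(?P<status>\w+)\s+(?P<rx>-?\d+(?:\.\d+)?|-)\s*$"
)


def parse_show_onu(text: str) -> List[Dict]:
    """Turn raw CLI text into a list of dicts, skipping lines that don't match."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, blank line, or a format we don't recognise
        rx = m.group("rx")
        rows.append({
            "port": m.group("port"),
            "onu_id": int(m.group("onu_id")),
            "serial": m.group("serial"),
            "status": m.group("status"),
            "rx_dbm": None if rx == "-" else float(rx),
        })
    return rows


def attach_contracts(rows: List[Dict],
                     serial_to_login: Dict[str, str],
                     login_to_contract: Dict[str, str]) -> List[Dict]:
    """Follow the chain ONU serial -> login -> contract; unknown links stay None."""
    linked = []
    for row in rows:
        login: Optional[str] = serial_to_login.get(row["serial"])
        contract = login_to_contract.get(login) if login else None
        linked.append({**row, "login": login, "contract": contract})
    return linked


rows = parse_show_onu(SAMPLE_OUTPUT)
linked = attach_contracts(
    rows,
    serial_to_login={"ABCD1A2B3C4D": "petrov"},   # made-up mapping
    login_to_contract={"petrov": "C-1001"},       # made-up mapping
)
```

Silently skipping unmatched lines keeps the sketch short, but it also hides format changes. Counting skipped lines and alerting on a sudden jump would surface a firmware update that changed the output.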

It worked. For a few weeks.

A month in we added Rx/Tx signal-level reads per ONU — turned out critical for troubleshooting. Then PON port utilization for capacity planning. Then firmware versions for the security audit. Then logging of the telnet sessions themselves so we could debug scripts that broke every time the vendor pushed a firmware update and changed the output format.

By the end of the third month:

Six scripts
Every new field meant edits across all six
Cron schedules started conflicting with each other
The text "schema" didn't scale — some fields became nullable, somebody added an extra separator
Logs went to /tmp/ with no rotation, and the disk occasionally filled up

At some point it became obvious: this isn't a handy script. It's a half-broken service with no UI, no retries, no monitoring of itself.

Step two: spin it out into a service

The decision was straightforward — separate it into its own product:

DB (MySQL) with a clear schema: olt, pon_port, onu, signal_log, telnet_log, subscriber_link
Scheduler — polls every OLT every 30 minutes, 100-session parallelism cap, 60-second timeout per session
Web UI — search by contract / subscriber / ONU serial / port, signal aggregates
API — so the dispatcher software and BI hit one source instead of six different scripts
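To make the schema and the lookup it enables more concrete, here is a small Python sketch. It uses an in-memory SQLite database purely as a stand-in for MySQL. The table names come from the list above, but every column name, the sample rows, and the locate helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL in this sketch
conn.executescript("""
CREATE TABLE olt (id INTEGER PRIMARY KEY, name TEXT, vendor TEXT);
CREATE TABLE pon_port (id INTEGER PRIMARY KEY,
                       olt_id INTEGER REFERENCES olt(id), label TEXT);
CREATE TABLE onu (id INTEGER PRIMARY KEY,
                  pon_port_id INTEGER REFERENCES pon_port(id), serial TEXT UNIQUE);
CREATE TABLE subscriber_link (contract_no TEXT PRIMARY KEY,
                              onu_id INTEGER REFERENCES onu(id));
CREATE TABLE signal_log (onu_id INTEGER REFERENCES onu(id),
                         measured_at TEXT, rx_dbm REAL);

-- Made-up sample data
INSERT INTO olt VALUES (1, 'olt-north-1', 'BDCOM');
INSERT INTO pon_port VALUES (1, 1, '0/1');
INSERT INTO onu VALUES (1, 1, 'ABCD1A2B3C4D');
INSERT INTO subscriber_link VALUES ('C-1001', 1);
INSERT INTO signal_log VALUES (1, '2026-04-01T10:00:00', -22.0);
INSERT INTO signal_log VALUES (1, '2026-04-15T10:00:00', -27.9);
""")


def locate(conn, contract_no):
    """Return (OLT name, PON port, ONU serial, latest Rx in dBm) or None."""
    return conn.execute(
        """
        SELECT olt.name, pon_port.label, onu.serial,
               (SELECT s.rx_dbm FROM signal_log s
                 WHERE s.onu_id = onu.id
                 ORDER BY s.measured_at DESC LIMIT 1) AS last_rx
          FROM subscriber_link sl
          JOIN onu      ON onu.id = sl.onu_id
          JOIN pon_port ON pon_port.id = onu.pon_port_id
          JOIN olt      ON olt.id = pon_port.olt_id
         WHERE sl.contract_no = ?
        """,
        (contract_no,),
    ).fetchone()


result = locate(conn, "C-1001")
```

The point of the sketch is that once subscriber_link exists, the question from the beginning of the post becomes a single query instead of a phone call to the field crew.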

The telnet logic stayed the same: same commands, same parsers. It just moved from cron-driven scripts into a Spring Boot service with a normal lifecycle.
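The polling limits mentioned for the scheduler (30-minute cycle, at most 100 parallel sessions, 60 seconds per session) can be sketched roughly as follows. This is a Python asyncio illustration of the idea, not the Spring Boot code. The poll_olt body is a placeholder for the real telnet session, and all function and constant names are hypothetical.

```python
import asyncio

POLL_INTERVAL_S = 30 * 60      # one polling cycle every 30 minutes
MAX_PARALLEL_SESSIONS = 100    # cap on simultaneous sessions
SESSION_TIMEOUT_S = 60         # per-session timeout


async def poll_olt(host: str) -> str:
    """Placeholder for the real telnet session and parsing (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"{host}: ok"


async def poll_all(hosts, poll=poll_olt,
                   max_parallel=MAX_PARALLEL_SESSIONS,
                   timeout_s=SESSION_TIMEOUT_S):
    """Poll every host once; return {host: result, or None on failure}."""
    sem = asyncio.Semaphore(max_parallel)

    async def guarded(host):
        async with sem:  # wait for a free slot before opening a session
            try:
                # The timeout covers the session only, not the wait for a slot.
                return host, await asyncio.wait_for(poll(host), timeout=timeout_s)
            except (asyncio.TimeoutError, OSError):
                return host, None  # one slow or unreachable OLT must not stall the cycle

    return dict(await asyncio.gather(*(guarded(h) for h in hosts)))


async def poll_forever(hosts, interval_s=POLL_INTERVAL_S):
    """Run polling cycles back to back, sleeping interval_s after each one."""
    while True:
        results = await poll_all(hosts)
        failed = [h for h, r in results.items() if r is None]
        print(f"polled {len(results)} OLTs, {len(failed)} failed")
        await asyncio.sleep(interval_s)
```

A single cycle can be tried with asyncio.run(poll_all(["olt-1", "olt-2"])). Returning None for a failed host instead of raising lets one cycle finish even when some devices are down, which is the kind of behaviour the cron scripts lacked.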

What it became after a year+ in prod

The service has long since crossed from "the team's handy tool" to an infrastructure component. Today it:

Polls 350+ OLTs from different vendors (CDATA, BDCOM, GateRay), 24/7, on schedule
Stores historical signal levels — ONU degradation from -22 to -28 dBm over two weeks shows up immediately, before the subscriber calls support
Closes the "subscriber → OLT/port/ONU" link — first-line support sees in 2 seconds which OLT and PON port the caller's ONU is on, and the current signal
Counts port utilization — the network team sees where capacity hits 80% and it's time to order splitters
Tracks firmware versions: the security team sees which OLTs run versions affected by known CVEs and need an upgrade
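The degradation check on stored signal history could look something like the Python sketch below. It compares the average of the first few readings in a two-week window with the average of the last few. The 3 dB threshold, the three-sample averaging, and the function names are assumptions chosen for the example, not values taken from the service.

```python
from datetime import datetime, timedelta
from statistics import mean


def rx_drop_db(readings, now, window=timedelta(days=14), k=3):
    """Return how many dB the Rx level fell within the window, or None.

    readings: iterable of (timestamp, rx_dbm) pairs in any order.
    The drop is mean(first k samples) - mean(last k samples), so a positive
    value means the signal got weaker.
    """
    recent = sorted((t, rx) for t, rx in readings if now - window <= t <= now)
    if len(recent) < 2 * k:
        return None  # not enough history to say anything
    start = mean(rx for _, rx in recent[:k])
    end = mean(rx for _, rx in recent[-k:])
    return start - end


def is_degrading(readings, now, threshold_db=3.0, **kwargs):
    """Flag an ONU whose signal fell by at least threshold_db (assumed value)."""
    drop = rx_drop_db(readings, now, **kwargs)
    return drop is not None and drop >= threshold_db


# Synthetic history: one reading per day, sliding from -22 dBm to -28 dBm.
day0 = datetime(2026, 4, 1)
sliding = [(day0 + timedelta(days=i), -22 - 6 * i / 14) for i in range(15)]
steady = [(day0 + timedelta(days=i), -22.0) for i in range(15)]
now = day0 + timedelta(days=14)
```

Here is_degrading(sliding, now) is true and is_degrading(steady, now) is false. Averaging a few samples at each end, rather than comparing two single readings, makes the check less sensitive to one noisy measurement.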

The easiest effect to measure is the share of "empty" field crew visits (showed up, subscriber's fine): it dropped roughly threefold. That may sound modest, but over a year it adds up to hundreds of engineering hours and tens of thousands in payroll savings.

What's next

The service is gradually morphing from a tool "for ourselves" into a commercial MVP — CIS operators have the same pains and have been asking. In parallel another layer is being added: an AI assistant. The bot hits the same database via MCP tools and answers in natural language — a support operator doesn't need to know SQL or table schemas to understand "this subscriber's signal has been degrading the last two weeks."

More about the assistant in upcoming posts in this series:

Architecture: Ollama + MCP + Spring Boot — how it's wired end-to-end
SSE + Spring Security — three async-dispatch traps that break streaming
Prompt is code — how to cure filter hallucinations through tool design

What I learned along the way

A script is an early prototype of a service. If it's growing and getting hard to maintain, that's not "we need to refactor the script." That's "time to spin it out."
Links between systems are more valuable than the systems themselves. ERP, five billings, and the hardware separately are industry standard. One JOIN between them is a competitive advantage that saves the company more than the JOIN costs.
Production data gives focus. A year of running on a real network showed what's valuable (signals, capacity, links to subscribers) and what's decorative (detailed telnet logs nobody reads).
Preventive analytics > reactive. Catching a degrading signal two days before the subscriber calls is worth more than fixing the outage ten minutes after.

Good services rarely come from nothing. More often they grow out of a pile of scripts that became uncomfortable to live in.
