GetOLT: how a pile of telnet scripts grew into a PON-network monitoring service
Origin story of an OLT/ONU monitoring service in a telecom. From scattered scripts to a production tool that has been catching signal degradation and closing the 'subscriber → OLT/port/ONU' link for over a year.
The telecom company where I worked ran about a dozen disconnected systems (ERP, five billing systems, BI with reports, a ticket tracker, and a couple more), and not one of them answered a simple question: "on which OLT, which PON port, and through which ONU is subscriber Petrov connected?"
Sounds like a small thing. But that small thing cost the company hundreds of support hours a month.
It now lives as a product — getolt.online.
The blind spot between ERP and the hardware
When a subscriber calls support saying "internet is down," the first-line operator has one minute to figure out: is this a personal problem (bad signal, bad cable, unpaid bill) or a network-wide outage taking the whole apartment block with it?
In our infrastructure each system knew its own slice:
- ERP: contract, name, plan, address
- Five billing systems (a legacy of different regions, different contractors, and migrations postponed "for later"): balances, charges, payment history
- BI: management reports on revenue and churn
- Hardware (OLT): telnet/SSH, vendor-specific CLI, no outward integration
The link "contract # → OLT, PON port, subscriber's ONU" was not stored anywhere. The operator opened ERP, searched by name, looked up the address, and called the field crew: "go drive over and take a look." Every second visit turned out to be empty: the problem wasn't on the subscriber's end.
First attempt: telnet scripts
The logical place to start was scripts. Telnet session to OLT, parse the show onu output, map ONU serial → login → contract. First bash, then Python, on cron. The output went to a text file, then to a shared spreadsheet.
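To make the parsing step concrete, here is a minimal sketch in Python. The line format, the column order, and the names `OnuRow`, `LINE_RE`, and `parse_show_onu` are assumptions made up for illustration: real `show onu` output differs per vendor and firmware, which is exactly what kept breaking the original scripts.

```python
import re
from typing import NamedTuple, Optional


class OnuRow(NamedTuple):
    pon_port: str
    onu_id: int
    serial: str
    status: str
    rx_dbm: Optional[float]


# Hypothetical row layout: port, ONU id, serial, status, Rx level.
# Real CLI output is vendor-specific and changes between firmware versions.
LINE_RE = re.compile(
    r"^\s*(?P<port>\d+/\d+)\s+(?P<onu>\d+)\s+(?P<serial>[A-Z0-9]{12,16})\s+"
    r"(?P<status>online|offline)\s+(?P<rx>-?\d+(?:\.\d+)?|N/A)\s*$"
)


def parse_show_onu(output: str) -> list[OnuRow]:
    """Extract ONU rows from CLI text, skipping headers, separators, and prompts."""
    rows = []
    for line in output.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a data row in the assumed format
        rx = None if m["rx"] == "N/A" else float(m["rx"])
        rows.append(OnuRow(m["port"], int(m["onu"]), m["serial"], m["status"], rx))
    return rows


# Made-up sample output, not captured from a real OLT.
SAMPLE = """\
Port  ONU  Serial        Status   Rx(dBm)
----  ---  ------------  -------  -------
0/1   1    TEST1A2B3C4D  online   -22.4
0/1   2    TEST5E6F7A8B  offline  N/A
"""
```

Calling `parse_show_onu(SAMPLE)` yields two rows, the second with `rx_dbm` set to `None`. A sketch like this silently skips lines it does not recognize; counting the skipped lines would be one way to notice a format change after a firmware update.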
It worked. For a few weeks.
A month in we added Rx/Tx signal-level reads per ONU — turned out critical for troubleshooting. Then PON port utilization for capacity planning. Then firmware versions for the security audit. Then logging of the telnet sessions themselves so we could debug scripts that broke every time the vendor pushed a firmware update and changed the output format.
By the end of the third month:
- Six scripts
- Every new field meant edits across all six
- Cron schedules started conflicting with each other
- The text "schema" didn't scale — some fields became nullable, somebody added an extra separator
- Logs went to /tmp/ with no rotation, and the disk sometimes filled up
At some point it became obvious: this isn't a handy script. It's a half-broken service with no UI, no retries, no monitoring of itself.
Step two: spin it out into a service
The decision was straightforward — separate it into its own product:
- DB (MySQL) with a clear schema: olt, pon_port, onu, signal_log, telnet_log, subscriber_link
- Scheduler: polls every OLT every 30 minutes, 100-session parallelism cap, 60-second timeout per session
- Web UI — search by contract / subscriber / ONU serial / port, signal aggregates
- API — so the dispatcher software and BI hit one source instead of six different scripts
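As an illustration of what the schema buys you, here is a sketch of the "subscriber → OLT/port/ONU" lookup. Only the table names come from the list above. The columns, the sample data, and the helpers `demo_db` and `find_onu` are hypothetical, and an in-memory SQLite database stands in for MySQL so the example runs on its own.

```python
import sqlite3

# Table names follow the article; every column here is an assumption.
SCHEMA = """
CREATE TABLE olt (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE pon_port (
    id INTEGER PRIMARY KEY,
    olt_id INTEGER NOT NULL REFERENCES olt(id),
    label TEXT NOT NULL
);
CREATE TABLE onu (
    id INTEGER PRIMARY KEY,
    pon_port_id INTEGER NOT NULL REFERENCES pon_port(id),
    serial TEXT NOT NULL UNIQUE
);
CREATE TABLE subscriber_link (
    contract_no TEXT PRIMARY KEY,
    onu_id INTEGER NOT NULL REFERENCES onu(id)
);
"""

LOOKUP = """
SELECT olt.name, pon_port.label, onu.serial
FROM subscriber_link
JOIN onu ON onu.id = subscriber_link.onu_id
JOIN pon_port ON pon_port.id = onu.pon_port_id
JOIN olt ON olt.id = pon_port.olt_id
WHERE subscriber_link.contract_no = ?
"""


def demo_db() -> sqlite3.Connection:
    """Build an in-memory database with one made-up subscriber."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO olt VALUES (1, 'olt-north-01')")
    conn.execute("INSERT INTO pon_port VALUES (1, 1, '0/1')")
    conn.execute("INSERT INTO onu VALUES (1, 1, 'TEST1A2B3C4D')")
    conn.execute("INSERT INTO subscriber_link VALUES ('C-1001', 1)")
    return conn


def find_onu(conn: sqlite3.Connection, contract_no: str):
    """Return (olt name, PON port, ONU serial) for a contract, or None."""
    return conn.execute(LOOKUP, (contract_no,)).fetchone()
```

With the demo data, `find_onu(demo_db(), "C-1001")` returns the OLT name, port label, and ONU serial in one query, which is the answer the first-line operator previously had to assemble by hand.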
The telnet logic stayed the same — same commands, same parsers. It just moved from bash scripts into a Spring Boot service with a normal lifecycle.
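The scheduler limits from the list above (100 parallel sessions, 60 seconds per session) can be sketched with asyncio. This is a Python illustration under assumptions, not the Spring Boot implementation: `poll_all` is a hypothetical name, and `poll_one` is a placeholder for whatever coroutine opens the telnet session and runs the parsers.

```python
import asyncio

MAX_SESSIONS = 100       # parallelism cap mentioned in the text
SESSION_TIMEOUT = 60.0   # seconds per session, as in the text


async def poll_all(hosts, poll_one, max_sessions=MAX_SESSIONS, timeout=SESSION_TIMEOUT):
    """Run poll_one(host) for every host; return {host: result or exception}."""
    hosts = list(hosts)
    gate = asyncio.Semaphore(max_sessions)

    async def guarded(host):
        async with gate:  # at most max_sessions sessions are open at once
            try:
                return await asyncio.wait_for(poll_one(host), timeout)
            except Exception as exc:  # timeout or session error: record it, keep going
                return exc

    results = await asyncio.gather(*(guarded(h) for h in hosts))
    return dict(zip(hosts, results))
```

One slow or unreachable OLT then costs at most one slot for one timeout instead of stalling the whole cycle. An outer loop or cron-like trigger would call `poll_all` every 30 minutes; that part is left out here.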
What it became after a year+ in prod
The service has long since crossed from "the team's handy tool" to an infrastructure component. Today it:
- Polls 350+ OLTs from different vendors (CDATA, BDCOM, GateRay), 24/7, on schedule
- Stores historical signal levels — ONU degradation from -22 to -28 dBm over two weeks shows up immediately, before the subscriber calls support
- Closes the "subscriber → OLT/port/ONU" link — first-line support sees in 2 seconds which OLT and PON port the caller's ONU is on, and the current signal
- Counts port utilization — the network team sees where capacity hits 80% and it's time to order splitters
- Tracks firmware versions: the security team sees which OLTs run firmware affected by known CVEs and need an upgrade
The easiest effect to measure: the share of "empty" field crew visits (the crew showed up, the subscriber was fine) dropped roughly threefold. That may sound modest, but over a year it adds up to hundreds of engineering hours and tens of thousands in payroll savings.
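The degradation check from the list above (an ONU sliding from -22 to -28 dBm over two weeks) can be approximated by comparing the start and the end of a window over the stored signal history. This is a sketch under assumptions: the function names, the one-day edges, and the 3 dB threshold are illustrative, not the rule the service actually applies.

```python
from datetime import datetime, timedelta
from statistics import median


def rx_drop_db(samples, now, window=timedelta(days=14), edge=timedelta(days=1)):
    """Return how many dB the Rx level fell across the window, or None if data is sparse.

    samples is an iterable of (timestamp, rx_dbm) pairs, e.g. rows from signal_log.
    """
    samples = list(samples)
    start = now - window
    early = [rx for ts, rx in samples if start <= ts < start + edge]
    late = [rx for ts, rx in samples if now - edge < ts <= now]
    if not early or not late:
        return None  # cannot compare without readings at both ends
    # Medians damp a single noisy reading at either end of the window.
    return median(early) - median(late)


def is_degrading(samples, now, threshold_db=3.0):
    """True if the level dropped by at least threshold_db (assumed value) over the window."""
    drop = rx_drop_db(samples, now)
    return drop is not None and drop >= threshold_db
```

For readings of -22.0 dBm two weeks ago and -28.0 dBm today, `rx_drop_db` returns 6.0 and `is_degrading` returns True. A check like this can run after each polling cycle, so the flag appears without anyone having to look at the graphs.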
What's next
The service is gradually morphing from a tool "for ourselves" into a commercial MVP — CIS operators have the same pains and have been asking. In parallel another layer is being added: an AI assistant. The bot hits the same database via MCP tools and answers in natural language — a support operator doesn't need to know SQL or table schemas to understand "this subscriber's signal has been degrading the last two weeks."
About the assistant — in the next posts of the series:
- Architecture: Ollama + MCP + Spring Boot — how it's wired end-to-end
- SSE + Spring Security — three async-dispatch traps that break streaming
- Prompt is code — how to cure filter hallucinations through tool design
What I learned along the way
- A script is an early prototype of a service. If it's growing and getting hard to maintain, that's not "we need to refactor the script." That's "time to spin it out."
- Links between systems are more valuable than the systems themselves. ERP, five billing systems, and the hardware are each industry standard on their own. One JOIN between them is a competitive advantage that saves the company more than the JOIN costs.
- Production data gives focus. A year of running on a real network showed what's valuable (signals, capacity, links to subscribers) and what's decorative (detailed telnet logs nobody reads).
- Preventive analytics > reactive. Catching a degrading signal two days before the subscriber calls is worth more than fixing the outage ten minutes after.
Good services rarely come from nothing. More often they grow out of a pile of scripts that became uncomfortable to live in.