Fail2ban archaeology: five traps that quietly piled up dead bans for years
Went into Grafana to see who's hitting the site — found 1,300 self-hits per day. Pulled the thread: a broken action, a regex banning the host header, 1,339 "banned" IPs with no iptables rule. Trust but verify.
I run a small home-lab on hs1: portfolio site, mail server, GitLab, Grafana. Defense by the book — fail2ban with 11 jails. fail2ban-client status showed cheerful numbers: thousands of Total banned, dozens Currently banned. I felt safe.
Until I went to Grafana to look at fresh analytics.
The hook: 1,300 self-hits a day
I'm staring at the gmasich-seo-traffic dashboard, and the pageview chart is suspiciously flat. I open Top User-Agents. Top row: `curl/8.7.1` with an empty referer, coming from 192.168.10.1. That's my MikroTik gateway IP: requests hs1 makes to its own public name loop back through the router, so this is hs1 itself.
Turns out it's cross-monitor.sh, my own script that curls https://gmasich.ru, mail.gmasich.ru and wtf.gmasich.ru every 3 minutes and pings me on Telegram if anything responds with something other than 2xx/3xx. Useful, but one request every 3 minutes is 480 hits a day, and they all land in the pageview metrics.
I'd already added remote_addr != "192.168.10.1" to the dashboard filter. The catch: cross-monitor doesn't live on a single host. I dig into infrastructure/monitoring.md:
Cross-server Monitoring is installed on: hs1, rg1, hs2
Three hosts. And they don't curl through WireGuard — they resolve the DNS name gmasich.ru. So a request from rg1 goes through the public internet → MikroTik → hs1, and the public IP of the originating host lands in access.log, not 192.168.10.1. Sure enough: 203.0.113.3 (rg1) and 203.0.113.2 (hs2) are both there.
OK, I extend the filter regex to cover the entire fleet (plus the WG mesh 10.99.0.0/24 for peers that may go through the tunnel):
```
| remote_addr !~ "^(10[.]99[.]0[.][0-9]+|192[.]168[.]10[.]1|192[.]168[.]10[.]20|203[.]0[.]113[.]1|203[.]0[.]113[.]2|192[.]168[.]10[.]33|203[.]0[.]113[.]3|203[.]0[.]113[.]10|203[.]0[.]113[.]20)$"
```
I escape dots as `[.]` instead of `\\.`. It's shorter, and it avoids quadruple escaping when the query travels as JSON through the MCP `update_dashboard` tool.
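Keeping that regex in sync by hand gets tedious as the fleet changes. Below is a minimal Python sketch that generates it from a host list. `FLEET`, `WG_MESH` and the helper names are hypothetical, and the addresses are the ones from the filter above. LogQL uses Go's RE2 syntax, and I'm assuming Python's `re` agrees with RE2 for a plain alternation like this one. The last lines illustrate the escaping argument: a `\.` written inside a LogQL double-quoted string and then JSON-encoded carries four backslashes, while `[.]` passes through unchanged.

```python
import json
import re

# Hypothetical fleet list: the addresses from the dashboard filter above.
# In practice this would come from your inventory.
FLEET = [
    "192.168.10.1", "192.168.10.20", "192.168.10.33",
    "203.0.113.1", "203.0.113.2", "203.0.113.3",
    "203.0.113.10", "203.0.113.20",
]
WG_MESH = "10[.]99[.]0[.][0-9]+"  # any peer in the 10.99.0.0/24 mesh


def dot_class(ip: str) -> str:
    """Escape each dot as [.] so the pattern contains no backslashes."""
    return ip.replace(".", "[.]")


def fleet_regex(hosts: list[str]) -> str:
    """Anchored alternation: the WG mesh plus every listed host."""
    return "^(" + "|".join([WG_MESH] + [dot_class(h) for h in hosts]) + ")$"


PATTERN = fleet_regex(FLEET)
is_self = re.compile(PATTERN).match  # the pattern is already anchored

# The escaping argument: one literal dot inside a LogQL double-quoted
# string, after the query is serialized into a JSON payload.
logql_backslash_dot = r"\\."  # LogQL source text for the regex \.
logql_class_dot = "[.]"

print(PATTERN)
print(json.dumps(logql_backslash_dot))  # "\\\\."  (four backslashes)
print(json.dumps(logql_class_dot))      # "[.]"
```

Because each dot is a character class, `192x168x10x1` cannot sneak through, and the `$` anchor keeps `192.168.10.1` from also matching `192.168.10.10`.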
Second move: switch all my curl-based scripts to a single User-Agent, `homelab-monitor/1.0`. Self-traffic now stands out in the logs, and this UA slips past the fail2ban filter (you'll see why).
Trap 1: a jail that bans without blocking
Since I'm digging into logs anyway, I add a Top User-Agents panel to the dashboard. Who's hitting right now?
| UA | Hits |
|---|---|
| Mozilla/5.0 (exact) | 17 |
| ALittle Client | 14 |
| Mozilla/4.0(compatible;MSIE8.0;...) (no spaces) | 9 |
All three are obvious scanners. A real Mozilla/5.0 UA has a tail like (Windows NT 10.0; ...) Chrome/120.0. A real IE sends Mozilla/4.0 (compatible; MSIE 8.0; ...), with a space before the ( and after each ;.
Strange: I have an nginx-user-agent jail that should catch these. Checking its status:
```
$ sudo fail2ban-client status nginx-user-agent
Currently banned: 0
Total banned: 0
```
Zero. Ever. Despite a filter that catches a whole pile of UA signatures (curl, python, libwww, zgrab, Scrapy, ...). Open the jail config:
```ini
[nginx-user-agent]
enabled = true
filter = nginx-ua
maxretry = 1
bantime = 86400
action = ipset[name=blacklist, protocol=all]
```
The action is `ipset[name=blacklist]`. Open it:
```ini
# /etc/fail2ban/action.d/ipset.conf
actionstart = ipset create blacklist hash:ip
actionban = ipset add blacklist <ip>
actionunban = ipset del blacklist <ip>
```
A homegrown action. Creates the ipset, dumps IPs into it. That's it. No iptables -m set --match-set blacklist src -j REJECT. So 1,256 "banned" IPs sit in an ipset that nothing references — References: 0 in ipset list.
I wrote this myself once, clearly thinking "I'll add the iptables rule later." Never did. A couple of years passed. fail2ban dutifully filled the list while I assumed it was blocking.
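`References: 0` is the tell, and it's easy to check for. Below is a minimal Python sketch that parses `ipset list` text and flags sets that hold entries while nothing references them. It assumes the usual `Name:` / `References:` / `Members:` layout. The `SAMPLE` text is trimmed and illustrative, and in practice you'd feed the function the output of `sudo ipset list`. The function names are hypothetical.

```python
def parse_ipset_list(text: str) -> dict:
    """Map each set in `ipset list` output to its reference and member counts."""
    sets = {}
    current = None
    in_members = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Name:"):
            current = line.split(":", 1)[1].strip()
            sets[current] = {"references": 0, "members": 0}
            in_members = False
        elif current is None:
            continue  # ignore anything before the first set
        elif line.startswith("References:"):
            sets[current]["references"] = int(line.split(":", 1)[1])
        elif line.startswith("Members:"):
            in_members = True
        elif in_members and line:
            sets[current]["members"] += 1  # one member per line


    return sets


def dead_sets(sets: dict) -> list:
    """Sets that hold entries but that no firewall rule references."""
    return sorted(
        name for name, info in sets.items()
        if info["references"] == 0 and info["members"] > 0
    )


# Trimmed, illustrative output; real headers carry more fields.
SAMPLE = """\
Name: blacklist
Type: hash:ip
References: 0
Members:
192.0.2.55
192.0.2.78

Name: f2b-nginx-ua
Type: hash:ip
References: 1
Members:
192.0.2.55 timeout 85000
"""

print(dead_sets(parse_ipset_list(SAMPLE)))  # ['blacklist']
```

A non-empty result is exactly the situation above: bans that exist only on paper.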
fail2ban ships standard ipset actions that do the missing half:

```ini
actionstart = ipset --create f2b-<name> hash:ip ...
              iptables -I INPUT ... -m set --match-set f2b-<name> src -j REJECT
```

They create the ipset and also install the iptables rule that consults it. Two lines, and the whole thing works. Replace the action:

```ini
action = iptables-ipset-proto4[name=nginx-ua, port="http,https", protocol=tcp, blocktype=REJECT]
```
port="http,https" plus blocktype=REJECT means the ban covers only the web ports, not all traffic. If the filter ever catches my mail proxy rg1 by mistake (it hits the site with the same curl), at least SMTP, IMAP and SSH stay up.
fail2ban-client reload nginx-user-agent. Wait for the first ban.
Trap 2: actionstart_on_demand
An hour later: Currently banned: 0. No ipset f2b-nginx-ua. No iptables rule. But the filter is firing in logs: [nginx-user-agent] Found 192.0.2.55.
Further down in /var/log/fail2ban.log:
```
2026-05-07 10:02:36 fail2ban.actions ERROR Failed to execute ban jail 'nginx-user-agent'
    action 'iptables-ipset-proto4-nginx-ua' info '...': Error starting action: 'Script error'
```
The action itself is crashing. The exact command, lower in the log:
```
exec: ipset --create f2b-nginx-ua maxelem 65536 iphash
{ iptables -w -C INPUT -p $proto --dport http,https -m set ... }
```
`-p $proto`: the shell variable `$proto` in the action template never gets expanded. And `--dport http,https` is invalid without `-m multiport`, because `--dport` takes a single port or range. This is iptables-ipset-proto4, the old template for ipset v4, and it doesn't fit modern Debian.
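Switch to iptables-ipset-proto6 (type = multiport, before = iptables-ipset.conf). Reload: still no ipset. That's expected, because the action starts on demand (`actionstart_on_demand = true`). `actionstart` runs at the first ban, not at reload. To check it without waiting for a scanner, ban a fake address with `fail2ban-client set nginx-user-agent banip 198.51.100.99`. 198.51.100.99 is in TEST-NET-2 (RFC 5737), not a real host, so it's safe for tests. Trigger it, and the ipset gets created and iptables INPUT gets the rule. Unban the test IP, and the action stays in place.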
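That test cycle is easy to script. Below is a minimal Python sketch that only builds the commands. It assumes you run them yourself with sudo, and the function name is hypothetical. The guard deliberately accepts only TEST-NET-2, because elsewhere in this post addresses from the other documentation ranges stand in for real hosts.

```python
import ipaddress
import shlex

# TEST-NET-2 (RFC 5737): reserved for documentation, never a real host.
TEST_NET_2 = ipaddress.ip_network("198.51.100.0/24")


def ban_test_commands(jail: str, ipset_name: str, ip: str) -> list:
    """Commands for one ban / inspect / unban cycle against a jail.

    Refuses addresses outside TEST-NET-2 so a typo can't ban a real host.
    """
    if ipaddress.ip_address(ip) not in TEST_NET_2:
        raise ValueError(f"{ip} is not in {TEST_NET_2}")
    return [
        ["fail2ban-client", "set", jail, "banip", ip],
        ["ipset", "list", ipset_name],   # the set should now exist
        ["iptables", "-S", "INPUT"],     # and a rule should reference it
        ["fail2ban-client", "set", jail, "unbanip", ip],
    ]


for argv in ban_test_commands("nginx-user-agent", "f2b-nginx-ua", "198.51.100.99"):
    print(shlex.join(argv))
```

The ipset name comes from the action's `name=` parameter (`f2b-nginx-ua`), not from the jail name, which is why it's a separate argument.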
Trap 3: the regex bans the host header instead of the client
Another hour. Currently banned: 0. ipset still empty. But access.log has plenty of lines that should match:
```
gmasich.ru 192.0.2.78 - [...] "GET /wp-login.php" 444 0 "-" "Mozilla/5.0"
203.0.113.1 192.0.2.55 - [...] "GET /" 444 0 "-" "Mozilla/5.0"
```
The first field is the Host header. For requests to gmasich.ru it's gmasich.ru. When a scanner hits the IP directly, the Host is the IP itself, so nginx logs 203.0.113.1, my own hs1 public IP, as the first field.
The filter:
```
^<HOST> .*"(GET|POST|HEAD).* HTTP.*" .* "-" "(curl|python|...)..."
```
`<HOST>` is fail2ban's address placeholder. It matches an IP or a DNS name (regex `[\w\-.^_]*\w`). Because the failregex is anchored with `^`, it binds to the first field, which in my nginx log format is `$host`, not `$remote_addr`.
So for the lines above, fail2ban extracts `gmasich.ru` (and attempts a DNS lookup) or `203.0.113.1` (my own IP, where only `ignoreip` kept me from banning myself). The real scanners, 192.0.2.78 and 192.0.2.55, never got banned.
Fix:
```
^[^ ]+ <HOST> .*"(GET|POST|HEAD).* HTTP.*" .* "-" "(curl|python|...)..."
```
`[^ ]+` skips the first field (the vhost), and `<HOST>` now matches the second (`$remote_addr`). After a reload, the first organic ban arrives 10 minutes later: `Ban 192.0.2.55`. ipset f2b-nginx-ua shows 1 entry, and iptables rejects it on 80/443.
It works.
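The failure mode is easy to reproduce offline, since fail2ban filters are Python regular expressions. Below is a minimal Python reproduction. `HOST` is a simplified stand-in for fail2ban's real `<HOST>` tag (IPv4 or DNS name only, no IPv6). The UA list is cut down, and the sample lines are modeled on the ones above, with timestamps and `HTTP/1.1` filled in so the request part matches. All of these are illustrative assumptions, not the real filter.

```python
import re

# Simplified stand-in for fail2ban's <HOST> tag: IPv4 or a DNS-like name.
# The real tag also handles IPv6; the DNS part mirrors [\w\-.^_]*\w.
HOST = r"(?:(?P<ip4>(?:\d{1,3}\.){3}\d{1,3})|(?P<dns>[\w\-.^_]*\w))"

# Trimmed failregex tail; the UA alternation is shortened for the example.
TAIL = r' .*"(GET|POST|HEAD).* HTTP.*" .* "-" "(curl|python|Mozilla/5\.0)"'

BROKEN = re.compile("^" + HOST + TAIL)       # <HOST> = 1st field ($host)
FIXED = re.compile("^[^ ]+ " + HOST + TAIL)  # skip $host, take $remote_addr

# Hypothetical sample lines in the "$host $remote_addr - [time] ..." layout.
LINES = [
    'gmasich.ru 192.0.2.78 - [07/May/2026:10:02:36 +0300] '
    '"GET /wp-login.php HTTP/1.1" 444 0 "-" "Mozilla/5.0"',
    '203.0.113.1 192.0.2.55 - [07/May/2026:10:03:10 +0300] '
    '"GET / HTTP/1.1" 444 0 "-" "Mozilla/5.0"',
]


def extracted_host(pattern, line):
    """What fail2ban would treat as the offender, or None if no match."""
    m = pattern.match(line)
    return None if m is None else (m.group("ip4") or m.group("dns"))


for line in LINES:
    print(extracted_host(BROKEN, line), "->", extracted_host(FIXED, line))
# gmasich.ru -> 192.0.2.78
# 203.0.113.1 -> 192.0.2.55
```

The broken variant "catches" the vhost and the server's own address. The fixed one lands on the scanners.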
Trap 4: five other jails in the same shape
I extend the filter with five more signatures: `-`, exact `Mozilla/5.0`, `ALittle Client`, `Mozilla/4.0(compatible;` without a space, and `BackupLand`. I test it with fail2ban-regex against the raw log. Googlebot, YandexBot and normal Chrome don't match, and the junk UAs do. Validation passes.
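`fail2ban-regex` is the real validation tool, for example `fail2ban-regex /var/log/nginx/access.log /etc/fail2ban/filter.d/nginx-ua.conf`. The same false-positive check can be sketched offline in Python on just the UA field. The pattern below is a hypothetical stand-in for the UA part of the filter, not its actual text. Treating `ALittle Client` and `BackupLand` as prefixes is an assumption.

```python
import re

# Python rendering of the new UA signatures, matched against the whole
# UA field. Anchoring is the point: bare "Mozilla/5.0" must not match a
# real browser's "Mozilla/5.0 (...)". Prefix matching for the last two
# signatures is an assumption.
TRASH_UA = re.compile(
    r"-"                             # empty UA, logged by nginx as "-"
    r"|Mozilla/5\.0"                 # bare token, nothing after it
    r"|Mozilla/4\.0\(compatible;.*"  # no space before "("
    r"|ALittle Client.*"
    r"|BackupLand.*"
)


def is_trash(ua: str) -> bool:
    return TRASH_UA.fullmatch(ua) is not None


SAMPLES = {
    "-": True,
    "Mozilla/5.0": True,
    "ALittle Client": True,
    "Mozilla/4.0(compatible;MSIE8.0;Windows NT 6.1)": True,
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)": False,
    "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)": False,
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)": False,
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0": False,
}

for ua, expected in SAMPLES.items():
    assert is_trash(ua) == expected, ua
print("all samples classified as expected")
```

The self-traffic UA `homelab-monitor/1.0` falls through every branch, which is why it needs no whitelist entry.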
Then it occurs to me: I have five other jails using the same broken action.
```
$ grep -l "ipset\[name=blacklist" /etc/fail2ban/jail.d/*.local
sshd.local
nginx-bad-requests.local
nginx-hiddenfiles.local
nginx-limit-req-2.local
wp-scan3.local
```
sshd too. That means SSH bans configured for 90 days never blocked anyone. Whether the counter said 19 or 1,900, none of those bans were enforced.
Convert all five to iptables-ipset-proto6, each with a unique name=:
```ini
[sshd]
action = iptables-ipset-proto6[name=sshd, port="1221", protocol=tcp, blocktype=REJECT]

[nginx-bad-requests]
action = iptables-ipset-proto6[name=bad-requests, port="http,https", protocol=tcp, blocktype=REJECT]

# and so on for hiddenfiles, limit-req-2, wp-scan
```
Each jail gets its own f2b-<name> ipset and INPUT rule. The old shared blacklist ipset gets deleted.
`systemctl restart fail2ban` (important: not reload. On a plain reload the homegrown action lingers as a zombie, and only a full restart releases it). On startup fail2ban reads the bips table in /var/lib/fail2ban/fail2ban.sqlite3 and restores the bans through the new actions.
End result: 1,339 IPs migrated from the dead blacklist into seven live ipsets:
| Jail | Banned | ipset → ports |
|---|---|---|
| sshd | 19 | f2b-sshd → 1221 |
| nginx-bad-requests | 0 | f2b-bad-requests → 80,443 |
| nginx-hiddenfiles | 13 | f2b-hiddenfiles → 80,443 |
| nginx-limit-req-2 | 10 | f2b-limit-req-2 → 80,443 |
| nginx-user-agent | 1 | f2b-nginx-ua → 80,443 |
| wp-scan3 | 1291 | f2b-wp-scan → 80,443 |
| recidive (new) | 5 | f2b-recidive → 1221,80,443 |
I added a recidive jail too. It's a standard pattern: it catches any IP that fail2ban banned 3 times (maxretry=3) within 7 days and blocks it for 30 days.
netfilter-persistent save — iptables rules now survive reboot.
Trap 5: bantime = -1 and what to do with it
wp-scan3 — 1,291 banned. With bantime = -1. Permanent.
A year ago I figured "WordPress scanners: no mercy." Reasonable. But now I have 1,291 IPs in a permanent list, accumulated since May 2025. AWS, GCP and DigitalOcean recycle IPs constantly, so a chunk of that list likely belongs to other owners by now.
I change bantime to 90 days (7,776,000 seconds) and reload the jail.
Discovery: reload doesn't recompute existing bans. fail2ban fixes endOfBan = startOfBan + bantime when the ticket is created (each ban row in the database carries its own bantime). New bans get 90 days. Old ones stay forever. You can't change policy retroactively without an explicit unban.
OK, manual cleanup. Schema of fail2ban.sqlite3:
```sql
CREATE TABLE bips(
    ip TEXT NOT NULL,
    jail TEXT NOT NULL,
    timeofban INTEGER NOT NULL,
    bantime INTEGER NOT NULL,
    ...
);
```
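The `bantime` column is the key: every row stores the bantime that applied when the ban was made. Below is a minimal Python/sqlite3 sketch of the consequence. It uses a trimmed in-memory copy of bips with made-up rows. The "still active" rule (a negative bantime means forever, otherwise timeofban + bantime must lie in the future) is a simplified model of fail2ban's behavior, not its code.

```python
import sqlite3
import time

DAY = 86400


def active_bans(conn, jail, now):
    """IPs whose ban is still in force, judged by each row's stored bantime.

    Simplified model: a negative bantime means permanent; otherwise the
    ban lasts until timeofban + bantime. The jail's current bantime
    setting is not consulted anywhere.
    """
    rows = conn.execute(
        "SELECT ip FROM bips WHERE jail = ? "
        "AND (bantime < 0 OR timeofban + bantime > ?) ORDER BY ip",
        (jail, now),
    )
    return [ip for (ip,) in rows]


now = int(time.time())
conn = sqlite3.connect(":memory:")
# Trimmed copy of bips: only the four columns shown in the schema.
conn.execute(
    "CREATE TABLE bips(ip TEXT NOT NULL, jail TEXT NOT NULL, "
    "timeofban INTEGER NOT NULL, bantime INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO bips VALUES (?, ?, ?, ?)", [
    ("192.0.2.1", "wp-scan3", now - 300 * DAY, -1),        # old policy: forever
    ("192.0.2.2", "wp-scan3", now - 10 * DAY, 90 * DAY),   # new policy, fresh
    ("192.0.2.3", "wp-scan3", now - 100 * DAY, 90 * DAY),  # new policy, expired
])
print(active_bans(conn, "wp-scan3", now))  # ['192.0.2.1', '192.0.2.2']
```

The jail's new 90-day setting never appears in the query, which is why the old `-1` rows outlive the policy change.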
Query: "all IPs banned more than 90 days ago":
```sql
SELECT ip FROM bips
WHERE jail='wp-scan3' AND timeofban < strftime('%s','now') - 7776000;
```
Distribution by age:
| Ban age | Count |
|---|---|
| < 90 days | 530 |
| 90-180 days | 402 |
| 180-365 days | 359 |
That leaves 761 IPs to clean up (402 + 359). Script:
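```shell
ssh hs1 'bash -s' <<'EOF'
# GNU date: epoch seconds for "90 days ago"
cutoff=$(date +%s -d "90 days ago")
sudo sqlite3 /var/lib/fail2ban/fail2ban.sqlite3 \
  "SELECT ip FROM bips WHERE jail='wp-scan3' AND timeofban < $cutoff" |
while read -r ip; do
  # unban through fail2ban so the action also drops the IP from the ipset
  sudo fail2ban-client set wp-scan3 unbanip "$ip" >/dev/null
done
# persist the resulting firewall state across reboots
sudo netfilter-persistent save
EOF
```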
3 minutes, 761 unbanned, 0 errors. 530 fresh ones remain. ipset and iptables synced.
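Before running a loop like that against the real database, a dry run is cheap. Below is a minimal Python/sqlite3 sketch that reproduces the age buckets and lists the IPs the cleanup would unban. It uses the same `timeofban < cutoff` condition as the script. It runs on a tiny in-memory stand-in for bips, with made-up rows and a fixed "now". Pointing it at a copy of fail2ban.sqlite3 assumes the column names from the schema above. The function names are hypothetical.

```python
import sqlite3

DAY = 86400


def age_buckets(conn, jail, now):
    """Count one jail's bans per age bucket (same edges as the table above)."""
    c90, c180, c365 = (now - d * DAY for d in (90, 180, 365))
    q = """SELECT CASE
                    WHEN timeofban >= ? THEN '< 90 days'
                    WHEN timeofban >= ? THEN '90-180 days'
                    WHEN timeofban >= ? THEN '180-365 days'
                    ELSE '> 365 days'
                  END AS bucket, COUNT(*)
           FROM bips WHERE jail = ? GROUP BY bucket"""
    return dict(conn.execute(q, (c90, c180, c365, jail)).fetchall())


def stale_ips(conn, jail, now, days=90):
    """IPs the cleanup would unban: banned more than `days` days ago."""
    cutoff = now - days * DAY
    q = "SELECT ip FROM bips WHERE jail = ? AND timeofban < ? ORDER BY ip"
    return [ip for (ip,) in conn.execute(q, (jail, cutoff))]


now = 1_800_000_000  # fixed "now" keeps the example reproducible
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bips(ip TEXT, jail TEXT, timeofban INTEGER, bantime INTEGER)")
conn.executemany(
    "INSERT INTO bips VALUES (?, 'wp-scan3', ?, -1)",
    [("192.0.2.1", now - 30 * DAY), ("192.0.2.2", now - 120 * DAY),
     ("192.0.2.3", now - 200 * DAY), ("192.0.2.4", now - 250 * DAY)],
)
print(age_buckets(conn, "wp-scan3", now))
print(stale_ips(conn, "wp-scan3", now))  # ['192.0.2.2', '192.0.2.3', '192.0.2.4']
```

Both functions derive from the same cutoff, so the stale list always equals everything outside the "< 90 days" bucket. That is the same cross-check as 402 + 359 = 761 above.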
Bonus: why the server configs lived outside git
While digging through this, I noticed an uncomfortable asymmetry: my application code is in git, my infrastructure docs are in git, but the service config files on the servers are only on the servers. Each edit of /etc/fail2ban/jail.d/sshd.local existed in exactly one copy. No history, no diffs, no rollback.
I made a mirror in infrastructure/server-configs/:
```
server-configs/
├── README.md, deploy.sh, .gitignore
├── hs1/
│   ├── etc/fail2ban/{jail.d/*.local, filter.d/nginx-ua.conf}
│   ├── usr/local/bin/check-hysteria.sh
│   └── root/notify_ssh_login.sh
├── hs2/
│   └── usr/local/bin/fail2ban-telegram.sh
└── shared/
    └── opt/cross-monitor/cross-monitor.sh   # deployed to hs1+rg1+hs2
```
The structure mirrors absolute paths on the server. deploy.sh hs1 does a dry-run diff against prod, APPLY=1 deploy.sh hs1 copies via scp + sudo cp and backs up existing files as .bak-YYYYMMDD-HHMMSS. Workflow:
```shell
$EDITOR server-configs/hs1/etc/fail2ban/jail.d/sshd.local
./deploy.sh hs1          # diff
APPLY=1 ./deploy.sh hs1  # apply
ssh hs1 'sudo fail2ban-client reload'
git commit -am "fail2ban: tighten sshd"
```
History exists now. And if I do something weird next time and break prod — git revert.
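deploy.sh itself isn't shown here. Below is a rough, hypothetical Python sketch of just the path logic behind the conventions it relies on: repo paths mirror absolute server paths, `shared/` fans out to several hosts, and backups get a timestamp suffix. The function names are made up, and the sketch copies nothing.

```python
from datetime import datetime
from pathlib import PurePosixPath

REPO_ROOT = PurePosixPath("server-configs")
SHARED_HOSTS = ("hs1", "rg1", "hs2")  # per the comment in the tree above


def target_path(owner: str, repo_file: str) -> PurePosixPath:
    """server-configs/<owner>/etc/x  ->  /etc/x on the server."""
    rel = PurePosixPath(repo_file).relative_to(REPO_ROOT / owner)
    return PurePosixPath("/") / rel


def deployments(repo_file: str):
    """Yield (host, absolute target) pairs; shared/ fans out to every host."""
    owner = PurePosixPath(repo_file).relative_to(REPO_ROOT).parts[0]
    hosts = SHARED_HOSTS if owner == "shared" else (owner,)
    for host in hosts:
        yield host, target_path(owner, repo_file)


def backup_path(target: PurePosixPath, when: datetime) -> PurePosixPath:
    """Backup name for an existing file: .bak-YYYYMMDD-HHMMSS suffix."""
    return target.with_name(target.name + when.strftime(".bak-%Y%m%d-%H%M%S"))


for host, path in deployments("server-configs/shared/opt/cross-monitor/cross-monitor.sh"):
    print(host, path)
sshd = target_path("hs1", "server-configs/hs1/etc/fail2ban/jail.d/sshd.local")
print(backup_path(sshd, datetime(2026, 5, 7, 10, 2, 36)))
# /etc/fail2ban/jail.d/sshd.local.bak-20260507-100236
```

Keeping the mapping purely mechanical is what makes a dry-run diff trustworthy: every repo file has exactly one target per host.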
Lessons
- "Currently banned: 19" is not the source of truth. The truth is `ipset list f2b-<name>` (Number of entries) and `iptables -S INPUT | grep f2b-` (does a REJECT rule exist?). Check both.
- `<HOST>` binds to whatever position the failregex puts it in, and with `^<HOST>` that's the first field. For an nginx log format that starts with `$host`, as mine does, you need `^[^ ]+ <HOST>` to skip the vhost. Otherwise fail2ban either tries a DNS lookup on `gmasich.ru` or extracts your own server's public IP, and only `ignoreip` stands between you and a self-ban.
- `iptables-ipset-proto4` is broken on modern distros (kernel 3.0+, ipset v6+). Use `iptables-ipset-proto6`, with `blocktype=REJECT` and a constrained `port=`, so an accidental ban cuts an IP off from those ports only, not from every service on the host.
- `actionstart_on_demand=true` means `actionstart` runs on the first ban, not on reload. To test: `fail2ban-client set <jail> banip 198.51.100.99` (TEST-NET-2, RFC 5737), then check `ipset list f2b-<name>`.
- `ignoreip` at jail level overrides the global `[DEFAULT]` value instead of extending it. When overriding locally, copy all the original networks (CF, LAN, etc.) and add your own.
- `reload` doesn't recompute existing bans when `bantime` changes. To clean up old ones you need an explicit `unbanip` (via sqlite + a loop).
- Self-traffic in analytics isn't a single cron job, it's your whole fleet. Any host that curls your DNS name shows up in the logs with its public IP. Filter by network, not by individual machines.
- One UA for all your own scripts (mine is `homelab-monitor/1.0`) simplifies filtering and makes self-traffic visually distinct in Grafana. It also slips past your own fail2ban filter without special whitelist exceptions.
- Server configs belong in git, through `server-configs/` + `deploy.sh`, not "edit in place." History + diff + rollback.
- Trust, but verify. Especially your own code from a year ago.
After all this, I came away with a clear sense — the bulk of security work wasn't setting up something new. It was making sure that what was set up long ago actually does what I thought it did. That, perhaps, is the main takeaway.