
Fail2ban archaeology: five traps that quietly piled up dead bans for years

Went into Grafana to see who's hitting the site — found 1,300 self-hits per day. Pulled the thread: a broken action, a regex banning the host header, 1,339 "banned" IPs with no iptables rule. Trust but verify.

#fail2ban #security #nginx #monitoring #devops

I run a small home-lab on hs1: portfolio site, mail server, GitLab, Grafana. Defense by the book — fail2ban with 11 jails. fail2ban-client status showed cheerful numbers: thousands of Total banned, dozens Currently banned. I felt safe.

Until I went to Grafana to look at fresh analytics.

The hook: 1,300 self-hits a day

I'm staring at the gmasich-seo-traffic dashboard — the pageview chart is suspiciously flat. I open Top User-Agents. Top row: curl/8.7.1 with empty referer, hitting from 192.168.10.1. That's my MikroTik gateway IP, i.e. hs1 itself.

Turns out it's cross-monitor.sh, my own script that curls https://gmasich.ru, mail.gmasich.ru and wtf.gmasich.ru every 3 minutes and pings me on Telegram if anything returns a non-2xx/3xx status. Useful thing, but it shows up as 480 hits/day (one run every 3 minutes) in pageview metrics.

I'd already added remote_addr != "192.168.10.1" to the dashboard filter. The catch: cross-monitor doesn't live on a single host. I dig into infrastructure/monitoring.md:

Cross-server Monitoring is installed on: hs1, rg1, hs2

Three hosts. And they don't curl through WireGuard — they resolve the DNS name gmasich.ru. So a request from rg1 goes through the public internet → MikroTik → hs1, and the public IP of the originating host lands in access.log, not 192.168.10.1. Sure enough: 203.0.113.3 (rg1) and 203.0.113.2 (hs2) are both there.

OK, I extend the filter regex to cover the entire fleet (plus the WG mesh 10.99.0.0/24 for peers that may go through the tunnel):

| remote_addr !~ "^(10[.]99[.]0[.][0-9]+|192[.]168[.]10[.]1|192[.]168[.]10[.]20|203[.]0[.]113[.]1|203[.]0[.]113[.]2|192[.]168[.]10[.]33|203[.]0[.]113[.]3|203[.]0[.]113[.]10|203[.]0[.]113[.]20)$"

I escape dots as [.] instead of \\. — shorter, and avoids quadruple escaping when sending JSON via the MCP update_dashboard.
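A filter like this is easy to sanity-check before it goes into the dashboard by running the same pattern over a few addresses. A minimal Python sketch, assuming Python's re handles this pattern the same way as the dashboard's regex engine (it uses nothing beyond character classes and alternation); is_self_traffic is a name made up for illustration:

```python
import re

# Same alternation as the dashboard filter; [.] is a literal dot, no backslashes needed.
FLEET = re.compile(
    r"^(10[.]99[.]0[.][0-9]+|192[.]168[.]10[.]1|192[.]168[.]10[.]20|"
    r"203[.]0[.]113[.]1|203[.]0[.]113[.]2|192[.]168[.]10[.]33|"
    r"203[.]0[.]113[.]3|203[.]0[.]113[.]10|203[.]0[.]113[.]20)$"
)


def is_self_traffic(remote_addr: str) -> bool:
    """True when the address belongs to the fleet and should be filtered out."""
    return FLEET.match(remote_addr) is not None


if __name__ == "__main__":
    for addr in ["192.168.10.1", "10.99.0.7", "203.0.113.3", "192.0.2.55", "203.0.113.100"]:
        print(addr, is_self_traffic(addr))
```

The trailing $ matters: without it, 203.0.113.100 would slip through as a prefix match on 203.0.113.10.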

Second move: convert all my curl-based scripts to a single User-Agent — homelab-monitor/1.0. Now self-traffic stands out in logs, and this UA dodges the fail2ban filter (you'll see why).

Trap 1: a jail that bans without blocking

Since I'm digging into logs anyway, I add a Top User-Agents panel to the dashboard. Who's hitting right now?

| UA | Hits |
|---|---|
| Mozilla/5.0 (exact) | 17 |
| ALittle Client | 14 |
| Mozilla/4.0(compatible;MSIE8.0;...) (no spaces) | 9 |

All three are obvious scanners. A real Mozilla/5.0 has a tail like (Windows NT 10.0...) Chrome/120.0. A real IE writes Mozilla/4.0 (compatible; ...), with a space after the ;.

Strange: I have an nginx-user-agent jail that should catch this. Check status:

$ sudo fail2ban-client status nginx-user-agent
Currently banned: 0
Total banned: 0

Zero. Ever. Despite a filter that catches a whole pile of UA signatures (curl, python, libwww, zgrab, Scrapy, ...). Open the jail config:

[nginx-user-agent]
enabled = true
filter = nginx-ua
maxretry = 1
bantime = 86400
action = ipset[name=blacklist, protocol=all]

ipset[name=blacklist]. Open the action:

# /etc/fail2ban/action.d/ipset.conf
actionstart = ipset create blacklist hash:ip
actionban = ipset add blacklist <ip>
actionunban = ipset del blacklist <ip>

A homegrown action. Creates the ipset, dumps IPs into it. That's it. No iptables -m set --match-set blacklist src -j REJECT. So 1,256 "banned" IPs sit in an ipset that nothing references — References: 0 in ipset list.
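The "filled but never referenced" state can be detected mechanically from the header that ipset list prints. A rough Python sketch, assuming the usual "Key: value" header layout; the sample text is illustrative (only the name, type, References and entry count come from this story), and ipset_summary and is_dead are hypothetical helper names:

```python
# Illustrative header in the shape `ipset list` prints; revision and memory size are made up.
SAMPLE = """\
Name: blacklist
Type: hash:ip
Revision: 4
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 20664
References: 0
Number of entries: 1256
Members:
"""


def ipset_summary(text: str) -> dict:
    """Pull the set name, reference count and entry count out of the header."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("Members:"):
            break  # everything after this line is the member list
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {
        "name": fields["Name"],
        "references": int(fields["References"]),
        "entries": int(fields["Number of entries"]),
    }


def is_dead(summary: dict) -> bool:
    """A set with entries but zero references is being filled and never consulted."""
    return summary["entries"] > 0 and summary["references"] == 0


if __name__ == "__main__":
    print(ipset_summary(SAMPLE), is_dead(ipset_summary(SAMPLE)))
```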

I wrote this myself once, clearly thinking "I'll add the iptables rule later." Never did. A couple of years passed. fail2ban dutifully filled the list while I assumed it was blocking.

The standard action that works: iptables-ipset-proto6:

actionstart = ipset --create f2b-<name> hash:ip ...
              iptables -I INPUT ... -m set --match-set f2b-<name> src -j REJECT

It both creates the ipset and installs the iptables rule. Two lines, and the whole thing works. Replace the action:

action = iptables-ipset-proto6[name=nginx-ua, port="http,https", protocol=tcp, blocktype=REJECT]

port="http,https" with blocktype=REJECT bans only the web ports rather than all traffic. If the filter accidentally catches my mail proxy rg1 (which hits the site with the same curl), at least SMTP/IMAP/SSH stay alive.

fail2ban-client reload nginx-user-agent. Wait for the first ban.

Trap 2: actionstart_on_demand

An hour later: Currently banned: 0. No ipset f2b-nginx-ua. No iptables rule. But the filter is firing in logs: [nginx-user-agent] Found 192.0.2.55.

Below in /var/log/fail2ban.log:

2026-05-07 10:02:36 fail2ban.actions ERROR Failed to execute ban jail 'nginx-user-agent'
  action 'iptables-ipset-proto4-nginx-ua' info '...': Error starting action: 'Script error'

The action itself is crashing. The exact command, lower in the log:

exec: ipset --create f2b-nginx-ua maxelem 65536 iphash
{ iptables -w -C INPUT -p $proto --dport http,https -m set ... }

-p $proto — the shell variable $proto in the action template isn't being expanded. And --dport http,https without -m multiport is invalid for multiple ports. This is iptables-ipset-proto4 — the old template for ipset v4, doesn't fit modern Debian.
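The multiport point can be shown with a small argv builder. This is not fail2ban's template, just an illustration in Python of which options a working rule needs once there is more than one port; reject_rule and its parameters are hypothetical names:

```python
def reject_rule(set_name: str, ports: list[str], proto: str = "tcp") -> list[str]:
    """Build an argv for an INPUT rule that rejects members of an ipset on given ports."""
    if not ports:
        raise ValueError("at least one port is required")
    argv = ["iptables", "-w", "-I", "INPUT", "-p", proto]
    if len(ports) == 1:
        # A single port works with plain --dport.
        argv += ["--dport", ports[0]]
    else:
        # Several ports need the multiport match and its plural --dports option.
        argv += ["-m", "multiport", "--dports", ",".join(ports)]
    argv += ["-m", "set", "--match-set", set_name, "src", "-j", "REJECT"]
    return argv


if __name__ == "__main__":
    print(" ".join(reject_rule("f2b-nginx-ua", ["http", "https"])))
```

Building the command as a list also sidesteps the unexpanded-variable problem: the protocol is a concrete string by the time the command exists.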

Switch to iptables-ipset-proto6 (type = multiport, before = iptables-ipset.conf). Reload — still no ipset.

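The reason is actionstart_on_demand: with it enabled, actionstart runs on the first ban, not on reload. So I trigger a ban by hand:

fail2ban-client set nginx-user-agent banip 198.51.100.99

198.51.100.99 sits in TEST-NET-2 (198.51.100.0/24), which RFC 5737 reserves for documentation, so it's not a real IP and is safe for tests. After the trigger the ipset gets created and iptables INPUT gets the rule. Unban the test address, and the action stays in place.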
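Before force-banning an address by hand, it's worth confirming that it really falls in a documentation range and can't belong to a real visitor. A small Python sketch using the standard ipaddress module; the three networks are the ones RFC 5737 reserves, and doc_range is a hypothetical helper name:

```python
import ipaddress

# The three blocks RFC 5737 reserves for documentation and examples.
DOC_NETS = {
    "TEST-NET-1": ipaddress.ip_network("192.0.2.0/24"),
    "TEST-NET-2": ipaddress.ip_network("198.51.100.0/24"),
    "TEST-NET-3": ipaddress.ip_network("203.0.113.0/24"),
}


def doc_range(addr: str):
    """Return the name of the documentation block containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    for name, net in DOC_NETS.items():
        if ip in net:
            return name
    return None


if __name__ == "__main__":
    print(doc_range("198.51.100.99"))
```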

Trap 3: the regex bans the host header instead of the client

Another hour. Currently banned: 0. ipset still empty. But access.log has plenty of lines that should match:

gmasich.ru 192.0.2.78 - [...] "GET /wp-login.php" 444 0 "-" "Mozilla/5.0"
203.0.113.1 192.0.2.55 - [...] "GET /" 444 0 "-" "Mozilla/5.0"

The first field is the host header. On gmasich.ru it's gmasich.ru. When a scanner hits the IP directly, the host equals the IP, and nginx logs the first field as 203.0.113.1 — my own hs1 public IP.

The filter:

^<HOST> .*"(GET|POST|HEAD).* HTTP.*" .* "-" "(curl|python|...)..."

<HOST> is fail2ban's IP placeholder. It matches an IP or a DNS name (regex [\w\-.^_]*\w). And here it matches the first field, which in my nginx main log format is $host, not $remote_addr.

So for the lines above, fail2ban extracts gmasich.ru (DNS lookup attempt) or 203.0.113.1 (my own IP, saved from self-banning by ignoreip). The real scanners, 192.0.2.78 and 192.0.2.55, never got banned.

Fix:

^[^ ]+ <HOST> .*"(GET|POST|HEAD).* HTTP.*" .* "-" "(curl|python|...)..."

[^ ]+ skips the first field (vhost), and <HOST> now matches the second ($remote_addr). After reload — 10 minutes later, the first organic ban: Ban 192.0.2.55. ipset f2b-nginx-ua shows 1 entry. iptables rejects on 80/443.
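The difference between the two patterns is easy to reproduce outside fail2ban. A minimal Python sketch, assuming a simplified stand-in for <HOST> (the DNS-name regex quoted earlier; fail2ban's real expansion is more elaborate) and two made-up log lines in the same vhost-first layout; extract_host is a hypothetical helper:

```python
import re

# Simplified stand-in for fail2ban's <HOST> (an assumption, not the real expansion).
HOST = r"(?P<host>[\w\-.^_]*\w)"
TAIL = r' .*"(GET|POST|HEAD).* HTTP.*" .* "-" "(curl|python)[^"]*"'

OLD = re.compile("^" + HOST + TAIL)        # <HOST> lands on the first field ($host)
NEW = re.compile("^[^ ]+ " + HOST + TAIL)  # skip the vhost, capture $remote_addr

# Made-up sample lines: vhost, client address, then the rest of the request.
LINES = [
    'gmasich.ru 192.0.2.78 - [07/May/2026:10:02:36 +0000] '
    '"GET /wp-login.php HTTP/1.1" 444 0 "-" "curl/8.7.1"',
    '203.0.113.1 192.0.2.55 - [07/May/2026:10:02:40 +0000] '
    '"GET / HTTP/1.1" 444 0 "-" "python-requests/2.31.0"',
]


def extract_host(pattern: re.Pattern, line: str):
    """Return what the pattern would hand to fail2ban as the offender, or None."""
    m = pattern.search(line)
    return m.group("host") if m else None


if __name__ == "__main__":
    for line in LINES:
        print("old:", extract_host(OLD, line), "| new:", extract_host(NEW, line))
```

Both patterns match every line; only the captured field differs, which is why the jail looked alive in the logs while banning nothing useful.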

It works.

Trap 4: five other jails in the same shape

I extend the filter with five more signatures (-, Mozilla/5.0, ALittle Client, Mozilla/4.0(compatible; without the space, BackupLand). I test via fail2ban-regex against the raw log: Googlebot, YandexBot and normal Chrome don't match; trash UAs do. Validation passes.
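The same kind of check can be run on the signature list alone, before it goes into the filter. A Python sketch with a hypothetical reconstruction of the list (the real filter matches whole log lines and its exact regex isn't shown here); the browser and crawler strings are typical examples, and is_trash is a made-up name:

```python
import re

# Hypothetical reconstruction of the signature list, applied to the UA string only.
BAD_UA = re.compile(
    r"-"                                          # empty user agent
    r"|Mozilla/5\.0"                              # bare token with no platform tail
    r"|ALittle Client"
    r"|Mozilla/4\.0\(compatible;.*"               # no space before the parenthesis
    r"|.*BackupLand.*"
    r"|.*(?:curl|python|libwww|zgrab|Scrapy).*"   # tool signatures
)


def is_trash(user_agent: str) -> bool:
    """True when the whole UA string matches one of the scanner signatures."""
    return BAD_UA.fullmatch(user_agent) is not None


CHROME = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

if __name__ == "__main__":
    for ua in ["Mozilla/5.0", "-", "curl/8.7.1", CHROME, GOOGLEBOT, "homelab-monitor/1.0"]:
        print(repr(ua), is_trash(ua))
```

fullmatch is what keeps a real browser safe: its UA starts with Mozilla/5.0 but doesn't end there.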

Then it occurs to me: I have five other jails using the same broken action.

$ grep -l "ipset\[name=blacklist" /etc/fail2ban/jail.d/*.local
sshd.local
nginx-bad-requests.local
nginx-hiddenfiles.local
nginx-limit-req-2.local
wp-scan3.local

sshd, too. Which means SSH bans configured for 90 days never actually blocked anyone. Whether the count was 19 or 1,900 — none of them blocked.

Convert all five to iptables-ipset-proto6, each with a unique name=:

[sshd]
action = iptables-ipset-proto6[name=sshd, port="1221", protocol=tcp, blocktype=REJECT]

[nginx-bad-requests]
action = iptables-ipset-proto6[name=bad-requests, port="http,https", protocol=tcp, blocktype=REJECT]

# and so on for hiddenfiles, limit-req-2, wp-scan

Each jail gets its own f2b-<name> ipset and INPUT rule. The old shared blacklist ipset goes under the knife.

systemctl restart fail2ban (important: not reload — the homegrown action stays zombie on simple reload, only a full restart releases it). On startup fail2ban reads /var/lib/fail2ban/fail2ban.sqlite3, table bips, and restores bans through the new action.

End result: 1,339 IPs migrated from the dead blacklist into seven live ipsets:

| Jail | Banned | ipset → ports |
|---|---|---|
| sshd | 19 | f2b-sshd → 1221 |
| nginx-bad-requests | 0 | f2b-bad-requests → 80,443 |
| nginx-hiddenfiles | 13 | f2b-hiddenfiles → 80,443 |
| nginx-limit-req-2 | 10 | f2b-limit-req-2 → 80,443 |
| nginx-user-agent | 1 | f2b-nginx-ua → 80,443 |
| wp-scan3 | 1291 | f2b-wp-scan → 80,443 |
| recidive (new) | 5 | f2b-recidive → 1221,80,443 |

I added a recidive jail too. It's a standard pattern: it catches an IP that fail2ban has banned 3 times (maxretry=3) within 7 days and blocks it for 30 days.

netfilter-persistent save — iptables rules now survive reboot.

Trap 5: bantime = -1 and what to do with it

wp-scan3 — 1,291 banned. With bantime = -1. Permanent.

A year ago I figured "WordPress scanners — no mercy." Reasonable. But now I have 1,291 IPs in a perma-list, accumulated since May 2025. AWS, GCP, DO recycle IPs constantly. A chunk of that list belongs to other owners by now.

I change bantime to 90 days (7,776,000 seconds). Reload jail.

Discovery: reload doesn't recompute existing bans. fail2ban stores endOfBan = startOfBan + bantime at ticket creation. New bans go to 90 days. Old ones stay forever. You can't change policy retroactively without explicit unban.
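Why the old bans stay permanent is easiest to see in a toy model. The Python sketch below is not fail2ban's code, only an illustration of the rule described above: each ban records its own duration at creation, so changing the jail's bantime afterwards affects new bans only. Ticket and Jail are hypothetical classes:

```python
from dataclasses import dataclass

DAY = 86400  # seconds


@dataclass(frozen=True)
class Ticket:
    """Toy model of a stored ban: the duration is captured when the ban is created."""
    ip: str
    time_of_ban: int
    bantime: int  # seconds; -1 means permanent

    def end_of_ban(self):
        # A permanent ban has no end; otherwise it's start + the duration fixed at creation.
        return None if self.bantime == -1 else self.time_of_ban + self.bantime


class Jail:
    def __init__(self, bantime: int):
        self.bantime = bantime
        self.tickets = []

    def ban(self, ip: str, now: int) -> None:
        # The ticket copies the current policy; later policy changes don't touch it.
        self.tickets.append(Ticket(ip, now, self.bantime))


if __name__ == "__main__":
    jail = Jail(bantime=-1)
    jail.ban("192.0.2.78", now=1_000_000)
    jail.bantime = 90 * DAY  # the policy change
    jail.ban("192.0.2.55", now=2_000_000)
    for t in jail.tickets:
        print(t.ip, t.end_of_ban())
```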

OK, manual cleanup. Schema of fail2ban.sqlite3:

CREATE TABLE bips(
  ip TEXT NOT NULL,
  jail TEXT NOT NULL,
  timeofban INTEGER NOT NULL,
  bantime INTEGER NOT NULL,
  ...
);

Query: "all IPs banned more than 90 days ago":

SELECT ip FROM bips
WHERE jail='wp-scan3' AND timeofban < strftime('%s','now') - 7776000;

Distribution by age:

| Ban age | Count |
|---|---|
| < 90 days | 530 |
| 90-180 days | 402 |
| 180-365 days | 359 |
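An age split like this can be computed straight from the bips table. A Python sketch using the standard sqlite3 module against an in-memory database; the table is cut down to the four columns shown in the schema, the rows are made up, and age_buckets and stale_ips are hypothetical helpers:

```python
import sqlite3

DAY = 86400  # seconds


def stale_ips(conn, jail: str, now: int, max_age_days: int = 90) -> list:
    """Same condition as the query above: bans created before the cutoff."""
    cutoff = now - max_age_days * DAY
    rows = conn.execute(
        "SELECT ip FROM bips WHERE jail = ? AND timeofban < ?", (jail, cutoff)
    )
    return [ip for (ip,) in rows]


def age_buckets(conn, jail: str, now: int) -> dict:
    """Count one jail's bans per age bucket."""
    buckets = {"< 90 days": 0, "90-180 days": 0, "180-365 days": 0, "> 365 days": 0}
    for (timeofban,) in conn.execute("SELECT timeofban FROM bips WHERE jail = ?", (jail,)):
        age = now - timeofban
        if age <= 90 * DAY:      # matches the strict '<' in the cleanup query
            buckets["< 90 days"] += 1
        elif age <= 180 * DAY:
            buckets["90-180 days"] += 1
        elif age <= 365 * DAY:
            buckets["180-365 days"] += 1
        else:
            buckets["> 365 days"] += 1
    return buckets


def demo_db(now: int):
    """In-memory stand-in for fail2ban.sqlite3 with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bips(ip TEXT NOT NULL, jail TEXT NOT NULL, "
        "timeofban INTEGER NOT NULL, bantime INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO bips VALUES (?, ?, ?, ?)", [
        ("192.0.2.1", "wp-scan3", now - 10 * DAY, -1),
        ("192.0.2.2", "wp-scan3", now - 100 * DAY, -1),
        ("192.0.2.3", "wp-scan3", now - 200 * DAY, -1),
        ("192.0.2.4", "sshd", now - 200 * DAY, 7776000),
    ])
    return conn


if __name__ == "__main__":
    now = 1_800_000_000
    conn = demo_db(now)
    print(age_buckets(conn, "wp-scan3", now))
    print(stale_ips(conn, "wp-scan3", now))
```

Passing the jail name and cutoff as bound parameters avoids any quoting of string literals inside the SQL.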

That leaves 761 IPs to clean up (402 + 359). Script:

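# SQL string literals take single quotes; inside the single-quoted ssh
# argument each one is spelled '\''.
ssh hs1 '
# Epoch timestamp 90 days back; anything banned before it gets unbanned.
cutoff=$(date +%s -d "90 days ago")
sudo sqlite3 /var/lib/fail2ban/fail2ban.sqlite3 \
  "SELECT ip FROM bips WHERE jail='\''wp-scan3'\'' AND timeofban < $cutoff" |
while read -r ip; do
  sudo fail2ban-client set wp-scan3 unbanip "$ip" >/dev/null
done
# Persist the resulting iptables state across reboots.
sudo netfilter-persistent save
'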

3 minutes, 761 unbanned, 0 errors. 530 fresh ones remain. ipset and iptables synced.

Bonus: why the server configs lived outside git

While digging through this, I noticed an uncomfortable asymmetry: my application code is in git, my infrastructure docs are in git, but the service config files on the servers are only on the servers. Each edit of /etc/fail2ban/jail.d/sshd.local existed in exactly one copy. No history, no diffs, no rollback.

I made a mirror in infrastructure/server-configs/:

server-configs/
├── README.md, deploy.sh, .gitignore
├── hs1/
│   ├── etc/fail2ban/{jail.d/*.local, filter.d/nginx-ua.conf}
│   ├── usr/local/bin/check-hysteria.sh
│   └── root/notify_ssh_login.sh
├── hs2/
│   └── usr/local/bin/fail2ban-telegram.sh
└── shared/
    └── opt/cross-monitor/cross-monitor.sh   # deployed to hs1+rg1+hs2

The structure mirrors absolute paths on the server. deploy.sh hs1 does a dry-run diff against prod, APPLY=1 deploy.sh hs1 copies via scp + sudo cp and backs up existing files as .bak-YYYYMMDD-HHMMSS. Workflow:

$EDITOR server-configs/hs1/etc/fail2ban/jail.d/sshd.local
./deploy.sh hs1                      # diff
APPLY=1 ./deploy.sh hs1              # apply
ssh hs1 'sudo fail2ban-client reload'
git commit -am "fail2ban: tighten sshd"

History exists now. And if I do something weird next time and break prod — git revert.
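The "mirror of absolute paths" convention boils down to one mapping: strip the repo root and the host directory, and what remains is the path on the server. A Python sketch of that mapping only; it is not the actual deploy.sh, server_target is a hypothetical name, and how shared/ fans out to several hosts is left out:

```python
from pathlib import PurePosixPath


def server_target(mirror_path: str, root: str = "server-configs"):
    """Map a file in the mirror to (host directory, absolute path on the server)."""
    parts = PurePosixPath(mirror_path).parts
    if len(parts) < 3 or parts[0] != root:
        raise ValueError(f"not inside {root}/<host>/...: {mirror_path}")
    host = parts[1]
    # Everything after the host directory is the absolute path on that host.
    return host, str(PurePosixPath("/", *parts[2:]))


if __name__ == "__main__":
    print(server_target("server-configs/hs1/etc/fail2ban/jail.d/sshd.local"))
```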

Lessons

  1. "Currently banned: 19" is not the source of truth. The truth is ipset list f2b-<name> (Number of entries) and iptables -S INPUT | grep f2b- (does a REJECT rule exist). Check both.

  2. <HOST> in a fail2ban regex captures whatever IP or DNS name sits at its position in the pattern. For an nginx log format that starts with the vhost you need ^[^ ]+ <HOST> to skip it. Otherwise it's either a DNS lookup on gmasich.ru, or you self-ban your server's public IP.

  3. iptables-ipset-proto4 is broken on modern distros (kernel 3.0+, ipset v6+). Use iptables-ipset-proto6. Constrain port= (with blocktype=REJECT) so an accidental ban doesn't cut that IP off from every service on the host.

  4. actionstart_on_demand=true means actionstart runs on the first ban, not on reload. To test: fail2ban-client set <jail> banip 198.51.100.99 (TEST-NET-2, RFC 5737), then check ipset list f2b-<name>.

  5. ignoreip at jail level overrides the global [DEFAULT] value rather than extending it. When overriding locally, copy all the original networks (CF, LAN, etc.) and add your own.

  6. reload doesn't recompute existing bans when bantime changes. To clean up old ones you need explicit unbanip (via sqlite + loop).

  7. Self-traffic in analytics isn't a single cron, it's your whole fleet. Any host that curls your DNS name shows up in logs with its public IP. Filter by network, not by individual machines.

  8. One UA for all your own scripts (mine — homelab-monitor/1.0) — simplifies filtering and makes self-traffic visually distinct in Grafana. Plus dodges your own fail2ban filter without special whitelist exceptions.

  9. Server configs belong in git. Through server-configs/ + deploy.sh, not "edit in place." History + diff + rollback.

  10. Trust, but verify. Especially your own code from a year ago.

After all this, I came away with a clear sense — the bulk of security work wasn't setting up something new. It was making sure that what was set up long ago actually does what I thought it did. That, perhaps, is the main takeaway.


Fail2ban archaeology: five traps that quietly piled up dead bans for years · Grigoriy Masich