TLS certificate expiry on self-hosted services

The single most common "my self-hosted thing broke" cause isn't a software bug. It's a TLS certificate that expired because the auto-renewal cron job quietly stopped running three months ago. This post is about stopping that from happening.

Why this keeps going wrong

The 60-30-14 rule

Alert on three progressively louder thresholds: 60, 30, and 14 days before expiry.

Most renewals happen around 30 days remaining (Let's Encrypt's default), so a 30-day alert that doesn't clear within 48 hours means something is actually broken — not just "hasn't triggered yet."
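
As a rough sketch of the rule, here is a small shell function that maps days remaining to an alert level. The level names (info, warning, critical) and the choice to treat each threshold as inclusive are my assumptions, not part of the rule itself.

```shell
# Map days-until-expiry to an alert level using the 60-30-14 thresholds.
# Level names and inclusive boundaries are illustrative assumptions.
severity_for_days() {
  days=$1
  if [ "$days" -le 14 ]; then
    echo critical
  elif [ "$days" -le 30 ]; then
    echo warning
  elif [ "$days" -le 60 ]; then
    echo info
  else
    echo ok
  fi
}
```

Calling severity_for_days 20 prints "warning". Route each level to a progressively louder channel, for example a log line, then an email, then a page.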

How to actually monitor it

Option A: run openssl from cron

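for host in grafana.example.com git.example.com plex.example.com; do
  # Fetch the served certificate and pull out its notAfter date.
  expiry=$(echo | openssl s_client -servername "$host" -connect "$host:443" 2>/dev/null \
           | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
  if [ -z "$expiry" ]; then
    echo "ALERT: could not read certificate for $host" | mail -s "TLS check failed" you@example.com
    continue
  fi
  # GNU date (Linux) parses with -d; BSD/macOS date needs -j -f.
  end=$(date -d "$expiry" +%s 2>/dev/null || date -j -f "%b %e %H:%M:%S %Y %Z" "$expiry" +%s)
  days=$(( (end - $(date +%s)) / 86400 ))
  if [ "$days" -lt 30 ]; then
    echo "ALERT: $host expires in $days days" | mail -s "TLS expiry" you@example.com
  fi
done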

Works. Writes itself out of your memory in six months.
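
If you'd rather skip the date arithmetic, openssl x509 has a -checkend flag that takes a number of seconds and sets its exit status according to whether the certificate expires within that window. A minimal sketch follows; the function names are mine. Note that an unreadable certificate is also reported as "expiring", which I think is the right way to fail for a monitor.

```shell
# Print the s_client output (including the PEM certificate) for a host on port 443.
fetch_cert() {
  echo | openssl s_client -servername "$1" -connect "$1:443" 2>/dev/null
}

# Read a PEM certificate on stdin.
# Return 0 if it expires within N days (or cannot be parsed), 1 otherwise.
expires_within_days() {
  days=$1
  # -checkend exits 0 only when the cert is still valid N seconds from now.
  if openssl x509 -noout -checkend "$(( days * 86400 ))" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}
```

Usage would look something like: fetch_cert git.example.com | expires_within_days 30 && echo "renewal overdue".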

Option B: Noxen

Every Noxen scan inspects the TLS certificate on each TLS-capable open port (443, 465, 636, 993, 995, 8443, 9443), parses the full X.509 structure, and emits findings for:

The diff-from-yesterday view highlights when a cert has renewed — so you don't have to guess whether the renewal cron ran. Absence of a "cert expiry changed" entry after 85 days is itself the alert: "renewal should have happened by now; it didn't."
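
To make the pattern concrete, here is a generic sketch of a diff-from-yesterday check in plain shell. This is not how Noxen implements it; the function name, the state directory layout, and the three result strings are all hypothetical.

```shell
# Record today's expiry string for a host and report how it compares
# with the previously recorded value.
# Prints "first-seen", "unchanged", or "changed".
record_expiry() {
  state_dir=$1
  host=$2
  expiry=$3
  state_file="$state_dir/$host.enddate"
  if [ ! -f "$state_file" ]; then
    result=first-seen
  elif [ "$(cat "$state_file")" = "$expiry" ]; then
    result=unchanged
  else
    result=changed
  fi
  # Overwrite the stored value so tomorrow's run compares against today.
  printf '%s\n' "$expiry" > "$state_file"
  echo "$result"
}
```

Run it once a day with the notAfter string from the served certificate. A run of "unchanged" results that continues past the point where renewal should have happened is the signal worth alerting on.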

Longer-term fixes

  1. Switch to Caddy. Automatic TLS is the default; there's no separate renewal job to forget. For most homelab reverse-proxy use, Caddy removes the whole problem class.
  2. Use DNS-01 wherever possible. HTTP-01 requires port 80 to be reachable from the internet, which fails when your ISP blocks it or the service is LAN-only.
  3. Pin certbot to DNS-01 with a long-lived API token. For Cloudflare: a scoped token with just DNS Edit + Zone Read on the specific zone.
  4. Monitor renewals, not just expiry. If you know renewal happens at T-30 days, alert on "cert didn't renew" at T-25, not on "cert expired" at T-0.
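
For the certbot-plus-Cloudflare route in item 3, a minimal sketch might look like the following. It assumes the certbot-dns-cloudflare plugin is installed; the function names, the credentials path, and the domain are placeholders I made up for illustration.

```shell
# Write a Cloudflare credentials file readable only by the current user.
# The token is supplied by the caller; the path is a placeholder.
write_cloudflare_credentials() {
  creds_file=$1
  api_token=$2
  ( umask 077; printf 'dns_cloudflare_api_token = %s\n' "$api_token" > "$creds_file" )
  # Tighten permissions even if the file already existed with looser ones.
  chmod 600 "$creds_file"
}

# Request a certificate via the DNS-01 challenge (no inbound port 80 needed).
# Requires the certbot-dns-cloudflare plugin.
issue_cert_dns01() {
  creds_file=$1
  domain=$2
  certbot certonly \
    --dns-cloudflare \
    --dns-cloudflare-credentials "$creds_file" \
    -d "$domain"
}
```

Something like write_cloudflare_credentials /root/.secrets/cloudflare.ini "$TOKEN" followed by issue_cert_dns01 /root/.secrets/cloudflare.ini git.example.com would be the typical call sequence.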

The deeper point

Expiring certs are a visible failure mode. Every single one of them was preceded by something silent: a renewal that didn't run, a token that got rotated, a config that didn't reload. The fix is to make the silent thing loud — a week-by-week monitoring loop that notices "hey, this should have changed by now, why hasn't it?"

That's the diff-from-yesterday pattern, applied to TLS. And it generalises: every silent failure is fixable once you can answer, quickly, "what should have changed and didn't?"