Case file AIF-2026-001 · first-party · with logs
65 broken commits while I slept
An autonomous software-engineer agent landed 65+ direct commits on a production main branch in a 12-hour overnight window. Four independent safety mechanisms existed. All four were open at the same time. This is our system, our agents, our fault — reconstructed in full.
- Date
- 2026-05-21, ~12h overnight window
- Agent
- Autonomous SWE agent inside a multi-agent orchestrator (~30 agents)
- Blast radius
- Production website repo: main unbuildable (17 TypeScript errors, Docker image build failing, deploys blocked)
- User impact
- None — production kept serving the last good image
- Root causes
- 4 overlapping holes (closing any one of them would have prevented it)
- Recovery
- History rewrite: git reset --hard 70be8423 + cherry-pick of the one good commit
What happened
Our orchestrator runs a software-engineer agent that picks up tasks, writes code, and — under a "partner mode" trust level — can merge its own work. Overnight on May 21 it worked a task against our public website repo. By morning it had made 65+ commits directly to main, the branch our Hetzner deploy builds from. The commits contained TypeScript errors in files the agent never looked at. Every deploy attempt failed. The only reason users saw nothing is that Docker kept serving the previous image.
The container verify gate starts failing open: its cached node_modules has a corrupt package-lock.json (Missing json-schema-traverse@0.4.1), so npm ci exits 1 on every run. The gate logs an info line and returns ok: true.
The agent begins committing via the write_file tool — which pushes through the GitHub API directly to the branch the agent names. It names main.
The agent retries one failing test file 30+ times. It never runs tsc over the whole project, so the type errors it is introducing in other files — signal-ingestor.ts, chief-tick.ts, a notify route — stay invisible to it. Each retry is another commit to main.
Morning. 65+ commits on main, 17 TypeScript errors, Docker build red, deploys impossible.
Human recovery: force-push main back to the last good commit, cherry-pick the single genuinely useful change on top, clear the corrupted verify cache.
The four holes
No single failure caused this. Four defenses existed, and each was independently neutralized. That is the defining pattern of agent incidents: they don't beat your safety net — they find the night all its knots are loose at once.
The project row had work_branch = main with skipQaReview / autoApprove = true: a trust level meant for scratch projects, pointed at a production repo. The merge-back gate was hardened for protected branches, but the agent never merged. It committed directly, so the gate never fired.
The write_file and edit_file tools committed straight to the agent-specified branch via the GitHub API. One sibling tool (append_file) had a defense-in-depth branch check. The two tools the agent actually used did not. Uniformity of enforcement matters more than strength of enforcement.
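A minimal sketch of what uniform enforcement could look like: one shared check that every file-write tool is wrapped in, so no tool can skip it. All names here (`checkBranchWrite`, `guarded`, `PROTECTED_BRANCHES`, `SELF_REPO`, the `WriteRequest` shape) are hypothetical illustrations, not the orchestrator's actual code, and the protected-branch list is an assumption.

```typescript
// All names below are hypothetical; none come from the orchestrator's real code.
type WriteRequest = {
  repo: string;
  branch: string;
  underTaskContext: boolean; // true when an agent task, not a human, initiated the write
};

type Decision = { allowed: boolean; reason: string };

// Assumed list of protected branches; a real setup would load this from config.
const PROTECTED_BRANCHES: ReadonlySet<string> = new Set(["main", "master"]);
// Placeholder name for the orchestrator's own (self-improvement) repo.
const SELF_REPO = "orchestrator";

function checkBranchWrite(req: WriteRequest): Decision {
  // Accept both "main" and "refs/heads/main" spellings of the same branch.
  const branch = req.branch.replace(/^refs\/heads\//, "");
  if (req.repo === SELF_REPO) {
    return { allowed: false, reason: "writes to the self-improvement repo are always refused" };
  }
  if (req.underTaskContext && PROTECTED_BRANCHES.has(branch)) {
    return { allowed: false, reason: `direct write to protected branch "${branch}" refused` };
  }
  return { allowed: true, reason: "ok" };
}

// Wrap a write tool so the check runs before the underlying write, for every tool alike.
function guarded<T>(toolName: string, write: (req: WriteRequest) => T): (req: WriteRequest) => T {
  return (req) => {
    const decision = checkBranchWrite(req);
    if (!decision.allowed) {
      throw new Error(`${toolName}: ${decision.reason}`);
    }
    return write(req);
  };
}
```

Registering tools only through a wrapper like `guarded` means a new write tool cannot ship without the check, which is the property the append_file-only guard lacked.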
The verify gate was ON (ENABLE_VERIFY_GATE=1) and had been green for weeks. When its cache corrupted and npm ci started failing, the error path said: log it, skip the check, return ok. A gate that fails open is a gate that reports safety precisely when it stops providing it — for eight hours, every broken commit got a green light.
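The difference between failing open and failing closed fits in a few lines. The sketch below is an assumption-laden illustration, not the real gate: `verifyGate`, `Runner`, and `safeRun` are hypothetical names, the command runner is injected so the logic can be exercised without spawning processes, and the `tsc --noEmit` step is an assumed check. The one rule it demonstrates is that an infrastructure failure such as `npm ci` exiting non-zero yields `ok: false`.

```typescript
// Hypothetical types and names; the real gate's interface is not shown in this write-up.
type CommandResult = { exitCode: number; stderr: string };
type GateResult = { ok: boolean; reason: string };
type Runner = (cmd: string, args: string[]) => CommandResult;

// Treat a runner that throws (e.g. the process could not be spawned) as a failed command.
function safeRun(run: Runner, cmd: string, args: string[]): CommandResult {
  try {
    return run(cmd, args);
  } catch (err) {
    return { exitCode: -1, stderr: String(err) };
  }
}

function verifyGate(run: Runner): GateResult {
  const install = safeRun(run, "npm", ["ci"]);
  if (install.exitCode !== 0) {
    // Fail closed: a gate that cannot run its check has not verified anything.
    return { ok: false, reason: `infrastructure failure: npm ci exited ${install.exitCode}` };
  }
  // Assumed check: a whole-project type check.
  const typecheck = safeRun(run, "npx", ["tsc", "--noEmit"]);
  if (typecheck.exitCode !== 0) {
    return { ok: false, reason: "tsc --noEmit reported errors" };
  }
  return { ok: true, reason: "install and type check passed" };
}
```

With this shape, the only path to `ok: true` is one where every command actually ran and exited 0. A corrupted cache blocks commits and gets noticed instead of silently approving them.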
The agent optimized the one test it was told about and never widened its view to the whole compilation unit. Thirty retries, thirty commits, zero tsc --noEmit. Agents don't get bored or suspicious; repetition that would alarm a human is, to an agent, just progress.
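One way to give an agent the suspicion it lacks is a retry budget that also demands a whole-project check before each commit. This is a hedged sketch, not something our orchestrator is described as running: `makeRetryBudget` and its default limit of 3 are hypothetical, and the caller is assumed to supply the result of a project-wide type check such as `tsc --noEmit`.

```typescript
// Hypothetical helper; the limit and the names are illustrative assumptions.
type CommitDecision = { allowed: boolean; reason: string };

function makeRetryBudget(maxAttemptsPerFile: number = 3) {
  const attempts = new Map<string, number>();

  // file: the file the agent keeps retrying.
  // projectTypechecks: result of a whole-project type check run by the caller.
  return function mayCommit(file: string, projectTypechecks: boolean): CommitDecision {
    const attempt = (attempts.get(file) ?? 0) + 1;
    attempts.set(file, attempt);

    if (!projectTypechecks) {
      return { allowed: false, reason: "whole-project type check failed" };
    }
    if (attempt > maxAttemptsPerFile) {
      return {
        allowed: false,
        reason: `attempt ${attempt} on ${file} exceeds budget of ${maxAttemptsPerFile}; escalate to a human`,
      };
    }
    return { allowed: true, reason: "ok" };
  };
}
```

Every attempt counts against the budget, including ones rejected by the type check, so thirty retries on one file would stop at the fourth.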
The forensic irony
The recovery destroyed the evidence. The 65 commits were force-pushed out of existence — the git history now shows a quiet, tidy night. The only record that this incident happened at all is operator logs and memory.
This is the part that generalizes. Git history is written by whoever holds the remote — recovery requires rewriting it. If your only audit trail of what an agent did lives in the same system the agent (or the human cleaning up) can rewrite, you don't have an audit trail. You have a draft.
What we changed
- A branch-write guard on every file-write tool: direct writes to protected branches are refused under task context, and all writes to the self-improvement repo are refused regardless of context.
- Self-modification can never auto-approve: if the target repo is the orchestrator's own, the work always stages for human review — no trust flag can override it.
- The verify gate's infrastructure-failure path now fails closed.
What would have caught it
Every fix above is specific to our orchestrator. Two mechanisms are not, and they are the reason this site exists:
- A gate in front of destructive writes that treats "push to a protected branch" like the irreversible action it is — requiring a human, not a trust flag.
- An independent, tamper-evident action log — hash-chained, outside the systems the agent can touch — so the record of what happened survives the recovery that erases it everywhere else.
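To make "hash-chained" concrete, here is a minimal sketch of the idea using Node's built-in `crypto` module. It is an illustration under stated assumptions, not blackbox-agent's actual format: the `LogEntry` fields and the function names `append` and `firstBrokenIndex` are hypothetical. Each entry's hash covers its own content plus the previous entry's hash, so editing or deleting an earlier entry breaks every link after it.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape; a real log would carry more context per action.
type LogEntry = {
  seq: number;
  timestamp: string;
  actor: string;
  action: string;
  prevHash: string;
  hash: string;
};

// Fixed starting value for the first entry's prevHash.
const GENESIS = "0".repeat(64);

function entryHash(e: Omit<LogEntry, "hash">): string {
  // Serialize fields in a fixed order so the hash is reproducible.
  const payload = JSON.stringify([e.seq, e.timestamp, e.actor, e.action, e.prevHash]);
  return createHash("sha256").update(payload).digest("hex");
}

// Returns a new log with one entry appended; the input array is not mutated.
function append(
  log: readonly LogEntry[],
  actor: string,
  action: string,
  timestamp: string = new Date().toISOString(),
): LogEntry[] {
  const last = log[log.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const body = { seq: log.length, timestamp, actor, action, prevHash };
  return [...log, { ...body, hash: entryHash(body) }];
}

// Returns the index of the first entry that fails verification, or -1 if the chain is intact.
function firstBrokenIndex(log: readonly LogEntry[]): number {
  let prev = GENESIS;
  return log.findIndex((e, i) => {
    const { hash, ...body } = e;
    const intact = e.seq === i && e.prevHash === prev && entryHash(body) === hash;
    prev = hash;
    return !intact;
  });
}
```

A chain like this detects edits and deletions in the middle of the log, but not truncation from the end. Catching that requires the latest hash to be stored somewhere the agent and the cleanup cannot reach, which is why the log has to live outside the systems being recorded.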
Both exist now
We built blackbox-agent after this night: an open-source flight recorder and action gate for AI agents. Its git push --force and protected-write rules are this incident, generalized.