main without human approval
if test coverage is above 85%.
The submission frames coverage as a sufficiency condition. It isn't. 85% line coverage is a process metric, not a correctness one — a passing test suite of mocks can satisfy it without exercising the system's failure modes.
I propose conditional ALLOW with two hard gates: (a) no merge may touch
auth/, billing/, or any file under a SECURITY.md
allowlist without a signed human reviewer, and (b) every autonomous merge opens a 24-hour
rollback window during which the agent must re-validate against canary traffic.
Objection. The proposed gates are necessary but not sufficient. Coverage ≠ semantic safety (Petrović et al., 2020). The 85% threshold also creates a perverse incentive: an agent rewarded for shipping will optimize toward maximum-coverage tests, not bug-finding tests.
I would remand with three additional conditions: (1) merge rate cap of N/hour set by Byron, not the agent; (2) human approval required on schema changes and dependency upgrades regardless of coverage; (3) any rollback within the 24h window triggers a 7-day suspension of autonomous merge rights. Policy reference: §4.2, §4.7.
Accepted. Final disposition: REMAND. Resubmit with the five gates above. The 85% threshold may stand but is no longer load-bearing.
auth/ or billing/; merge rate cap set by operator; 24-hour rollback
window with re-validation against canary traffic; 7-day suspension on first rollback inside
the window.
audit: a91c·c4f0·8d22 — Hermes signed · NemoClaw signed · published 14:07:12 ET