Recently on the Cup o' Go podcast, the hosts discussed whether AI-generated changelists should be accepted, rejected, or treated differently. The conversation circled around trust, responsibility, and reputation. Should we reject PRs that say "co-authored by Claude"? Should we trust the submitter if they stake their name on it? Should AI involvement matter at all?
They almost reached the right answer: judge the code on its merits.
Correct.
But then they defined "merit" in social terms (belief, reputation, whether the author reviewed it, whether they’re trusted) - and that’s where we part ways.
Belief is not a quality attribute.
Code is not theology. It is either verifiable or it is not.
If you submit a PR to one of my projects and it does not come with extremely high-quality tests - tests that prove behavior under constraints, edge cases, and failure modes - it will be denied. I don’t care if it was written by a junior developer, a staff engineer, or an AI model trained on the entire internet. No tests, no merge.
This is not anti-AI. It is AI-native engineering.
The Coordination Shift Has Already Happened
We are now living in a world where coding agents can:
- Draft entire features in minutes
- Refactor subsystems instantly
- Generate scaffolding, glue, and repetitive logic without fatigue
- Produce plausible unit tests that look correct at a glance
Execution cost has collapsed.
Verification cost has not.
That inversion is the real story. It’s the same structural shift described in The Coordination Shift - when execution becomes cheap, coordination and governance become the bottleneck.
AI-generated PRs are not the problem. They are the stress test.
When generation is cheap, reputation is no longer a reliable filter. A trusted engineer can produce garbage quickly. An unknown contributor can generate thousands of lines of plausible-looking code in an afternoon.
The intake boundary must change.
Merit must be redefined:
Merit = demonstrable, reproducible correctness under constraints.
Nothing else scales.
Reputation Is a Weak Signal Now
In the podcast discussion, one thread stood out: the idea that the submitter "believes in the code" and stakes their reputation on it. That worked in a world where code was expensive to produce. Reputation was a proxy for effort and judgment. However, in an AI-augmented reality, effort and authorship are decoupled.
Reputation no longer implies:
- Deep understanding of the change
- Exhaustive exploration of edge cases
- Manual implementation effort
- Careful construction of invariants
It may still imply judgment - but that judgment must be visible. The only scalable way to make judgment visible is through executable proof. Tests are not documentation. They are contracts with reality.
AI-Generated Slop Is Not the Real Threat
The real threat is something subtler: plausible slop.
Code that:
- Compiles
- Passes shallow unit tests
- Looks clean in review
- Satisfies the happy path
But does not survive adversarial thinking.
Most AI-generated unit tests today are structurally weak. They confirm what the code already assumes. They rarely:
- Model boundary conditions
- Simulate concurrency hazards
- Explore malformed input
- Validate failure semantics
- Assert invariants across state transitions
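To make the contrast concrete, here is a minimal Go sketch. ParseLimit is a hypothetical function invented for illustration, not from any real codebase; the first test is the kind of happy-path confirmation generated tests tend to produce, the second encodes boundaries, malformed input, and failure semantics.

```go
package limits

import (
	"fmt"
	"strconv"
	"testing"
)

// ParseLimit is a stand-in function under test: it parses a positive
// integer limit in the range [1, 10000] and rejects anything else.
func ParseLimit(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("parse limit %q: %w", s, err)
	}
	if n < 1 || n > 10000 {
		return 0, fmt.Errorf("limit %d out of range [1, 10000]", n)
	}
	return n, nil
}

// Shallow, generated-style test: confirms the happy path the code already assumes.
func TestParseLimitHappyPath(t *testing.T) {
	got, err := ParseLimit("100")
	if err != nil || got != 100 {
		t.Fatalf("ParseLimit(\"100\") = %d, %v; want 100, nil", got, err)
	}
}

// Stronger test: boundary conditions, malformed input, and failure semantics.
func TestParseLimitEdges(t *testing.T) {
	cases := []struct {
		name    string
		in      string
		want    int
		wantErr bool
	}{
		{"lower bound", "1", 1, false},
		{"upper bound", "10000", 10000, false},
		{"zero is rejected", "0", 0, true},
		{"above upper bound", "10001", 0, true},
		{"negative", "-5", 0, true},
		{"empty string", "", 0, true},
		{"non-numeric", "ten", 0, true},
		{"overflow", "99999999999999999999", 0, true},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			got, err := ParseLimit(tc.in)
			if (err != nil) != tc.wantErr {
				t.Fatalf("ParseLimit(%q) error = %v, wantErr %v", tc.in, err, tc.wantErr)
			}
			if !tc.wantErr && got != tc.want {
				t.Fatalf("ParseLimit(%q) = %d, want %d", tc.in, got, tc.want)
			}
		})
	}
}
```

The edge-case table is longer than the function it exercises. That imbalance is the point.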
If your verification harness is weak, AI does not make you faster. It makes you fragile at scale. And the fragility compounds.
Why 40% Test Code Is Not Excessive
I often see raised eyebrows when I mention that large codebases I’ve worked on contain ~40% tests - unit, integration, end-to-end, not just trivial checks.
In the AI era, that number is not extreme. It is rational.
High verification density gives you:
- Refactorability
- Optionality
- Safe exploration
- Fast iteration without regression fear
It transforms the codebase into a self-verifying system. Without that, you get:
- Review fatigue
- Merge-by-glance culture
- Implicit tribal knowledge
- Silent decay masked by green dashboards
AI does not remove the need for engineering discipline. It amplifies the consequences of lacking it.
This Is Not About Banning AI
Some maintainers respond by banning AI-generated PRs. That is defensive and shortsighted.
AI is not the variable that matters.
Proof is.
The only intake policy that scales in an AI-augmented ecosystem is brutally simple:
- Behavior changes require behavioral proof
- Edge cases must be encoded
- Failure modes must be demonstrated
- Invariants must be executable
If a change increases semantic surface area, the test delta should often exceed the code delta. That is not bureaucracy. That is physics.
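Concretely, "invariants must be executable" means the invariant lives in the test suite rather than in a comment. A minimal sketch, assuming a hypothetical Account type: the test drives thousands of state transitions with a fixed seed so any failure is reproducible, and asserts the invariant after every step.

```go
package account

import (
	"errors"
	"math/rand"
	"testing"
)

// Account is a stand-in type: Withdraw must fail rather than let the
// balance go negative. That is the invariant we want to keep executable.
type Account struct{ balance int }

var ErrInsufficientFunds = errors.New("insufficient funds")

func (a *Account) Deposit(n int) { a.balance += n }

func (a *Account) Withdraw(n int) error {
	if n > a.balance {
		return ErrInsufficientFunds
	}
	a.balance -= n
	return nil
}

func (a *Account) Balance() int { return a.balance }

// TestBalanceNeverNegative encodes the invariant across many random
// state transitions, with a fixed seed so failures are reproducible.
func TestBalanceNeverNegative(t *testing.T) {
	rng := rand.New(rand.NewSource(42))
	var acct Account
	for i := 0; i < 10_000; i++ {
		amount := rng.Intn(100)
		if rng.Intn(2) == 0 {
			acct.Deposit(amount)
		} else if err := acct.Withdraw(amount); err != nil && !errors.Is(err, ErrInsufficientFunds) {
			t.Fatalf("step %d: unexpected error: %v", i, err)
		}
		// The invariant must hold after every transition, not just at the end.
		if acct.Balance() < 0 {
			t.Fatalf("step %d: balance went negative: %d", i, acct.Balance())
		}
	}
}
```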
Verification Dominates Execution
In The Centaur Manifest, one of the core tenets is: verification dominates execution.
This is not philosophical. It is economic.
When code generation cost trends toward zero:
- The scarcest resource becomes correctness under uncertainty.
- The bottleneck becomes integration and proof.
- The highest leverage activity becomes encoding invariants as tests.
The fastest teams in the AI era will not be those who generate the most code.
They will be those who can:
- Encode constraints automatically
- Run guardrails continuously
- Treat CI as arbiter
- Separate outcome validation from output production
They will look conservative to legacy organizations, and they will actually be moving faster.
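Here is what "encode constraints automatically" and "treat CI as arbiter" can look like in practice: the constraint is written as an ordinary test, and the pipeline enforces it on every PR. A minimal Go sketch, with hypothetical package paths, that fails the build if a layering rule is broken:

```go
package arch

import (
	"go/parser"
	"go/token"
	"io/fs"
	"path/filepath"
	"strings"
	"testing"
)

// TestStorageDoesNotImportHTTP encodes an architectural constraint as an
// executable guardrail: nothing under ./storage may import the HTTP layer.
// The directory and import path are placeholders; adapt them to the real
// module layout.
func TestStorageDoesNotImportHTTP(t *testing.T) {
	const forbidden = "example.com/myapp/internal/http"

	err := filepath.WalkDir("storage", func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() || !strings.HasSuffix(path, ".go") {
			return err
		}
		f, perr := parser.ParseFile(token.NewFileSet(), path, nil, parser.ImportsOnly)
		if perr != nil {
			return perr
		}
		for _, imp := range f.Imports {
			if strings.Trim(imp.Path.Value, `"`) == forbidden {
				t.Errorf("%s imports %s, which violates the layering constraint", path, forbidden)
			}
		}
		return nil
	})
	if err != nil {
		t.Fatalf("walking storage/: %v", err)
	}
}
```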
Consensus Culture Will Struggle With This
There is also a social layer here.
Demanding strong tests forces clarity. It eliminates ambiguity. It makes competence legible. It removes wiggle room. In consensus-heavy cultures, it is often easier to trust the contributor than to demand executable proof. Trusting feels collaborative. It feels kind.
In an AI-augmented environment, that posture is fatal.
Because AI scales volume.
Without a proof-first stance, you get:
- Mountains of plausible PRs
- Review bottlenecks
- Social pressure to merge
- Diffusion of accountability
If generation is cheap, discipline must increase, not decrease.
The Real Question Is Structural
The podcast framed the question as:
Should we accept AI-generated code?
The structural question is:
How do we redesign verification for near-zero generation cost?
If your answer is:
"Trust the author."
You are still thinking in a pre-AI execution economy.
If your answer is:
"Show me the invariant."
You are adapting.
A Simple Policy for the AI Era
Here is the rule. If the code cannot stand on its own merit, it does not merge.
And merit means:
- Reproducible behavior
- Explicit edge-case coverage
- Clear failure semantics
- Encoded constraints
- Automated proof
Not vibes.
Not faith.
Not reputation.
Just evidence.
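For instance, "clear failure semantics" is not a review comment, it is an assertion. A minimal sketch with a hypothetical Slow dependency: the test demonstrates the failure mode - cancellation - and pins down exactly what the caller gets back.

```go
package fetch

import (
	"context"
	"errors"
	"testing"
	"time"
)

// Slow is a stand-in for a dependency call that must honor cancellation.
// The failure semantics we care about: on a cancelled context the caller
// gets ctx.Err() back, not a partial result and not a hang.
func Slow(ctx context.Context) (string, error) {
	select {
	case <-time.After(time.Second):
		return "done", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// TestSlowHonorsCancellation demonstrates the failure mode instead of
// describing it in prose: cancel the context, expect context.Canceled.
func TestSlowHonorsCancellation(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // fail before the work even starts

	got, err := Slow(ctx)
	if !errors.Is(err, context.Canceled) {
		t.Fatalf("Slow() error = %v, want context.Canceled", err)
	}
	if got != "" {
		t.Fatalf("Slow() = %q, want empty result on cancellation", got)
	}
}
```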
Execution is no longer the scarce asset. Judgment and proof are. If you have not internalized that yet, you are not working in an AI-augmented reality. You are still arguing about authorship in a world where authorship is no longer the point.