Build the Feedback Loop First

If you want software that can write itself, start with CI, linting, formatting, and tests.

Not because tidy repos are morally superior. Because agents need feedback.

By feedback, I mean machine-checkable pass/fail signals: does the change format, lint, build, and pass tests without a human having to interpret the result?

A human can survive a weak codebase. A human can remember that one lint warning nobody fixes, click around the UI to see if something feels broken, and decide that a diff is probably fine.

An agent cannot operate on vibes.

That was the durable lesson from a tooling migration I just finished on a mature project.

On paper, the work was modest:

add a real pull-request CI workflow
migrate the web app off the deprecated next lint command
add Prettier
add a component-test lane for the frontend
write the first useful UI regression test
turn formatting checks on in CI

None of it was product work. All of it was more expensive than it would have been at the start of the project.

Why retrofitting this later costs more

When you add this infrastructure early, you mostly choose defaults.

When you add it late, you are negotiating with history.

The formatter rollout made that obvious.

By the time a repo is mature, style is no longer a decision. It is archaeology. Different files encode different eras, habits, and assumptions. So adding Prettier late was not a matter of picking a config and moving on. We had to compare multiple configurations and measure churn to figure out which one would do the least damage.

What we actually observed:

the best low-churn Prettier setup still touched 171 files and changed 5,256 lines
the best 120-column variant touched 217 files and changed 7,525 lines
the more uniform global configurations produced more churn

Those numbers are not a universal law. They are simply what this migration cost, because the repo had already accumulated sediment. On a live repo, that churn means noisy diffs, harder review, more merge-conflict risk, and a lot of line history (what git blame shows) rewritten for style alone.
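If you need to run the same comparison, the churn numbers can be collected mechanically. Here is a minimal sketch with several assumptions. You apply each candidate config on a scratch branch, for example with prettier --config <file> --write . and then capture git diff --numstat. "Lines changed" is counted as insertions plus deletions, the way git reports them. The function names are hypothetical.

```typescript
// How much a candidate formatter configuration rewrites the repo.
interface Churn {
  files: number;
  linesChanged: number; // insertions + deletions, as git counts them
}

// Parses `git diff --numstat` output: one "added<TAB>deleted<TAB>path" line per file.
// Binary files report "-" for both counts, so they add a file but no lines.
function parseNumstat(numstat: string): Churn {
  let files = 0;
  let linesChanged = 0;
  for (const line of numstat.split("\n")) {
    if (line.trim() === "") continue;
    const [added = "", deleted = ""] = line.split("\t");
    files += 1;
    const a = Number.parseInt(added, 10);
    const d = Number.parseInt(deleted, 10);
    if (!Number.isNaN(a)) linesChanged += a;
    if (!Number.isNaN(d)) linesChanged += d;
  }
  return { files, linesChanged };
}

// Picks the candidate with the fewest changed lines, breaking ties by file count.
function leastChurn(candidates: Record<string, Churn>): string | undefined {
  let bestName: string | undefined;
  let best: Churn | undefined;
  for (const [name, churn] of Object.entries(candidates)) {
    const better =
      best === undefined ||
      churn.linesChanged < best.linesChanged ||
      (churn.linesChanged === best.linesChanged && churn.files < best.files);
    if (better) {
      bestName = name;
      best = churn;
    }
  }
  return bestName;
}
```

Ranking by changed lines first reflects the cost described above: every touched line is review surface and potential conflict.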

The enforcement path was the same story.

We also could not honestly add strict format:check right away because main itself was not format-clean. So the rollout had to happen in order: first CI, then explicit linting, then Prettier setup without reformatting the repo, then frontend test infrastructure and the first regression test, then the full formatting sweep, and only after that repo-wide formatting enforcement in CI.

That kind of migration only works if you are honest about intermediate states. Setup-only Prettier is fine. A partial enforcement step is fine. But only if you describe them as bridges, not destinations.

Even the formatting sweep had to wait until the new testing PRs merged. Otherwise a 172-file mechanical diff would have turned two focused branches into rebase work.
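To see ahead of time which open branches a sweep would collide with, you can intersect file lists. This is a sketch under a few assumptions. Each list comes from something like git diff --name-only main...branch-name. The branch names are made up. A shared file signals rebase risk, not a guaranteed conflict.

```typescript
// For each open branch, returns the files it shares with a mechanical formatting sweep.
// Branches with no overlap are omitted; any entry left is a likely rebase.
function sweepOverlaps(
  sweepFiles: readonly string[],
  openBranches: Record<string, readonly string[]>,
): Record<string, string[]> {
  const sweep = new Set(sweepFiles);
  const overlaps: Record<string, string[]> = {};
  for (const [branch, files] of Object.entries(openBranches)) {
    const shared = files.filter((file) => sweep.has(file));
    if (shared.length > 0) overlaps[branch] = shared;
  }
  return overlaps;
}
```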

That is a sensible migration plan. It is also a lot of choreography just to get to the baseline a new project could have had almost for free.

Tests exposed the same truth

The first frontend regression we wanted to lock down was small and user-visible: after one failed hub-autocomplete request, the next attempt should still work.

The fix was not the hard part.

The hard part was that the web app had no real way to run UI behavior checks yet. Before we could write the regression test, we first had to build a browser-like frontend test setup and prove it worked with a smoke test.

When you defer test infrastructure, the first few regression tests can turn into mini-platform projects before they become tests.
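This is not the project's actual fix. As an illustration only, here is one common way that class of bug happens, and what a regression test for it can look like. The sketch uses a hypothetical autocomplete client that de-duplicates in-flight requests by caching the promise. If such a client kept a rejected promise in its cache, every later lookup for that query would fail without touching the network. This version evicts the entry once the request settles. The test fails the first fetch and expects the retry to succeed. A real component test would drive the rendered UI in a browser-like environment; this one stays at the client level so it is self-contained.

```typescript
type Fetcher = (query: string) => Promise<string[]>;

// Hypothetical client: concurrent lookups for the same query share one request.
class HubAutocomplete {
  private readonly inFlight = new Map<string, Promise<string[]>>();
  private readonly fetcher: Fetcher;

  constructor(fetcher: Fetcher) {
    this.fetcher = fetcher;
  }

  suggest(query: string): Promise<string[]> {
    const pending = this.inFlight.get(query);
    if (pending) return pending;
    const request = this.fetcher(query).finally(() => {
      // Evict on success AND failure; keeping a rejected promise is the bug.
      this.inFlight.delete(query);
    });
    this.inFlight.set(query, request);
    return request;
  }
}

// Regression test: one failed lookup must not poison the next attempt.
async function failedLookupDoesNotBlockRetry(): Promise<void> {
  let calls = 0;
  const flakyFetcher: Fetcher = async (query) => {
    calls += 1;
    if (calls === 1) throw new Error("simulated network failure");
    return [`${query} hub`];
  };
  const client = new HubAutocomplete(flakyFetcher);

  const firstFailed = await client.suggest("ber").then(() => false, () => true);
  if (!firstFailed) throw new Error("expected the first lookup to fail");

  const retry = await client.suggest("ber");
  if (retry[0] !== "ber hub") throw new Error("retry after a failure should succeed");
  if (calls !== 2) throw new Error("retry should issue a fresh request");
}
```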

Why I think this matters even more with AI

What I take from this migration is simple: once code generation gets cheap, the repo has to get better at saying no.

If an agent can produce diffs faster than the repo can verify them, then you do not get autonomous delivery. You just make one station faster and push the queue downstream into review, testing, and integration.
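A toy model makes the queueing point concrete. It assumes a serial pipeline with constant daily capacities, and the numbers below are invented. Throughput is capped by the slowest stage, and anything that arrives faster than that accumulates as backlog.

```typescript
// A pipeline stage and how many changes it can process per day.
interface Stage {
  name: string;
  perDay: number;
}

// In a serial pipeline, sustained throughput is capped by the slowest stage.
function bottleneck(stages: readonly Stage[]): Stage {
  if (stages.length === 0) throw new Error("pipeline needs at least one stage");
  return stages.reduce((slowest, stage) => (stage.perDay < slowest.perDay ? stage : slowest));
}

// Work enters at the first stage's rate and leaves at the bottleneck's rate;
// the difference piles up as a queue somewhere in between.
function backlogGrowthPerDay(stages: readonly Stage[]): number {
  const slowest = bottleneck(stages); // throws on an empty pipeline
  const arrivals = stages[0]?.perDay ?? slowest.perDay;
  return Math.max(0, arrivals - slowest.perDay);
}

// Invented numbers: a fast agent, slow human review, fast automated checks.
const pipeline: Stage[] = [
  { name: "agent drafts", perDay: 20 },
  { name: "human review", perDay: 5 },
  { name: "CI", perDay: 50 },
];
```

With these numbers the bottleneck is human review and the backlog grows by 15 changes a day. Making the agent faster only raises that number; moving checks from the review stage into CI is what raises throughput.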

Agents are best at work you can verify. A formatter, linter, test suite, and build pipeline turn software changes into that kind of work. They also let humans move up a level, from hand-checking every diff to designing the assembly line that every diff moves through.

That is why I keep coming back to the same idea: the first step of agentic software is not a better prompt. It is a codebase that can reject bad work automatically.

In practice, that means:

format so style stops being a branch in the decision tree
lint so trivial structural mistakes fail fast
test so behavior is checked instead of assumed
build so integration failures show up before review
CI so all of the above happen somewhere other than a human's memory
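The checks in that list can be wrapped in one entry point that gives a single pass/fail answer. This is a minimal sketch. It assumes the repo exposes Prettier, ESLint, a test script, and a build script under the commands shown; substitute your own. The command runner is injected so the logic is testable without a real repo.

```typescript
// Runs one shell command and returns its exit code (0 means pass).
type RunCommand = (command: string) => number;

interface Check {
  name: string;
  command: string;
}

// Assumed commands, in the order the list above gives them.
const CHECKS: readonly Check[] = [
  { name: "format", command: "npx prettier --check ." },
  { name: "lint", command: "npx eslint ." },
  { name: "test", command: "npm test" },
  { name: "build", command: "npm run build" },
];

interface Verdict {
  passed: boolean;
  failedAt?: string; // name of the first failing check, if any
  ran: string[]; // checks that actually executed
}

// Runs checks in order and stops at the first failure, so a human, an agent,
// or CI gets one unambiguous answer plus the stage that broke.
function runChecks(checks: readonly Check[], run: RunCommand): Verdict {
  const ran: string[] = [];
  for (const check of checks) {
    ran.push(check.name);
    if (run(check.command) !== 0) return { passed: false, failedAt: check.name, ran };
  }
  return { passed: true, ran };
}
```

In Node, run could wrap spawnSync from node:child_process, for example (cmd) => spawnSync(cmd, { shell: true, stdio: "inherit" }).status ?? 1, so the same entry point serves local runs and the CI workflow. Failing fast keeps the signal simple; the tradeoff is that a lint failure hides any test failures behind it until the next run.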

Without that loop, the human reviewer becomes the runtime.

The agent writes code, and then a person manually discovers whether the change is broken, ugly, incomplete, or unsafe. That is not software writing itself. That is faster draft generation. A much better definition of done is “a PR that passed CI and was merged.”

What I would do at the beginning next time

If I were starting a new project tomorrow, I would set this baseline immediately:

a pull-request CI workflow
one local command that runs the real checks
an explicit linter path, not a framework wrapper that might be deprecated out from under me
a formatter before style diverges
at least one real frontend or component-test lane before the UI gets complicated

CI is not the whole runtime story, but it is the minimum honest feedback loop.

The boring work is the first real step

None of this is glamorous, which is why teams postpone it. But if you want agents to do more than generate drafts, the repo has to grade the output without leaning on tacit human knowledge. Build the feedback loop first.