I was reviewing a pull request with an engineer I mentor — he wanted a second set of eyes on the feedback he’d given. The PR added a Java API client generated from an OpenAPI spec alongside AI-written code that integrated it. About eighty percent of the PR was the generated client. The client worked, but it didn’t follow the project’s conventions — wrong dependency injection approach, no Lombok, a lot of boilerplate where the rest of the codebase was concise. The generated code was written in the generator’s style. The AI-written code gravitated toward it.

What caught my attention wasn’t the generated code itself — it was that drift in the AI-written portion. The agent had absorbed the generated client’s patterns — verbose service classes, manual instantiation, structural choices that matched the generator’s output rather than the team’s conventions.

I’ve been noticing this more as I work with agents across different codebases. In greenfield projects, I’ve found agents produce clean, consistent output. In brownfield codebases — especially ones carrying auto-generated code or years of accumulated tech debt — the output drifts. From what I’ve seen, the codebase itself influences the agent’s output as much as the instructions do, and the generated or legacy code competes with the conventions the team has set.

The core of the problem is volume. Good code tends to be concise and well-structured. Bad code — generated clients, layers of indirection that outlived their purpose, copy-pasted implementations that diverged over time — tends to be verbose. A clean service implementation might be forty lines. The generated equivalent might be four hundred. The agent doesn’t seem to evaluate which code better represents the team’s intent. It absorbs patterns, and verbose code has ten times the surface area.
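To make that asymmetry concrete, here is a hypothetical side-by-side of the two styles. Neither class is from the PR; the names and the toy logic are mine. The point is only the ratio: both compute the same thing, but the generated-style version occupies several times the surface area an agent will read.

```java
// The style a team might write by hand: concise and direct.
class PriceService {
    int totalCents(java.util.List<Integer> itemCents) {
        return itemCents.stream().mapToInt(Integer::intValue).sum();
    }
}

// The style a generator tends to emit: request/response wrappers,
// getters and setters, defensive null checks, manual loops.
class ItemListRequest {
    private java.util.List<Integer> items;
    public java.util.List<Integer> getItems() { return items; }
    public void setItems(java.util.List<Integer> items) { this.items = items; }
}

class TotalResponseWrapper {
    private int totalAmountInCents;
    public int getTotalAmountInCents() { return totalAmountInCents; }
    public void setTotalAmountInCents(int v) { this.totalAmountInCents = v; }
}

class GeneratedPriceServiceImpl {
    public TotalResponseWrapper calculateTotalAmountInCents(ItemListRequest request) {
        TotalResponseWrapper response = new TotalResponseWrapper();
        int total = 0;
        if (request != null && request.getItems() != null) {
            for (Integer item : request.getItems()) {
                if (item != null) {
                    total = total + item.intValue();
                }
            }
        }
        response.setTotalAmountInCents(total);
        return response;
    }
}
```

Same behavior, roughly five times the code — and it is the second version whose patterns dominate what the agent reads.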

It reminds me of watching a new engineer ramp up on a codebase. They read what’s there to understand the conventions. If most of what they read is generated boilerplate or legacy spaghetti, that tends to be what they pattern off — volume creates a sense of “this is how things are done here.” In my experience, agents behave similarly but faster — they read more code and reproduce what they find at scale.

There’s a secondary factor that compounds this today: positional bias. Language models weight what they read at the beginning and end of their context more heavily than what’s in the middle. Good instructions sit at the top, but the agent reads files as it works — and the legacy module or generated client it just opened lands at the end of the context, right before generation. Newer models are reducing this effect, but I still see it show up during long, multi-step sessions where agents accumulate the most context.

That influence is hard to eliminate, but it can be contained. Encapsulation has always been about hiding complexity from the consumer — a class doesn’t need to know how its dependency works, just what it exposes. The same principle applies here, with a new consumer in mind. Scope the generated or legacy code behind a clean interface, move it to a separate package, and the agent doesn’t need to read the source behind it. It sees the interface — concise, consistent with the rest of the codebase — and patterns off that instead. The usual reasons for encapsulation still apply. There’s just one more now: keeping the agent focused on code that reflects the team’s conventions rather than the generator’s.
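A minimal sketch of what that boundary can look like. Every name here is hypothetical — the shape is what matters: the generated client lives behind a small interface written in the team’s style, and that interface is all a consumer (human or agent) needs to open.

```java
// --- behind the boundary: what a generator might emit, in its own style ---
class GeneratedUserResponse {
    private String payload;
    public String getPayload() { return payload; }
    public void setPayload(String p) { this.payload = p; }
}

class GeneratedUserApi {
    public GeneratedUserResponse getUserByIdUsingGET(String id) {
        // Stubbed here; the real generated client would make an HTTP call.
        GeneratedUserResponse resp = new GeneratedUserResponse();
        resp.setPayload("user:" + id);
        return resp;
    }
}

// --- the boundary: the only type the rest of the codebase sees ---
interface UserDirectory {
    String findUser(String id);
}

// Thin adapter; the generated source stays in its own package behind this.
class GeneratedUserDirectory implements UserDirectory {
    private final GeneratedUserApi api = new GeneratedUserApi();

    @Override
    public String findUser(String id) {
        return api.getUserByIdUsingGET(id).getPayload();
    }
}
```

Callers depend on `UserDirectory`, so the agent reading the call sites never opens the generated wrappers — the concise interface is the pattern it absorbs.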
For the case that started this — an auto-generated API client pulling the agent’s style in the wrong direction — there’s a simpler option: use AI to generate the client instead. Not the whole spec, just the endpoints needed, written to match the codebase’s conventions. The tradeoff is losing the guaranteed spec compliance of a deterministic generator. The gain is code that fits the project’s style and doesn’t compete with the conventions the rest of the codebase follows.
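What "written to match the codebase's conventions" might look like, as a sketch: one class per resource, constructor injection, only the endpoints the project actually calls. The names are illustrative, and the transport is abstracted behind a function so the client stays testable without a network — a real version would inject an HTTP client the same way.

```java
import java.util.function.Function;

// Hypothetical hand-style (or AI-written) client: small, injectable,
// covering only the endpoints the project needs.
final class OrdersClient {
    private final String baseUrl;
    private final Function<String, String> httpGet; // url -> response body

    OrdersClient(String baseUrl, Function<String, String> httpGet) {
        this.baseUrl = baseUrl;
        this.httpGet = httpGet;
    }

    String getOrderStatus(String orderId) {
        return httpGet.apply(baseUrl + "/orders/" + orderId + "/status");
    }
}
```

Fifteen lines instead of a generated module — nothing for the agent to pattern off except the conventions the rest of the codebase already follows.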