By PYMNTS | November 3, 2025
Autonomous coding agents can now generate, test and debug entire applications, but they still falter without human oversight. A new academic paper, "A Survey of Vibe Coding with Large Language Models," finds that even advanced models capable of building complete apps lose accuracy and efficiency when developers aren't part of the process. The study measured a 53% decline in code accuracy and a 19% increase in task completion time when human feedback loops were removed. The findings support a growing consensus: AI will not replace software engineers anytime soon, but it will redefine how they work.
Autonomy Without Context
The study’s results reveal that coding agents can efficiently generate and refine code within controlled environments, yet their reasoning deteriorates once human guidance is stripped away. The researchers attribute this drop-off to missing context and unclear goal alignment, problems that human developers naturally resolve through judgment and domain experience. “These systems can perform multi-step reasoning, but without structured feedback, they fail to distinguish correctness from plausibility,” the authors wrote.
A Bloomberg Opinion column warned that the “vibe coding revolution” is being overhyped, arguing that many AI-built programs still require heavy rework to meet production standards. The term “vibe coding,” originally popularized by AI researcher Andrej Karpathy, describes the shift toward prompting models in natural language to write and run entire applications without knowing every line of code. It promises faster software creation but also raises new questions about control, versioning and accountability.
In practice, researchers found that agentic models like Claude, Cursor and SWE-Agent performed best when developers reviewed outputs at key checkpoints rather than running fully autonomous sessions. Without those checkpoints, the models produced longer, less maintainable codebases and missed security constraints. The findings align with earlier research on CoAct-1: Computer-Using Agents with Coding as Actions, which similarly concluded that human interaction remains essential for steering multi-agent software systems toward reliable outcomes.
The Hybrid Developer Era
A Wall Street Journal report revealed that Walmart, one of the largest enterprise software buyers globally, is not replacing its developers with AI agents but expanding both. The retailer is creating new “agent developer” roles, engineers who train, supervise, and integrate coding agents into production workflows. Rather than automating humans out of the loop, Walmart’s strategy centers on pairing traditional developers with AI copilots that manage documentation, code refactoring, and test automation.
That same blended approach is reflected in enterprise strategy across finance, logistics, and retail. Human developers are increasingly acting as conductors of agentic systems, structuring context, enforcing validation, and maintaining continuity between business logic and machine output. This is “interactive autonomy,” where AI executes and humans validate. The combination improves speed and scalability while retaining the critical human judgment required for compliance and maintainability.
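The "interactive autonomy" pattern described above, where an agent proposes work and humans sign off at defined gates, can be sketched in a few lines. This is a minimal illustration, not any vendor's API; every name in it (`Checkpoint`, `InteractiveSession`, `propose_patch`) is hypothetical.

```python
# Sketch of "interactive autonomy": the agent executes, humans validate.
# All names are illustrative, not a real agent framework's API.
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Checkpoint:
    name: str                         # e.g. "tests pass", "security review"
    validate: Callable[[str], bool]   # a human reviewer or automated gate

@dataclass
class InteractiveSession:
    checkpoints: list
    log: list = field(default_factory=list)

    def run(self, propose_patch: Callable[[str], str], task: str) -> Optional[str]:
        patch = propose_patch(task)        # agent generates a candidate change
        for cp in self.checkpoints:        # each checkpoint must approve it
            if not cp.validate(patch):
                self.log.append(f"rejected at {cp.name}")
                return None                # hand the work back to a developer
            self.log.append(f"passed {cp.name}")
        return patch                       # only validated output ships

# Usage: stub gates stand in for real reviewers and test suites.
session = InteractiveSession(checkpoints=[
    Checkpoint("tests pass", lambda p: "def " in p),
    Checkpoint("security review", lambda p: "eval(" not in p),
])
result = session.run(lambda task: "def add(a, b):\n    return a + b",
                     "add two ints")
```

The design choice mirrors the study's finding: the agent never merges its own work, and a rejection at any gate stops the session rather than letting errors compound.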
Vibe coding can also create opportunities for small businesses that previously could not afford an entire development team, as Justin Jin found when he launched the AI-powered entertainment app Giggles.
Still, the researchers warn that this structure must be deliberate. Unstructured collaboration between humans and agents can slow down work rather than accelerate it. Teams in the study that adopted consistent review points and role definitions saw up to 31% higher accuracy than those that let agents operate independently. The takeaway, according to the authors, is that autonomy without scaffolding introduces inefficiency rather than innovation.
As the “Takedown” paper from Stanford highlights, unmonitored AI code can introduce security and compliance vulnerabilities at scale. The lesson, across both research and industry, is that autonomy in AI coding is not a destination but a design choice. True efficiency lies in the feedback architecture that guides agents, embedding human reasoning, ethical oversight and contextual understanding into every iteration.
Vibe coding may indeed spark a new economy, but not through total automation. Its real promise lies in redefining collaboration: developers who manage, teach and correct AI will shape the next era of software creation. In the process, coding may become less about syntax and more about a shared workflow where human oversight remains.