Vibe Coding in Production: When AI-Generated Code Meets Real-World Complexity




“Vibe coding” — the term Andrej Karpathy coined for letting LLMs write most of your code while you steer at a high level — has moved from Twitter discourse to genuine engineering debate. In 2026, some companies are shipping meaningful features this way. Others are drowning in AI-generated technical debt. What actually separates the two?

Photo: developer working with AI (John Schnobrich, Unsplash)

What “Vibe Coding” Actually Means in Practice

The term gets misused constantly. Let’s be precise. Vibe coding, as a practical approach, means:

  1. Describing intent at a high level rather than thinking through implementation details first
  2. Iterating on generated output rather than writing from scratch
  3. Trusting the AI to handle boilerplate while focusing your attention on domain logic
  4. Accepting imperfect output and shaping it rather than demanding perfect generation

This is fundamentally different from “AI autocomplete” (the Copilot model) and also different from “AI pair programming” (back-and-forth collaborative coding). Vibe coding is more like being a tech lead who delegates implementation to junior engineers — you define the architecture and acceptance criteria, then review and shape what comes back.

The Production Reality Gap

Here’s what nobody talks about in the hype pieces: LLMs are extraordinarily good at code that looks correct and sounds reasonable. They’re much weaker at:

1. Understanding Your Specific System

// You ask: "Add pagination to the user list endpoint"
// AI generates this reasonable-looking code:
async function getUsers(page: number, limit: number) {
  return db.users.findMany({
    skip: (page - 1) * limit,
    take: limit,
  });
}

What the AI doesn’t know: your users table has 50 million rows, you have a specific indexing strategy, and naive offset pagination will kill your database at page 1000. The code is textbook-correct and production-wrong.

2. Distributed System Edge Cases

Ask an LLM to implement a cache invalidation strategy and it will give you something that works perfectly for a single-instance deployment. Bring up concurrent invalidation across 20 instances and it gets shaky. Bring up partial failures during invalidation and many models will hallucinate solutions that look plausible but have subtle race conditions.
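The race in question is easy to state and easy to miss. Here is a minimal single-process simulation; the two "instances" are just interleaved statements, but that is enough to show the failure mode:

```python
# Stale-write race in a read-through cache: a reader that fetched the old
# DB value *before* an invalidation can repopulate the cache with stale
# data *after* it. Two "instances" simulated as interleaved statements.

cache: dict[str, str] = {}
db = {"user:1": "old"}

# Instance A: cache miss, fetches the current value from the DB...
value_fetched_by_a = db["user:1"]

# Instance B: updates the DB and invalidates the cache (textbook-correct):
db["user:1"] = "new"
cache.pop("user:1", None)

# Instance A: finishes its read-through and writes its now-stale value back:
cache["user:1"] = value_fetched_by_a

# The cache and the DB now disagree, and nothing will ever reconcile them:
# the entry looks fresh even though it predates the invalidation.
stale = cache["user:1"]
```

Fixes exist (versioned keys, compare-and-set, short TTLs as a backstop), but choosing among them depends on your consistency requirements, which is exactly the context the model doesn't have.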

3. Security Properties That Require Domain Knowledge

# AI-generated password reset flow — looks fine:
def reset_password(token: str, new_password: str):
    user = User.query.filter_by(reset_token=token).first()
    if not user:
        return {"error": "Invalid token"}
    user.password = hash_password(new_password)
    user.reset_token = None
    db.session.commit()
    return {"success": True}

What’s missing: token expiry check, constant-time comparison to prevent timing attacks, rate limiting, invalidating all existing sessions after reset. Each missing piece is a security vulnerability, and none of them are obvious from “generate a password reset function.”
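For contrast, here is a sketch of the same flow with most of those gaps closed. It uses an in-memory user dict to stay self-contained; hash_password is a deliberately naive stand-in (use bcrypt or argon2 in practice), and rate limiting belongs at the framework layer rather than inside this function:

```python
import hmac
import secrets
from datetime import datetime, timedelta, timezone
from hashlib import sha256

def hash_password(pw: str) -> str:
    return sha256(pw.encode()).hexdigest()  # stand-in; use bcrypt/argon2

demo_token = secrets.token_urlsafe(32)
user = {
    "password": hash_password("old-password"),
    "reset_token": demo_token,
    "reset_token_expires": datetime.now(timezone.utc) + timedelta(hours=1),
    "session_ids": ["sess-a", "sess-b"],  # live sessions to invalidate
}

def reset_password(token: str, new_password: str) -> dict:
    stored = user.get("reset_token")
    # Constant-time comparison: an early-exit string compare leaks how many
    # leading characters matched, which a timing attack can measure.
    if not stored or not hmac.compare_digest(stored, token):
        return {"error": "Invalid token"}
    # Expiry check: a reset link must not work forever.
    if datetime.now(timezone.utc) > user["reset_token_expires"]:
        return {"error": "Token expired"}
    user["password"] = hash_password(new_password)
    user["reset_token"] = None       # single use
    user["session_ids"].clear()      # log out every existing session
    return {"success": True}
```

Notice that every added line encodes a fact about attackers, not about the happy path, and the happy path is all the original prompt described.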

Where Vibe Coding Genuinely Shines

This isn’t a hit piece. There are domains where AI-generated code is genuinely transformative:

CLI Tools and Scripts

For internal tooling, scripts, and one-off automations, vibe coding is incredible. Speed matters, perfect security doesn’t, and the blast radius of a bug is small.

# Describing the problem to Claude:
"I need a script that watches a directory for new JSON files, 
validates them against a schema, and moves valid files to /processed 
and invalid files to /errors with a log of what failed validation"

A skilled engineer might spend 30–45 minutes on this. A good vibe coding session takes 5 minutes and produces something equivalent or better.
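What such a session typically converges on looks something like the following, shown as a single-pass sketch. A real version would loop or use a file-watcher, and the hand-rolled validate is a stand-in for a proper JSON Schema check:

```python
import json
import logging
import shutil
from pathlib import Path

logging.basicConfig(level=logging.INFO)
REQUIRED_KEYS = {"id", "name"}  # stand-in for a real JSON Schema

def validate(data: object) -> list[str]:
    """Return a list of validation failures (empty means valid)."""
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    return [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - data.keys())]

def process_dir(incoming: Path, processed: Path, errors: Path) -> None:
    """One pass over incoming: route each JSON file by validation result."""
    processed.mkdir(parents=True, exist_ok=True)
    errors.mkdir(parents=True, exist_ok=True)
    for path in incoming.glob("*.json"):
        try:
            failures = validate(json.loads(path.read_text()))
        except json.JSONDecodeError as exc:
            failures = [f"invalid JSON: {exc}"]
        if failures:
            logging.warning("%s failed validation: %s", path.name, failures)
            shutil.move(str(path), errors / path.name)
        else:
            shutil.move(str(path), processed / path.name)
```

The bug surface here is exactly the low-blast-radius kind: a misrouted file is annoying, visible, and reversible.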

Frontend UI Components

The pattern of “generate a React component for X” is highly reliable because:

  • The blast radius is visible and bounded
  • Edge cases are observable in the browser
  • Testing is intuitive (does it look right?)
  • The LLM training data for React patterns is enormous

Data Transformation and ETL

Converting data formats, normalizing schemas, building transformation pipelines — these are high-boilerplate, low-ambiguity tasks where LLMs excel.

Test Generation

Perhaps the highest-ROI use case: give AI your implementation and ask it to write tests. The AI knows all the edge cases for the patterns it recognizes, and test code has a low cost of being “overly cautious.”
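To make that concrete: hand a model even a trivial function and it will enumerate edge cases you might not bother with. The slugify below is a hypothetical example, and the tests are the flavor of output you typically get back:

```python
import re

def slugify(text: str) -> str:
    """Lowercase, collapse runs of non-alphanumerics to '-', trim dashes."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

# The kind of edge-case coverage a model volunteers unprompted:
def test_basic():
    assert slugify("Hello World") == "hello-world"

def test_punctuation_runs():
    assert slugify("a, b & c!") == "a-b-c"

def test_empty_string():
    assert slugify("") == ""

def test_only_symbols():
    assert slugify("!!!") == ""

def test_idempotent():
    assert slugify(slugify("Some Title")) == "some-title"
```

Redundant tests cost a few milliseconds of CI time; a missed edge case costs an incident. The asymmetry favors letting the AI be thorough.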

A Framework for Responsible Production Vibe Coding

After seeing what works and what doesn’t, here’s a practical framework:

The “Blast Radius” Rule

Before accepting AI-generated code, ask: if this code has a bug, how bad is the worst case?

  • Low blast radius (internal tool, read-only operation, reversible action): ship it with basic review
  • Medium blast radius (user-facing feature, writes to database): code review + integration tests
  • High blast radius (authentication, payments, security-sensitive): treat AI code as a draft, rewrite with deep understanding

The “Context Completeness” Check

AI code quality correlates directly with how much context you provide. Before generating, ensure you’ve given:

Context template:
- The existing system: [relevant code snippets, architecture]
- The constraints: [performance requirements, security needs]
- The failure modes: [what should happen when X goes wrong]
- The success criteria: [specific, measurable outcomes]

Vague prompt → vague code. Specific prompt → specific, reviewable code.

The “Explain It Back” Test

After generating code you’re going to ship, ask the AI to explain what edge cases it didn’t handle and why. This surfaces the “unknown unknowns”:

“What scenarios does this implementation handle poorly? What would you add if this needed to handle 10x the expected load? What security considerations are outside scope here?”

The answers will either be reassuring (it’s thought about the right things) or alarming (it’s glossed over critical concerns).

The Honest Production Numbers

From teams I’ve talked to in 2026 who are transparent about their AI usage:

Use Case                 | AI Contribution | Confidence Level
Boilerplate/scaffolding  | 80–90%          | High
New CRUD endpoints       | 60–70%          | Medium-High
Business logic           | 30–40%          | Medium
System design            | 15–20%          | Low
Security-critical code   | <10%            | Low
“Contribution” here means code that shipped largely as generated vs. heavily modified or rewritten.

The Skills That Matter More Now

Vibe coding doesn’t reduce the value of engineering expertise — it shifts where expertise matters:

More valuable:

  • System thinking and architecture
  • Knowing what questions to ask
  • Security and threat modeling intuition
  • Code review and verification skills
  • Understanding failure modes and edge cases

Less valuable (relatively):

  • Memorizing syntax and APIs
  • Writing boilerplate from scratch
  • Knowing the “standard” implementation of common patterns

The engineers who are thriving in 2026’s AI-assisted environment are the ones with deep mental models of how systems fail, not just how they’re built.

Conclusion

Vibe coding is real, useful, and here to stay. The engineers dismissing it as a toy are falling behind. The engineers treating AI output as production-ready without review are building future disasters.

The sweet spot is treating AI-generated code the way you’d treat code from a talented but inexperienced engineer: give them a clear problem, review their solution critically, ask good questions, and trust but verify.

The “vibe” isn’t about being casual with quality. It’s about where you direct your attention — and in 2026, the best engineers are directing it at the problems that matter.


Further reading: Andrej Karpathy’s original “vibe coding” thread, Simon Willison’s thoughts on LLM-assisted development

If you found this post helpful, likes and ad clicks are appreciated :)