
What We Actually Find When We Audit AI-Generated Codebases

AI tools can build an app in a weekend. But what's actually in the code? We've audited enough vibe-coded projects to see the patterns — here's what breaks.


We’ve been getting a new kind of client lately.

They’re founders, product leads, sometimes CTOs who inherited something. They shipped fast — really fast — using Cursor, Copilot, Claude, or some combination. The app works. Users are on it. Revenue is coming in. And now they have a problem: nobody on their team actually understands the codebase.

They didn’t write it. The AI did. And they need someone who reads code — not just prompts it — to tell them what they’ve got.

That’s us. Here’s what we keep finding.


1. Hallucinated Logic That Passes Every Demo

This is the most dangerous one. The code runs. The tests pass (if there are tests). The feature looks correct in the UI. But the underlying logic doesn’t do what anyone thinks it does.

We audited a fintech app where the interest calculation function produced correct-looking numbers for the demo dataset. When we traced the actual math, it was wrong — not by a lot, but enough to compound into real liability over thousands of transactions. The AI had generated a formula that looked like compound interest but wasn’t.

Nobody caught it because nobody read the function. They tested the output, saw reasonable numbers, and shipped it.

What to look for: Any function doing math, validation, or business-critical logic. Don’t just test the output — read the implementation.
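Here’s a hypothetical sketch of the shape this failure takes — not the actual fintech client’s code, but the same class of bug: a function whose name and signature say “compound interest” while the body computes simple interest. On demo-sized inputs the two are nearly indistinguishable.

```typescript
// Hypothetical illustration: looks like compound interest, isn't.
// With small rates and few periods, the outputs are close enough
// to pass an eyeball test of the UI.
function accrueWrong(principal: number, rate: number, periods: number): number {
  // Simple interest wearing compound-interest variable names.
  return principal * (1 + rate * periods);
}

function accrueRight(principal: number, rate: number, periods: number): number {
  // Actual compounding: each period's interest earns interest.
  return principal * Math.pow(1 + rate, periods);
}

// At 1% over 2 periods the gap is about ten cents on $1,000...
const smallGap = accrueRight(1000, 0.01, 2) - accrueWrong(1000, 0.01, 2);
// ...over 120 periods it grows to roughly $1,100 on the same $1,000.
const largeGap = accrueRight(1000, 0.01, 120) - accrueWrong(1000, 0.01, 120);
```

A demo dataset lives in the top-left of that curve, which is exactly why output-only testing missed it.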


2. The Same Thing Built Three Different Ways

AI tools don’t have memory across sessions the way your engineering team does. Ask it to build user authentication on Monday, and it uses one pattern. Ask it to build role-based access on Wednesday, and it invents a completely different approach. Ask it to add API key auth on Friday, and you get a third.

We regularly find codebases with three or four different patterns for the same concern: error handling done differently in every file, state management split across multiple approaches, API calls wrapped in different abstractions depending on when they were generated.

The code works. But maintaining it is a nightmare because there’s no consistency. A human engineer inheriting this codebase has to learn three systems instead of one.

What to look for: Pick any cross-cutting concern (error handling, auth, data fetching) and trace it across the codebase. If you find more than one pattern, the AI built in silos.
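To make the “three patterns for one concern” problem concrete, here’s an invented but representative example: three parsers for the same payload, generated in three sessions, each with its own error convention. Every caller has to know which convention it’s dealing with.

```typescript
// Hypothetical illustration: one concern, three conventions.

// Monday's session: throws on failure.
function parseUserV1(json: string): { id: number } {
  const user = JSON.parse(json);
  if (typeof user.id !== "number") throw new Error("invalid user");
  return user;
}

// Wednesday's session: returns null on failure.
function parseUserV2(json: string): { id: number } | null {
  try {
    const user = JSON.parse(json);
    return typeof user.id === "number" ? user : null;
  } catch {
    return null;
  }
}

// Friday's session: returns a result object.
function parseUserV3(
  json: string,
): { ok: true; user: { id: number } } | { ok: false; error: string } {
  try {
    const user = JSON.parse(json);
    if (typeof user.id !== "number") return { ok: false, error: "invalid user" };
    return { ok: true, user };
  } catch (e) {
    return { ok: false, error: String(e) };
  }
}
```

None of these is wrong in isolation. The problem is that all three coexist, so every call site is a small guessing game.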


3. Phantom Dependencies

AI tools are trained on millions of packages. They’ll import libraries that solve your problem elegantly — libraries that are unmaintained, have known vulnerabilities, or do the same thing as another library already in your project.

We’ve seen package.json files with 80+ dependencies where 30 could be removed. Duplicate utilities doing the same job. Packages pulled in for a single function that could be written in five lines. Abandoned packages with open CVEs.

Every unnecessary dependency is attack surface, bundle size, and one more thing that can break on update.

What to look for: Run npm audit or equivalent. Then check your dependency list against what’s actually imported. You’ll be surprised.
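The declared-vs-imported check can be automated. A real audit would scan source files for bare import specifiers (or lean on a tool like depcheck); this minimal sketch assumes you’ve already collected that set and just shows the diff step. All names here are illustrative.

```typescript
// Minimal sketch: diff package.json dependencies against modules
// actually imported somewhere in the source tree. The `imported` set
// is assumed to come from a prior scan of your import statements.
function unusedDependencies(
  declared: Record<string, string>, // package.json "dependencies"
  imported: Set<string>,            // bare specifiers found in source
): string[] {
  return Object.keys(declared).filter((dep) => !imported.has(dep));
}

// Illustrative data: three declared, one actually used.
const declared = { lodash: "^4.17.21", axios: "^1.6.0", moment: "^2.29.4" };
const imported = new Set(["axios"]);
const unused = unusedDependencies(declared, imported); // lodash, moment
```

This catches the dead weight; npm audit catches the vulnerable weight. You want both lists.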


4. Error Handling That Swallows Everything

AI-generated code loves try/catch. It wraps everything. And in the catch block? Usually console.log(error) or, worse, nothing at all.

This means your application fails silently. Users see a blank screen or stale data. Your monitoring shows green. The error happened, was caught, was logged to a console nobody reads, and life went on — until it didn’t.

We audited a logistics platform where shipment status updates were silently failing for a subset of carriers. The catch block logged the error and returned the last known status. Users saw “In Transit” for packages that had been delivered three days ago. The dashboard showed 100% uptime.

What to look for: Search your codebase for empty catch blocks and catch (e) { console.log. Every one of these is a place where your app is lying to you.
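Here’s the anti-pattern next to one honest alternative, compressed into a hypothetical status-update function (synchronous for brevity; the report callback stands in for whatever error tracker you actually run).

```typescript
// The swallowing version: failure is indistinguishable from success.
function statusSwallowed(fetchStatus: () => string, lastKnown: string): string {
  try {
    return fetchStatus();
  } catch (e) {
    console.log(e);   // logged to a console nobody reads
    return lastKnown; // stale data presented as fresh
  }
}

// One honest alternative: failure is reported and visible to the caller.
function statusHonest(
  fetchStatus: () => string,
  lastKnown: string,
  report: (e: unknown) => void, // your real error tracker goes here
): { status: string; stale: boolean } {
  try {
    return { status: fetchStatus(), stale: false };
  } catch (e) {
    report(e); // surfaces in monitoring instead of disappearing
    return { status: lastKnown, stale: true }; // UI can say "last updated..."
  }
}
```

The honest version still degrades gracefully — it just stops pretending the degraded data is current.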


5. Security Holes Hidden Behind Working Features

AI tools optimize for “does it work?” not “is it secure?” We consistently find:

  • API keys hardcoded in frontend code
  • SQL queries built with string concatenation
  • Authentication checks missing on API routes that “only the frontend calls”
  • User input rendered without sanitization
  • CORS set to * because the AI couldn’t figure out the right origin

These aren’t exotic vulnerabilities. They’re OWASP Top 10 basics. But the AI doesn’t flag them because you asked it to build a feature, not to secure one.

What to look for: Run a basic security scan (OWASP ZAP, Snyk). Then manually check every API route for auth middleware. The gaps will be obvious.
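The string-concatenation item deserves a concrete look, because it’s the one we find most often. A hypothetical sketch: the unsafe builder splices user input straight into the query text, while the safe version keeps query and values separate — the shape every parameterized driver (pg, mysql2, and the like) expects.

```typescript
// Unsafe: user input becomes query syntax.
function findUserUnsafe(email: string): string {
  return `SELECT * FROM users WHERE email = '${email}'`;
}

// Safe: the driver sends the value separately; it can never become syntax.
function findUserSafe(email: string): { text: string; values: string[] } {
  return { text: "SELECT * FROM users WHERE email = $1", values: [email] };
}

// Classic payload: findUserUnsafe("' OR '1'='1") yields
//   SELECT * FROM users WHERE email = '' OR '1'='1'
// ...which matches every row. The safe version carries the same string
// as an inert value.
```

The unsafe version works perfectly in every demo, which is why it ships.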


6. No Architecture, Just Files

This might be the most telling pattern. Human-built codebases have architecture — intentional decisions about where things go, how layers communicate, what depends on what. AI-generated codebases have files.

Components that fetch data, transform it, handle errors, and render UI — all in one place. Business logic scattered across API routes with no shared service layer. Database queries duplicated in twelve different endpoints because each was generated independently.

The app works today. But the moment you need to change how a core entity behaves, you’re editing thirty files instead of one.

What to look for: Try to answer “where does the business logic for [core entity] live?” If the answer is “everywhere,” you have files, not architecture.
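The fix is boring and well known: give each core entity one home. A minimal sketch of a service layer, with names (Shipment, canCancel) that are illustrative rather than from any audited codebase:

```typescript
// One entity, one home for its rules.
type Shipment = {
  id: string;
  status: "pending" | "in_transit" | "delivered" | "cancelled";
};

class ShipmentService {
  // The single place the cancellation rule lives. Change it here and
  // every endpoint that calls it changes with it — one file, not thirty.
  canCancel(s: Shipment): boolean {
    return s.status === "pending";
  }

  cancel(s: Shipment): Shipment {
    if (!this.canCancel(s)) {
      throw new Error(`cannot cancel shipment in status ${s.status}`);
    }
    return { ...s, status: "cancelled" };
  }
}
```

When every API route calls ShipmentService instead of re-deriving the rule, “where does the business logic live?” has a one-word answer.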


What This Means For You

None of this means AI tools are bad. We use them ourselves — extensively. The difference is that we understand what the AI produces because we’ve spent years reading, writing, and debugging code the hard way. We know what good architecture looks like because we’ve built systems that are still in production a decade later.

AI is a power tool. But a power tool in the hands of someone who doesn’t understand the material produces a lot of output and not much craftsmanship.

If you’ve shipped something with AI and it’s working — great. That’s further than most people get. But before you scale it, hire a team around it, or raise money on it, you need someone who actually reads code to tell you what’s in there.


What We Do About It

Our Technical Audit now includes a dedicated AI-generated code assessment — hallucinated logic, phantom dependencies, security gaps, and architectural coherence. Two weeks, and you’ll know exactly what you have.

For codebases already in production, our Maintenance & Support service includes ongoing AI code remediation — systematically replacing brittle, AI-generated patterns with code a human engineer can maintain.

And if you’re not sure whether to keep building on what you have or start fresh, that’s exactly what Strategy & Discovery is for.

Talk to us. We’ll be straight with you about what you’ve got.