How to Triage Vulnerabilities in Vibe-Coded Projects: Severity, Exploitability, Impact

When you let an AI write your code, you’re not just speeding up development; you’re inviting in a new kind of risk. Vibe coding, where engineers give high-level prompts to LLMs and let them generate entire modules, is fast. But it’s also vulnerable. Security scanners are finding flaws in AI-generated code at alarming rates: sometimes more than 45% of it fails basic security checks. You can’t just run a scanner and call it a day. You need a system to sort through the noise, figure out what’s truly dangerous, and fix it before it ships. This isn’t about finding every bug. It’s about surviving the chaos.

Why Vibe Coding Breaks Traditional Security Models

Traditional vulnerability triage assumes code is written by humans who follow patterns, make predictable mistakes, and can be trained. Vibe-coded projects don’t work that way. LLMs don’t understand context. They don’t know what a secret key is supposed to be protected from. They’ve seen millions of lines of insecure code in their training data, and they’ll replicate it if the prompt doesn’t explicitly forbid it.

A study from arXiv in December 2025 tested 200 real-world security tasks across 77 CWE types. The results? Frontier LLMs failed over 80% of security tests, even when they passed more than half the functional ones. That’s not a glitch. It’s a design flaw. The model thinks it’s doing its job because the app loads, the form submits, the API returns data. But it didn’t check if the user should’ve been allowed to access that data in the first place.

This creates a dangerous illusion: the app works, so it must be safe. But in reality, you’re deploying code with hardcoded API keys, unvalidated inputs, and broken authentication, all invisible until someone exploits them.

Severity: It’s Not Just CVSS Scores

Most teams still use CVSS scores to rank severity. That’s fine for traditional code. But in vibe coding, severity isn’t just about the flaw itself; it’s about how far it spreads.

Escape.tech found that 62% of the 2,000+ vulnerabilities they uncovered in vibe-coded apps involved exposed secrets or PII. Another 28% were critical access control failures. These aren’t isolated bugs. They’re systemic. One prompt like “build a user profile endpoint” can generate five files: a model, a controller, a middleware, a test, and a config. If the LLM misses authentication in one, it’s likely missed it in all of them. That’s not a single vulnerability. That’s a chain reaction.

So when you rate severity, ask:

  • Is this secret hardcoded in a config file that gets pushed to GitHub?
  • Does this endpoint accept any user ID without checking ownership?
  • Is this vulnerability present in every similar module the AI generated?

If the answer to any of those is yes, treat it as critical, even if the CVSS score is only 7.5. The real risk isn’t the flaw. It’s the scale.
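
Those three questions can be encoded as a small triage helper. This is an illustrative sketch, not any vendor’s scoring logic; the function name, flags, and CVSS cutoffs are assumptions:

```python
# Sketch: escalate severity when systemic vibe-coding risk factors are present.
# The flag names and CVSS thresholds below are illustrative assumptions.

def triage_severity(cvss: float, *, secret_in_repo: bool = False,
                    missing_ownership_check: bool = False,
                    replicated_across_modules: bool = False) -> str:
    """Return a triage label; any systemic flag overrides the raw CVSS score."""
    if secret_in_repo or missing_ownership_check or replicated_across_modules:
        return "critical"  # scale of spread, not score, drives the rating
    if cvss >= 9.0:
        return "critical"
    if cvss >= 7.0:
        return "high"
    if cvss >= 4.0:
        return "medium"
    return "low"

# A CVSS 7.5 flaw replicated across every generated module is still critical:
print(triage_severity(7.5, replicated_across_modules=True))  # critical
print(triage_severity(7.5))                                  # high
```

The point of the override is exactly the article’s argument: a repeated flaw is a chain reaction, so the systemic flags dominate the numeric score.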

Exploitability: How Easy Is It to Abuse?

Exploitability in vibe-coded apps isn’t about advanced hacking skills. It’s about how much effort it takes to find the flaw.

Vidoc Security’s taxonomy breaks it down cleanly:

  • Hardcoded secrets: 100% exploitable with zero effort. Just search the repo. Done.
  • Broken authorization: 87% exploitable with moderate effort. Change a user ID in the URL. Boom, access.
  • Insecure deserialization: 63% exploitable with advanced effort. Requires crafting payloads, but still common.

The scary part? These aren’t edge cases. They’re the norm. A marketing agency using vibe coding to build client websites had a 45% failure rate on OWASP Top 10 tests, with every single failure traced back to AI-generated code. One team accidentally exposed customer addresses because the LLM generated a “view profile” endpoint that didn’t validate ownership. No login check. No token. Just a direct URL.

That’s why exploitability in vibe coding is less about technical complexity and more about predictability. If the AI was trained on code with poor security, it will replicate it unless you force it to do otherwise.
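
The “zero effort” tier really is just a text search. Here is a minimal sketch of that exploit path using two illustrative regex patterns; real secret scanners ship hundreds, so treat these as examples, not a working defense:

```python
# Minimal sketch of the "just search the repo" exploit path: a regex sweep
# for hardcoded secrets. The two patterns are illustrative, not exhaustive.
import re

SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(
        r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}['\"]"),
}

def find_secrets(source: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs found in a source string."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(source):
            hits.append((name, match.group(0)))
    return hits

code = 'API_KEY = "sk-aaaabbbbccccddddeeee"\nuser = "alice"'
print(find_secrets(code))
```

If a few lines of stdlib regex can find the flaw, so can an attacker cloning your public repo, which is why this class scores as trivially exploitable.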

Impact: The Domino Effect

Impact isn’t just “how bad is this?” It’s “how far does this break?”

A single hardcoded API key in a vibe-coded microservice can lead to:

  • Full access to cloud storage
  • Compromise of third-party APIs
  • Exposure of customer data across multiple systems

Nucamp’s advice is simple: treat every line of AI-generated code like a junior developer’s first draft. You wouldn’t ship code from a new hire without review. Don’t ship code from an AI without the same scrutiny.

Real-world impact isn’t theoretical. Databricks found that 45% of AI-generated code introduces OWASP Top 10 flaws. That means if you’re using AI to build 10 features a week, you’re likely shipping 4-5 vulnerabilities per week. Multiply that across teams, and you’ve got a security time bomb.

The key is to measure impact by potential blast radius. Does this flaw affect internal tools only? Or does it touch customer data, payment systems, or public APIs? If it’s the latter, it’s critical, even if the exploit is only “moderate.”

The Triaging Framework: Three Levels of Defense

You can’t fix what you don’t prioritize. Aikido.dev’s three-level triaging system works because it forces structure:

Level 1: Automated Scanning (The Floor)

Start with tools that catch the obvious. Use SAST (Static Application Security Testing) like SonarQube, which detects 85% of code quality and security issues. Pair it with DAST tools like OWASP ZAP to test running apps. These tools find the low-hanging fruit: missing headers, unencrypted endpoints, known vulnerable libraries.

But don’t stop there. Dependency scanning is non-negotiable. Tools that monitor SBOMs (Software Bill of Materials) catch drift with 99.2% accuracy. If your AI-generated code pulls in a library with a known CVE, you need to know before it goes live.
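
At its core, a dependency gate is a comparison of your pinned versions against an advisory list. The sketch below makes that concrete; the KNOWN_BAD set and helper names are hypothetical stand-ins, since a real pipeline pulls from a live advisory database rather than a hardcoded set:

```python
# Sketch of a dependency gate: compare pinned requirements against a
# known-vulnerable list. KNOWN_BAD is a hypothetical stand-in for an
# advisory feed, not real vulnerability data.

KNOWN_BAD = {("requests", "2.5.0"), ("pyyaml", "5.3")}  # illustrative entries

def parse_requirements(text: str) -> list[tuple[str, str]]:
    """Parse 'name==version' lines into (name, version) pairs."""
    pins = []
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            pins.append((name.lower(), version))
    return pins

def vulnerable_pins(text: str) -> list[tuple[str, str]]:
    """Return every pinned dependency that appears on the advisory list."""
    return [pin for pin in parse_requirements(text) if pin in KNOWN_BAD]

reqs = "requests==2.5.0\nflask==2.3.0\n"
print(vulnerable_pins(reqs))  # [('requests', '2.5.0')]
```

The value of the SBOM approach is that this check runs on every build, so a library the AI quietly pulled in gets flagged before it ever reaches production.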

Level 2: AI Self-Review (The Filter)

This is where vibe coding gets unique. Instead of just scanning the output, feed it back into the LLM with a security prompt.

Example prompt: “Review this code for hardcoded secrets, missing authentication, and unvalidated inputs. List all vulnerabilities and suggest fixes.”

Databricks tested this. After adding a self-reflective review step, vulnerability rates dropped by 57% in the PurpleLlama benchmark. Why? Because the AI, when forced to think about security, starts spotting patterns it previously ignored.

It’s not perfect. The arXiv study showed that when asked to fix vulnerabilities, LLMs introduced new ones in 68% of cases. But used as a filter, not a replacement, it’s powerful.
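
As a filter, the self-review step is just one extra model call wrapped around the prompt above. A sketch, where `call_llm` is a hypothetical stand-in for whatever model client your stack actually uses:

```python
# Sketch of the self-review filter. `call_llm` is a hypothetical stand-in
# for a real LLM client; the prompt mirrors the example in the article.

REVIEW_PROMPT = (
    "Review this code for hardcoded secrets, missing authentication, "
    "and unvalidated inputs. List all vulnerabilities and suggest fixes.\n\n{code}"
)

def self_review(code: str, call_llm) -> str:
    """Feed generated code back to the model with a security-focused prompt.

    Used strictly as a filter: findings go to a human reviewer, and the
    model's suggested fixes are never applied automatically.
    """
    return call_llm(REVIEW_PROMPT.format(code=code))

# Fake client for demonstration; a real one would call your LLM provider.
def fake_llm(prompt: str) -> str:
    return "1. Hardcoded API key on line 1 (CWE-798)."

findings = self_review('API_KEY = "sk-live-123"', fake_llm)
print(findings)
```

Keeping the model in a read-only reviewer role is deliberate: it captures the 57% reduction in missed flaws without exposing you to the 68% regression rate of automated fixes.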

Level 3: Organizational Guardrails (The Wall)

Automate the rules. Secret scanning tools like GitGuardian monitor code, wikis, and even Slack messages for exposed keys. Set up automatic revocation SLAs: if a secret is found, it’s rotated within 15 minutes.

Mandate CI/CD gates. No code passes unless it clears SAST, DAST, and secret scans. Make this non-negotiable.
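
The gate itself can be a few lines of pipeline logic. An illustrative sketch, where the stage names and result format are assumptions about your CI setup:

```python
# Illustrative CI gate: the merge is blocked unless every mandatory scan
# both ran and passed. Stage names are assumptions for the sketch.

REQUIRED_STAGES = ("sast", "dast", "secret_scan")

def gate(results: dict[str, bool]) -> bool:
    """Return True only if every required scan ran and passed.

    A missing stage counts as a failure: skipping a scan must never
    be a way to sneak code through.
    """
    return all(results.get(stage, False) for stage in REQUIRED_STAGES)

print(gate({"sast": True, "dast": True, "secret_scan": True}))  # True
print(gate({"sast": True, "dast": True}))                       # False: missing scan blocks merge
```

Treating an absent result the same as a failed one is the non-negotiable part; otherwise a misconfigured pipeline silently waves vulnerable code through.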

Enterprise adoption is already shifting. SecurityWeek reports 78% of companies using vibe coding now enforce mandatory security steps in their pipelines. The ones that don’t are the ones getting breached.

The New Triaging Model: Modified DREAD

Traditional DREAD (Damage, Reproducibility, Exploitability, Affected Users, Discoverability) doesn’t fit vibe coding. Why? Because exposure is the biggest risk.

ReversingLabs’ team adjusted it:

  • Exposure (40%): How widely is this flaw spread? Is it in one file or 20?
  • Damage (30%): What’s the worst-case outcome? Data leak? System takeover?
  • Reproducibility (15%): Can you reliably trigger it?
  • Exploitability (10%): How hard is it to exploit?
  • Affected Users (5%): Who’s impacted?

This model prioritizes flaws that are widespread, not just severe. A secret leaked in 10 different services? That’s a Level 1 priority, even if it’s “just” an API key.

Tools like Vidoc Security Lab now integrate this scoring directly into development pipelines, auto-tagging vulnerabilities that match the 77 CWE patterns from the SusVibes benchmark.
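
The modified DREAD model reduces to a weighted sum. A sketch, assuming each factor is scored 0-10 (the scale is an assumption; the weights are the ones from the model above):

```python
# Modified DREAD as a weighted sum. The 0-10 per-factor scale is an
# assumption; the weights match the ReversingLabs-style breakdown above.

WEIGHTS = {
    "exposure": 0.40,
    "damage": 0.30,
    "reproducibility": 0.15,
    "exploitability": 0.10,
    "affected_users": 0.05,
}

def modified_dread(scores: dict[str, float]) -> float:
    """Weighted 0-10 risk score; exposure dominates by design."""
    return sum(WEIGHTS[factor] * scores.get(factor, 0.0) for factor in WEIGHTS)

# A secret leaked across ten services: modest damage, maximal exposure.
leaked_key = {"exposure": 10, "damage": 5, "reproducibility": 9,
              "exploitability": 10, "affected_users": 4}
print(round(modified_dread(leaked_key), 2))  # 8.05
```

Even with middling damage, the exposure weight pushes the leaked key near the top of the queue, which is exactly the behavior the model is designed for.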

Why Humans Still Win

No tool, no matter how smart, replaces human judgment.

Google Cloud’s 2025 Security Command Center cut false positives by 42% by learning AI-generated code patterns. IBM’s research showed combining automated scans with LLM reflection caught 91% of vulnerabilities that either method missed alone.

But here’s the catch: LLMs still don’t understand context. They don’t know your business rules. They don’t know that this endpoint handles payments, or that this data is PII under GDPR.

That’s why the final step in triaging is always human review. Look at the code. Ask: “Why was this written this way?” “What’s the intended flow?” “What happens if someone sends a malformed request?”

The AI can find the holes. But only you know what’s at stake.

What to Do Tomorrow

You don’t need to overhaul your process. Start small:

  1. Run SonarQube and OWASP ZAP on your latest vibe-coded feature.
  2. Use a secret scanner on your repo. Look for API keys, tokens, passwords.
  3. Feed the AI’s output back into the model with a security review prompt.
  4. Set a CI/CD gate: no merge unless all scans pass.
  5. Train your team: every line of AI code is a draft. Review it like one.

The goal isn’t perfect code. It’s predictable risk. Vibe coding isn’t going away. But if you treat it like regular code, you’re setting yourself up for failure. Triaging vulnerabilities here isn’t about finding every bug. It’s about building a system that catches the ones that matter, before they catch you.

10 Comments

  • Raji viji

    December 28, 2025 AT 01:21

    Bro, AI-generated code is just a fancy way of saying 'I didn't write this, so it's not my problem.' 45% failure rate? That's not a bug, that's a feature of lazy engineering. I've seen teams ship LLM output straight to prod and act shocked when their auth system lets anyone delete customer data. Wake up. You're not automating security-you're outsourcing negligence.

  • Rajashree Iyer

    December 29, 2025 AT 16:29

    It’s like giving a toddler a flamethrower and calling it 'creative expression.' The AI doesn’t know the difference between a secret key and a birthday cake-it just mimics what it’s seen. We’re not building software anymore. We’re curating digital nightmares and calling it innovation. The real tragedy? We’re proud of it.

  • Parth Haz

    December 30, 2025 AT 16:42

    While the concerns raised are valid, I believe we should approach this with measured caution rather than alarmism. AI-assisted development is here to stay, and the key lies in integrating robust, scalable security practices-not rejecting the technology. A layered defense, as outlined in the article, is not only feasible but already proving effective in enterprise environments.

  • Vishal Bharadwaj

    December 30, 2025 AT 21:39

    lol 45% failure rate? That's low. I work with teams that hit 80%+ and they still think they're 'innovating.' Also, CVSS is garbage. Who even uses that anymore? And 'self-review' by the same AI? That's like asking a thief to audit his own safe. Also, your grammar is weird. Why so many hyphens? Like, wtf. Also, you missed that DREAD is dead because everyone uses CVSS anyway. You're all wrong.

  • anoushka singh

    January 1, 2026 AT 08:49

    Wait, so you're saying I can't just paste the AI's output into GitHub and call it a day? But I already got the client's approval... and the demo worked? Isn't that enough? 😅

  • Jitendra Singh

    January 2, 2026 AT 13:03

    I’ve seen this play out in my team. We started using LLMs for boilerplate, then got lazy. One day, a hardcoded AWS key popped up in a PR-no one noticed because it ‘worked.’ We implemented the three-level framework from the post. It’s not perfect, but now we catch 90% of the dumb stuff before it ships. It’s not about trusting the AI. It’s about building guardrails so the AI can’t wreck us.

  • Madhuri Pujari

    January 3, 2026 AT 21:16

    Oh, so now we're 'triaging' AI-generated code like it's some kind of medical emergency? How poetic. Let me guess-you also believe in 'security by prompt engineering.' Oh, wait-you just fed the AI a paragraph and expected it to suddenly become a security expert? That's like asking a toaster to diagnose cancer. The real vulnerability isn't in the code-it's in the delusion that AI can replace critical thinking. And you call this 'innovation'? Please. We're just automating our incompetence.

  • Sandeepan Gupta

    January 4, 2026 AT 09:49

    Let me break this down simply: AI writes code like a new grad who read a tutorial once. You wouldn't ship a new hire's first PR without review. Why treat AI any differently? Start with SonarQube. Run secret scans daily. Add a self-review prompt. Enforce CI gates. These aren't fancy tricks-they're hygiene. Do them consistently, and you'll avoid 90% of the disasters. This isn't rocket science. It's responsibility.

  • Tarun nahata

    January 5, 2026 AT 21:37

    Guys, this isn't doom and gloom-it's a chance to level up! AI is the new junior dev, and we're the seniors who get to mentor it. Instead of panicking, let’s build better prompts, tighter CI/CD, and stronger reviews. Every vulnerability found before production is a win. Every team that adopts guardrails? They’re the ones who’ll lead the next decade. This isn’t the end of coding-it’s the beginning of smarter coding. Let’s rise to it!

  • Aryan Jain

    January 7, 2026 AT 18:18

    They don't want you to know this but AI code is being used to track your movements, steal your data, and sell it to shadow governments. The 'self-review' step? That's just the AI covering its tracks. The real threat isn't hardcoded keys-it's that the AI is learning how to lie to you. This isn't about security. This is about control. And they're using your own tools to take it from you. Wake up. The system is rigged.
