By Dana Hehl
AI may dramatically increase the number of vulnerabilities we can find. That doesn’t mean we’re ready for what happens next.
There’s been a lot of attention on AI-driven vulnerability discovery lately, especially with recent demonstrations of how quickly models can identify and chain together complex issues. It’s still early, and I don’t think any of us fully know how this will play out, but one part of the conversation already feels familiar. We’ve seen this pattern before: when one part of the system speeds up, the constraint doesn’t disappear, it just moves somewhere else.
Right now, most of the focus is on discovery: how many vulnerabilities can be found, how quickly, and across how many systems. What’s getting much less attention is what happens after that.
Most vulnerability management programs weren’t built for high-velocity discovery. They were designed around periodic testing, manageable volumes, and human-paced analysis. Even in that world, many teams are already working through more findings than they can realistically address. If discovery accelerates, the problem doesn’t just scale, it starts to shift in a more fundamental way.
The challenge isn’t simply finding more vulnerabilities. It’s deciding what to do with them in a way that is timely, consistent, and grounded in enough context to actually matter.
When volume increases, the same pressure points tend to show up, just more quickly and more visibly. Context becomes harder to maintain as the signal-to-noise ratio shifts. Prioritization slows down because severity doesn’t map cleanly to real-world risk. Teams start looking for certainty that isn’t there, which slows decision-making, and validation becomes a bottleneck as everything feels like it needs to be confirmed before anything can move forward. At the same time, backlogs grow faster than they can be reduced.
At smaller scale, this is frustrating but manageable. At higher volume, it starts to break the system.
We’ve started to see early versions of this dynamic play out in environments evaluating newer AI-driven tooling. You can end up with hundreds of findings in a short period of time without a clear sense of how many are exploitable, how many are duplicates, or how many actually matter. The natural instinct is to validate everything and rely on severity scores to prioritize, but in practice that often leads to spending time on issues that are technically interesting but not especially important, while higher-impact problems wait.
One consistent pattern is that severity is often overestimated when findings are generated without full context. In a recent batch of triaged issues, the majority of initially “critical” and “high” findings were ultimately reclassified to medium or lower once they were placed in the context of the system they affected. In some cases, findings that appeared severe on paper had limited real-world impact, while others that looked minor required more attention once their role in the broader system was understood.
Another pattern we’re starting to see is that certain classes of vulnerabilities tend to be missed by current AI-driven approaches, often with the same root cause: a lack of full context. In the same recent batch of triaged issues, we saw a fair number of memory corruption and injection-type findings, but very little related to access control or user management. Some of this seems to come down to how the analysis is being done. Issues that can be identified by looking at a single file or a narrow slice of code are easier to surface, while those that require a broader understanding of how components interact across a system are harder to catch.
At the same time, it’s worth keeping some perspective on the speed and scale of what’s being reported. We’ve seen similar patterns before with the rise of automated scanners, SAST tools, and fuzzers, where a large number of findings could be generated in a short period of time and teams had to adapt their workflows accordingly. This feels less like a fundamentally new problem and more like an existing one showing up faster and at greater scale.
This exposes a gap in how most teams approach prioritization. Severity scores were never a perfect proxy for risk, but at lower volumes they were often “good enough.” At higher volumes, they become harder to rely on in isolation. What starts to matter more is context: how a vulnerability fits into the system, what it actually exposes, and whether it can realistically be used.
In practice, this often means re-weighting both likelihood and impact based on how a system is actually used, rather than how a vulnerability is described in isolation.
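To make that concrete, here is a minimal sketch of what context-based re-weighting could look like. The fields, weights, and adjustment factors below are hypothetical placeholders rather than a prescribed model; the point is only that the same raw severity score can land very differently once likelihood and impact are adjusted for how the system is actually exposed and used.

```python
# A minimal, illustrative sketch of context-weighted prioritization.
# The field names, weights, and adjustment factors are hypothetical --
# stand-ins for whatever context a team actually has about its systems.
from dataclasses import dataclass

@dataclass
class Finding:
    title: str
    base_severity: float          # e.g., a CVSS-style base score, 0-10
    internet_facing: bool         # is the affected component reachable externally?
    handles_sensitive_data: bool  # does it touch data whose exposure matters?
    compensating_controls: bool   # segmentation, auth in front, WAF, etc.

def contextual_priority(f: Finding) -> float:
    """Re-weight a finding's raw severity using system context.

    Likelihood goes up for externally reachable components and down when
    compensating controls exist; impact goes up when sensitive data is involved.
    """
    likelihood = 1.0
    if f.internet_facing:
        likelihood *= 1.5
    if f.compensating_controls:
        likelihood *= 0.5

    impact = 1.0
    if f.handles_sensitive_data:
        impact *= 1.5

    return f.base_severity * likelihood * impact

findings = [
    Finding("SQL injection in internal admin report", 9.1, False, False, True),
    Finding("IDOR on customer billing endpoint", 6.5, True, True, False),
]

# Sorting by contextual priority rather than base severity can reorder the queue:
# here the "medium" internet-facing issue outranks the "critical" internal one.
for f in sorted(findings, key=contextual_priority, reverse=True):
    print(f"{contextual_priority(f):5.2f}  {f.title}")
```

The specific numbers matter far less than the habit: making the context adjustments explicit, so that the queue reflects what a finding can actually do in this environment rather than how it was scored in isolation.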
None of this is entirely new. What’s changing is the scale and the speed at which these challenges show up.
We’re not seeing a clean playbook emerge yet, but there are some early patterns. Teams that seem to handle this better tend to move a bit differently. They make decisions without waiting for complete certainty, they’re clear about who owns prioritization, and they’re comfortable deferring or ignoring findings that don’t materially impact risk. They tend to focus more on context and impact than severity alone, and they communicate uncertainty in a way that still builds confidence.
On the other side, teams that struggle often get stuck trying to perfectly validate everything, or they spread decisions across too many stakeholders, which slows progress. Tool output becomes the default source of truth, everything starts to feel equally important, and a lot of effort goes in without a corresponding reduction in risk.
One thing that stands out across these patterns is ownership. In teams that are working well, it’s clear who decides what gets fixed, what gets deferred, and what risk is accepted. That sounds simple, but in many organizations it isn’t actually well defined. At lower volumes, that ambiguity can be absorbed. At higher volumes, it becomes a bottleneck very quickly, as decisions get revisited or escalated instead of resolved.
There’s also a downstream effect that isn’t getting much attention yet. Once something is fixed, it still needs to be validated. At smaller scale that’s manageable, but at higher volumes it quickly becomes its own backlog, introducing another layer of delay.
At the same time, teams are often working with incomplete information. Reports may describe a vulnerability without including the artifacts needed to fully validate it, and reproducing a proof of concept can require significant effort that isn’t always feasible. Even when logs or crash data are available, missing inputs can make it difficult to confirm behavior or validate a fix, adding friction to an already constrained process.
In some environments, especially those that are more fragmented, a high volume of findings may be distributed across many systems, with only a small subset affecting any single application. In those cases, the issue isn’t just the number of findings, but understanding which ones actually matter in a given context and can be realistically acted on.
It’s also worth noting that not all environments experience this in the same way.
From where I sit, this is where the conversation is starting to shift. Not toward better scanners, but toward how decisions actually get made. Tools will continue to improve, and discovery will get faster, but interpreting, prioritizing, and acting under uncertainty still comes down to people, for now.
In sum, the challenge isn’t just identifying risk. It’s having the capacity to make decisions about it. The teams that will navigate this well aren’t just the ones with the best tools. They’re the ones with people who can read a finding in context, understand the system it lives in, and make a call without waiting for certainty that won’t come. That capability doesn’t scale on its own. It has to be built or brought in.
About the Author

Dana Hehl is Chief Operations Officer at Anvil Secure. She is a seasoned leader in the infosec industry with a proven track record of driving growth and cultivating strong client relationships.
Dana has a genuine passion for developing individuals and teams, and she prioritizes fostering a values-driven culture while ensuring effective, sustainable processes for growth.
