Everything is critical, so nothing is critical

Nearly two in five of last year’s CVEs were rated High or Critical, but only a few percent ever get exploited. The severity score was never a risk score, and the queue that treats it like one is a big part of why confirmed-exploited bugs sit unpatched for 43 days.

The flagship Linux vulnerability of 2024 took six to eight hours of repeated attempts to exploit against a single server in a lab, because the race condition at its core was won only about once in every 10,000 tries. regreSSHion (CVE-2024-6387), an unauthenticated remote code execution flaw in OpenSSH that yields root, got a logo, wall-to-wall coverage, and headlines about an estimated 14 million exposed servers. It also got a CVSS base score of 8.1. What it did not get, according to Qualys, the firm that found it, was a reliable public exploit for 64-bit production systems at disclosure. CISA didn’t add it to its Known Exploited Vulnerabilities catalog until roughly 20 months later. The score was accurate. The reading everyone took from it, “mass RCE, patch everything tonight,” was not.

That gap between the number and the reality is the whole story, and it isn’t about one bug. It’s about what happens to a prioritization system when the top of the scale stops being rare.

The obvious read

“Patch all criticals first” is the rule most teams run on, and for good reason. It’s simple, it’s auditable, and for years it worked. When Critical meant a handful of CVEs a month, sorting by severity got you most of the way to sorting by risk. The CVSS base score was a reasonable proxy, the high end was sparse, and the bugs that landed there were usually the ones that mattered.

That world is gone. The heuristic didn’t get worse. The inputs did.

The pattern: the top of the scale stopped being rare

Published CVEs hit 40,009 in 2024 and 48,185 in 2025, per Jerry Gamblin’s annual reviews of the NVD-scored data: at least the seventh consecutive record year, and roughly 130 new CVEs a day. The severity distribution is the part that breaks the heuristic. Of the 2025 cohort, 8.3% were rated Critical and 31.1% High, so about 39% landed in the top two severity bands. That’s nearly 19,000 “high-priority” vulnerabilities in a single year, about 50 every day, or roughly two in five of everything published.

Now line that up against how many vulnerabilities actually get used. CISA’s Known Exploited Vulnerabilities catalog, the curated list of CVEs with confirmed in-the-wild exploitation, held about 1,484 entries at the end of 2025, while FIRST’s EPSS model scores more than 337,000 CVEs. Confirmed, KEV-level exploitation therefore covers under 0.5% of the universe. Broader counts that include weaponized-but-not-KEV bugs run higher: Qualys puts the weaponized share under 3%, and academic work finds exploitation evidence for about 5.5%. The honest answer is that somewhere between 2% and 6% of all CVEs are ever exploited, with the spread driven by the observation window and by what each study counts as exploitation. Nobody has a clean number, and that uncertainty is worth keeping in mind.

The two facts don’t fit together. Two in five new CVEs are flagged urgent. A few percent of the catalog will ever be touched. Severity, as a filter, is letting almost everything through.
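The mismatch is easy to check with the figures quoted in the two paragraphs above. This is a minimal sketch of the arithmetic only; the constants are the article’s numbers, and treating “well over 337,000” as a lower bound means the KEV share it prints is a ceiling.

```python
# Figures quoted in the text (Gamblin's 2025 review, KEV and EPSS counts).
PUBLISHED_2025 = 48_185
CRITICAL_SHARE = 0.083
HIGH_SHARE = 0.311
KEV_ENTRIES = 1_484      # KEV size at the end of 2025
EPSS_SCORED = 337_000    # lower bound: EPSS scores "more than 337,000" CVEs

top_two_share = CRITICAL_SHARE + HIGH_SHARE
top_two_count = PUBLISHED_2025 * top_two_share
top_two_per_day = top_two_count / 365
# A larger real denominator only makes this share smaller, so it is a ceiling.
kev_share_ceiling = KEV_ENTRIES / EPSS_SCORED

print(f"top-two share: {top_two_share:.1%}")                        # 39.4%
print(f"top-two CVEs in 2025: {top_two_count:,.0f}")                # 18,985
print(f"top-two CVEs per day: {top_two_per_day:.0f}")               # 52
print(f"KEV share of scored CVEs: under {kev_share_ceiling:.2%}")   # 0.44%
```

Roughly 19,000 urgent-labeled CVEs a year set against a confirmed-exploited share below half a percent is the whole specificity problem in two lines.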

The evidence: severity has terrible specificity

The foundational measurement here is the Cyentia Institute / Kenna Security “Prioritization to Prediction” work. Yes, roughly 55% of exploited CVEs were rated Critical or High. But Critical and High also cover a large share of the whole catalog, so the overlap is nearly meaningless as a sorting signal. Cyentia’s efficiency math makes it concrete: a “remediate everything at CVSS 8.8 and above” strategy covers about 25% of all published CVEs and catches only about half the exploited ones (its coverage), for roughly 5% efficiency (the share of remediated CVEs that turn out to be exploited). Put plainly, 95% of the work that policy generates touches vulnerabilities that are never exploited.
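Efficiency and coverage are two simple ratios over the same remediation decision, and it helps to see them computed side by side. The sketch below is a minimal illustration, assuming a hypothetical `Cve` record and a tiny made-up sample with placeholder IDs; it is not Cyentia’s dataset, and the percentages it prints say nothing about real-world rates.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Cve:
    cve_id: str      # placeholder IDs, not real CVEs
    cvss: float      # CVSS base score
    epss: float      # 30-day exploitation probability, 0.0-1.0
    exploited: bool  # ground truth, known only after the fact


def efficiency_and_coverage(
    cves: Iterable[Cve], remediate: Callable[[Cve], bool]
) -> tuple[float, float]:
    """Efficiency: exploited / remediated. Coverage: remediated-and-exploited / all exploited."""
    cves = list(cves)
    chosen = [c for c in cves if remediate(c)]
    hits = sum(c.exploited for c in chosen)
    total_exploited = sum(c.exploited for c in cves)
    efficiency = hits / len(chosen) if chosen else 0.0
    coverage = hits / total_exploited if total_exploited else 0.0
    return efficiency, coverage


# Hypothetical toy sample, invented purely to exercise the two ratios.
SAMPLE = [
    Cve("CVE-A", 9.8, 0.02, False),
    Cve("CVE-B", 9.8, 0.01, False),
    Cve("CVE-C", 8.8, 0.03, False),
    Cve("CVE-D", 8.8, 0.45, True),
    Cve("CVE-E", 7.5, 0.60, True),
    Cve("CVE-F", 7.8, 0.01, False),
    Cve("CVE-G", 5.3, 0.12, False),
    Cve("CVE-H", 6.5, 0.002, False),
]

cvss_eff, cvss_cov = efficiency_and_coverage(SAMPLE, lambda c: c.cvss >= 8.8)
epss_eff, epss_cov = efficiency_and_coverage(SAMPLE, lambda c: c.epss >= 0.10)
print(f"CVSS >= 8.8: efficiency {cvss_eff:.0%}, coverage {cvss_cov:.0%}")  # 25%, 50%
print(f"EPSS >= 10%: efficiency {epss_eff:.0%}, coverage {epss_cov:.0%}")  # 67%, 100%
```

The design point is that both numbers come from one threshold: raising a cutoff usually buys efficiency at the cost of coverage, which is why Cyentia reports them as a pair.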

What does correlate with exploitation is narrower than the score. In the Cyentia/Kenna data, public exploit code raises a CVE’s exploitation rate roughly tenfold, from about 3.7% to about 37.1%, and exploitation clusters on internet-facing services and authentication bypass rather than on score magnitude. The score is measuring something, but it isn’t measuring the thing the queue needs.

The base score inflates by design, not by accident. FIRST’s Consumer Implementation Guide says producers “intentionally make worst-case assumptions about the deployment environments” when assigning a base score: no mitigating controls, a working exploit assumed. And FIRST is direct that this is not a risk score. The v4.0 User Guide states base scores “should not be used alone to assess risk,” and that only a full Base plus Threat plus Environmental score gets “much closer to ‘Risk’.” Almost nobody computes the full score. NVD, vendors, and SLAs all run on the base score. The spec says that’s wrong; the industry does it anyway.

Two forces push the numbers up. First, vendors and CNAs rate their own bugs without knowing how customers deployed the product, so they default to worst case, and the incentives are lopsided: under-rating looks negligent if the bug later gets popped, while over-rating just looks careful. Second, the formula itself clusters. Jacques Chester’s analysis of the NVD distribution found four values (7.5, 7.8, 8.8, 9.8) dominate because of how metric combinations multiply through the equations. On top of that, scorers don’t agree with each other: a 2024 IEEE study found 68% of evaluators gave different severity ratings for the same vulnerability, and VulnCheck found that about 20% of 120,000 CVSS v3 CVEs carry two scores, about 56% of which conflict on severity level. CVE-2023-21557, a Windows LDAP denial of service, scored 7.5 (High) from Microsoft and 9.1 (Critical) from NIST, a gap that flips its place in the queue. The number on a CVE reflects who scored it nearly as much as what the bug does.
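Chester’s clustering is easiest to see by running the CVSS v3.1 base equations directly. The sketch below is a minimal implementation of the Scope: Unchanged branch only, using the metric weights, the `Roundup` function, and the qualitative severity bands from FIRST’s v3.1 specification; the vectors fed to it are illustrative, not the published vectors of any CVE named above. Typical network, local, low-privilege, and availability-only vectors land on 9.8, 7.8, 8.8, and 7.5, and switching attack complexity to High gives 8.1, the same score regreSSHion carried.

```python
import math

# CVSS v3.1 metric weights. PR values are the Scope: Unchanged ones.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal, as defined in the v3.1 spec (avoids float drift)."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a v3.1 vector string; this sketch rejects Scope: Changed."""
    m = dict(part.split(":") for part in vector.split("/"))
    if m["S"] != "U":
        raise ValueError("sketch handles Scope: Unchanged only")
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR[m["PR"]] * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


def severity(score: float) -> str:
    """v3.x qualitative rating scale."""
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


for vec in [
    "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",  # unauthenticated network, full impact
    "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H",  # needs a low-privilege account
    "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H",  # local access
    "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H",  # availability only (DoS)
    "CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H",  # high attack complexity
]:
    score = base_score(vec)
    print(score, severity(score), vec)
```

Only a handful of weight products exist, so many distinct bugs collapse onto the same few totals; the 7.0 and 9.0 band edges then decide whether a disputed score like 7.5 versus 9.1 reads as High or Critical.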

The scoring infrastructure is also retreating. NIST’s NVD enrichment effectively stalled in February 2024, and VulnCheck found that by May 2024, 93.4% of CVEs added since the stall had received no NVD analysis at all, and therefore no NVD score. In 2026, NIST formalized the retreat: it now enriches only CVEs that are KEV-listed, affect federal software, or hit critical platforms, leaving the rest “Not Scheduled.” The agency the “score everything and sort by score” model depends on has admitted it can no longer score everything.

What this means for your patch queue

Stop treating CVSS as the verdict and start treating it as one input. The practical move is a layered triage that stacks exploitation signal and exposure on top of severity, in that order.

Filter	What it tells you	How to use it
CISA KEV	Confirmed in-the-wild exploitation	Bypasses everything. Patch on a tight clock. The list is small, which is the point.
EPSS	30-day exploitation probability	Sort the remainder. At a 10% EPSS cutoff, roughly 65% efficiency and 63% coverage, versus ~5% for the CVSS 8.8+ cut.
Exposure / context	Is the asset internet-facing? Business-critical?	Your local judgment. SSVC turns this into a decision table instead of a number.

KEV goes first because confirmed exploitation is the hardest signal available, and BOD 22-01 already requires federal agencies to remediate KEV entries on tight timelines. A 7.5 denial-of-service bug in SolarWinds Serv-U drew a hard federal deadline exactly this way: the kind of middling score the severity queue would have parked, moved to the front purely because exploitation was confirmed. EPSS sorts the long tail. CISA’s SSVC replaces the score outright with a decision tree of four inputs that ends in one of four actions (Track, Track*, Attend, Act), and its Vulnrichment program pre-fills the exploitation and automatability fields so a deployer adds only two local calls. Severity stays in the mix; it just stops being the decision.
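The layered order in the table (KEV, then EPSS, then local exposure, with CVSS demoted to a tiebreaker) can be expressed as a single sort key. This is a minimal sketch, not SSVC and not any product’s logic: the `Finding` record, the `internet_facing` flag, and the placeholder IDs are hypothetical, and the 10% cutoff is simply the EPSS threshold discussed above.

```python
from dataclasses import dataclass

EPSS_CUTOFF = 0.10  # assumed threshold; tune it to your remediation capacity


@dataclass(frozen=True)
class Finding:
    cve_id: str            # placeholder IDs
    cvss: float            # base score, used only to break ties
    epss: float            # 30-day exploitation probability, 0.0-1.0
    in_kev: bool           # listed in CISA KEV
    internet_facing: bool  # local exposure judgment


def triage_key(f: Finding) -> tuple:
    """Lower tuples sort first."""
    if f.in_kev:
        tier = 0  # confirmed exploitation bypasses everything
    elif f.epss >= EPSS_CUTOFF:
        tier = 1  # likely exploitation
    else:
        tier = 2  # long tail
    # Within a tier: exposed assets first, then higher EPSS, then higher CVSS.
    return (tier, not f.internet_facing, -f.epss, -f.cvss)


def order_queue(findings: list[Finding]) -> list[Finding]:
    return sorted(findings, key=triage_key)


queue = order_queue([
    Finding("CVE-W", 9.8, 0.01, False, False),
    Finding("CVE-X", 7.5, 0.05, True, True),
    Finding("CVE-Y", 8.1, 0.30, False, True),
    Finding("CVE-Z", 6.5, 0.20, False, False),
])
print([f.cve_id for f in queue])  # ['CVE-X', 'CVE-Y', 'CVE-Z', 'CVE-W']
```

Note where the 9.8 ends up: last, because nothing but its base score argues for it. A tuple key keeps the policy readable; swapping the exposure flag for an SSVC outcome would slot in the same way.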

The cost of not doing this is already on the books. The 2026 Verizon DBIR found median time-to-remediate climbed to 43 days, and organizations fully remediated only 26% of KEV-listed vulnerabilities, down from 38% the year before. Those are the highest-signal bugs there are, and they’re slipping. Cyentia’s capacity research found that teams close roughly one in ten open vulnerabilities a month regardless of size, which makes it a capacity ceiling rather than a discipline problem, and that CVSS 10.0 bugs are among the slowest to be remediated, not the fastest. When two in five new CVEs are flagged urgent, maximum severity stops functioning as a signal. Meanwhile, the same DBIR found vulnerability exploitation was the top breach access vector at about 31%, ahead of credential abuse at about 13%. The signal broke while the work got harder.
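Why a fixed close rate behaves like a ceiling can be shown with a toy backlog model. This sketch assumes a constant monthly inflow (the 1,000 figure is hypothetical) and a constant 10% monthly close rate taken from the Cyentia observation; real queues are lumpier. Under those assumptions the open count converges on inflow divided by close rate, so doubling the inflow doubles the standing backlog no matter how disciplined the team is.

```python
def simulate_backlog(
    monthly_inflow: float, close_rate: float = 0.10, months: int = 60
) -> list[float]:
    """Each month: close a fixed fraction of what is open, then new findings arrive."""
    backlog = 0.0
    history = []
    for _ in range(months):
        backlog = backlog * (1 - close_rate) + monthly_inflow
        history.append(backlog)
    return history


history = simulate_backlog(monthly_inflow=1_000)
print(round(history[-1]))  # just under 10,000 (= inflow / close_rate)
```

The steady state solves B = (1 - r)B + A, giving B = A / r; only changing what enters the queue, or what gets closed first, moves the outcome.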

The score cuts both ways. When OpenSSL pre-announced an incoming Critical patch in October 2022, the industry stood up Heartbleed-grade emergency response. On release, “SpookySSL” (CVE-2022-3602) was downgraded to High and scored 7.5: on most platforms the overflow hits unused stack memory, no reliable RCE emerged, and it never made it into KEV. Teams spent incident-response hours on the label. The counter-case keeps the model honest. MOVEit Transfer (CVE-2023-34362) was a 9.8 that meant it: an unauthenticated SQL injection on an internet-facing file-transfer box that Cl0p exploited as a zero-day, added to KEV within six days, with 2,700+ organizations breached. The layered model puts it at the top on exposure and low attack complexity, not on the number alone. The point isn’t to distrust high scores. It’s to stop letting the score be the only thing in the room.

What to watch

CVSS 4.0 (November 2023) added nomenclature (CVSS-B / BT / BE / BTE) to force the base-only limitation into the open, plus Threat metrics for exploit maturity, and its implementation guide now states outright that base scores “likely do not reflect real-world severity.” Whether that changes vendor behavior or just adds labels nobody computes is unsettled, and adoption data is thin so far. The other thing to watch is the NVD retreat: if the scoring authority only enriches the KEV-and-critical-platform slice, the rest of the catalog ships without a score, and the “sort by severity” habit loses the input it was built on. That pressure may force the layered model faster than any guidance document will.
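The v4.0 nomenclature is mechanical: the label records which metric groups actually fed the score. The sketch below derives it from a vector string, using the v4.0 Threat metric `E` and the Environmental metric abbreviations from FIRST’s specification; treating an explicit `X` (Not Defined) as “not used” is an assumption of this sketch, and Supplemental metrics are ignored because they do not change the score.

```python
THREAT_KEYS = {"E"}
ENVIRONMENTAL_KEYS = {
    "CR", "IR", "AR",
    "MAV", "MAC", "MAT", "MPR", "MUI",
    "MVC", "MVI", "MVA", "MSC", "MSI", "MSA",
}


def parse_vector(vector: str) -> dict[str, str]:
    """Split 'CVSS:4.0/AV:N/...' into {'AV': 'N', ...}."""
    prefix, _, rest = vector.partition("/")
    if prefix != "CVSS:4.0":
        raise ValueError(f"not a CVSS v4.0 vector: {vector!r}")
    metrics = {}
    for part in rest.split("/"):
        key, _, value = part.partition(":")
        metrics[key] = value
    return metrics


def nomenclature(vector: str) -> str:
    """Return CVSS-B, CVSS-BT, CVSS-BE, or CVSS-BTE for a v4.0 vector."""
    metrics = parse_vector(vector)

    def used(keys: set[str]) -> bool:
        # 'X' means Not Defined, so it does not count as supplying the group.
        return any(metrics.get(k, "X") != "X" for k in keys)

    label = "CVSS-B"
    if used(THREAT_KEYS):
        label += "T"
    if used(ENVIRONMENTAL_KEYS):
        label += "E"
    return label


BASE = "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N"
print(nomenclature(BASE))                  # CVSS-B
print(nomenclature(BASE + "/E:A/MAV:L"))   # CVSS-BTE
```

A feed that only ever emits CVSS-B labels is telling you, in the label itself, that nobody has applied threat or environment data yet.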

The fix was never a better number. A score that’s accurate about worst-case severity and useless for predicting which bug gets used isn’t broken; it’s being asked to do a job it was never built for. Demote it to an input, put confirmed exploitation and exposure in front of it, and the queue starts meaning something again. That’s the read PatchDayAlert is built on: the digest triages each day’s CVEs by exploitation signal first, not by raw CVSS, so the handful that warrant your morning don’t drown in the vast majority that never will.

Sources

Qualys regreSSHion advisory — 2024-07
Jerry Gamblin 2025 CVE Data Review — 2026-01-01
Cyble, 2025 CISA KEV growth — 2025
FIRST EPSS data & stats
VulnCheck, evidence-based prioritization — 2024-06-25
Cyentia, EPSS v2 efficiency — 2022-02-14
FIRST CVSS v4.0 Consumer Implementation Guide — 2023
FIRST CVSS v4.0 User Guide — 2023
Jacques Chester, A Closer Look at CVSS Scores — 2021-10
Wunder et al., CVSS Scoring Inconsistencies (IEEE S&P 2024)
The Register, CVSS scores need overhauling — 2025-10-16
VulnCheck, NVD backlog exploitation — 2024-05-23
Help Net Security, NIST narrows NVD enrichment — 2026-04-16
CISA SSVC
CISA BOD 22-01 — 2021-11
SecurityWeek, Verizon DBIR 2026 — 2026-05
OpenSSL Security Advisory 20221101 — 2022-11-01
NVD CVE-2023-34362
