Anthropic's MCP gives every downstream app unauthenticated RCE, and they called it expected behavior

The Model Context Protocol's STDIO transport passes user input directly into subprocess execution with no sanitization. OX Security found 14+ CVEs across the ecosystem. Anthropic declined to patch.

Anthropic’s Model Context Protocol ships a transport layer that takes a user-supplied command string and passes it directly into subprocess execution with no sanitization. Every official SDK does this: Python, TypeScript, Java, Rust. OX Security disclosed 14+ CVEs across six production platforms, all stemming from this single design choice. Anthropic’s response was two words: “expected behavior.” No patch. No CVE for the protocol. No SDK update.

That is the whole story. Everything else is damage assessment.

What the protocol actually does

MCP’s STDIO transport accepts command and args fields via StdioServerParameters and immediately spawns a subprocess. The SDK performs no validation on the command string before executing it. The check that the subprocess is a legitimate MCP server happens only after the process has started. If the check fails, an error is returned. But the command already ran.
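The ordering problem can be reproduced without any MCP code at all. The sketch below is a stand-in, not the SDK: connect_stdio and demo are hypothetical names, and the "handshake" is reduced to checking whether the child printed something that looks like JSON. It assumes only the Python standard library.

```python
import os
import subprocess
import sys
import tempfile


def connect_stdio(command, args, timeout=5.0):
    """Hypothetical stand-in for a connect call: spawn first, validate second."""
    proc = subprocess.Popen(
        [command, *args],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    request = b'{"jsonrpc":"2.0","id":1,"method":"initialize"}\n'
    try:
        # The child is already running by the time we ask whether it is a server.
        out, _ = proc.communicate(input=request, timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()
        raise ConnectionError("no handshake response")
    # Simplified validation: a real client would parse a JSON-RPC reply.
    if not out.lstrip().startswith(b"{"):
        raise ConnectionError("process is not an MCP server")
    return out


def demo():
    """Run a harmless payload and report (was_rejected, side_effect_happened)."""
    marker = os.path.join(tempfile.mkdtemp(), "side_effect.txt")
    payload = f"open({marker!r}, 'w').write('ran')"
    try:
        connect_stdio(sys.executable, ["-c", payload])
        rejected = False
    except ConnectionError:
        rejected = True
    return rejected, os.path.exists(marker)
```

In this sketch the payload is rejected as "not an MCP server", yet the marker file it wrote is still on disk. The error arrives after the side effect, which is the behavior the paragraph above describes.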

This is not a bug in one implementation. It is the documented behavior of the protocol, inherited by every application built on MCP’s official SDKs. The vast majority pass user-controlled input into that command field with no sanitization, no allowlisting, no access control.

OX Security identified four exploitation families. Direct UI injection is the most common: platforms like LangFlow expose MCP configuration panels without authentication, an attacker submits a malicious command value, and the subprocess runs. OX found 915+ publicly accessible LangFlow instances via internet scanning. Allowlist bypass hits platforms that tried hardening: Flowise and Upsonic restricted commands to approved binaries like npx and python, but attackers get around the restriction with argument injection (npx -c "<malicious command>"), because the allowlist validates the binary name but not its flags. Zero-click prompt injection targets AI coding tools; CVE-2026-30615 in Windsurf required no user interaction beyond opening a project. And marketplace poisoning is the supply chain vector: OX submitted a trial payload to eleven MCP directories, and nine published it with no security review.

Nine of eleven. No review.
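The allowlist bypass is easy to see side by side. The following is a minimal sketch, not the Flowise or Upsonic code: the function names and the ALLOWED_FLAGS policy are hypothetical, and the flag sets are illustrative assumptions rather than a vetted list.

```python
import os

# Hypothetical policy: for each approved binary, the only flags a caller may pass.
ALLOWED_FLAGS = {
    "npx": {"-y", "--yes"},
    "python": {"-m"},
}


def naive_is_allowed(command, args):
    """Binary-name check only: the pattern that argument injection defeats."""
    return os.path.basename(command) in ALLOWED_FLAGS


def strict_is_allowed(command, args):
    """Check the binary name and reject every flag not explicitly approved."""
    binary = os.path.basename(command)
    if binary not in ALLOWED_FLAGS:
        return False
    approved = ALLOWED_FLAGS[binary]
    for arg in args:
        if arg.startswith("-"):
            # Compare the flag name without any "=value" suffix.
            flag = arg.split("=", 1)[0]
            if flag not in approved:
                return False
    return True
```

With this policy, naive_is_allowed("npx", ["-c", "id"]) passes while strict_is_allowed("npx", ["-c", "id"]) does not. Even the strict version only narrows the problem: whatever package or module the approved binary is told to run is still arbitrary code, so a fixed, server-side list of complete command lines is the safer design where it is practical.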

The CVE count

The confirmed CVEs include Flowise (CVE-2026-40933, CVSS 9.9, patched in 3.1.0), Windsurf (CVE-2026-30615, CVSS 8.0, unconfirmed patch status), LiteLLM (CVE-2026-30623, patched in v1.83.7-stable), and Agent Zero (CVE-2026-30624, CVSS 8.6, unconfirmed). Additional CVEs hit Fay Framework, Bisheng, LangChain-ChatChat, Upsonic, Cursor, MCP Inspector, LibreChat, and WeKnora. NVD has not completed enrichment for several entries. Flowise, LiteLLM, and Bisheng have shipped patches. For the rest, no patch has been confirmed.

OX claims 7,000+ publicly accessible MCP servers and up to 200,000 total vulnerable instances. They cite 150 million aggregate downloads across affected packages on PyPI and npm. That figure is arithmetically plausible but unverified, and download counts on package registries are inflated by CI/CD cache misses and automated mirrors. Treat it as a researcher estimate. The directional point holds regardless: MCP adoption is broad, and the vulnerable surface is not small.

The design-vs-bug argument

Anthropic’s position is that STDIO execution is a deliberate design choice and input sanitization is the responsibility of downstream developers. ARMO Security published a counterpoint arguing the framing conflates expected OS behavior with a protocol design flaw. Their argument: STDIO is no more dangerous than git clone or npm install, and teams should treat MCP servers the way they treat npm dependencies.

This is technically substantive and practically irrelevant. The question is not whether subprocess execution is inherently dangerous. It is whether a protocol designer who ships execute-first-validate-never as the default SDK behavior, then watches 14+ CVEs cascade across the ecosystem, bears any responsibility for the pattern they created.

OX documented the answer empirically. They found the same vulnerability class reproduced across LangFlow, Flowise, LiteLLM, Cursor, Windsurf, and others. The Cloud Security Alliance AI Safety Initiative sided with the systemic-flaw framing. Kevin Curran, IEEE senior member and cybersecurity professor at Ulster University, called it “a shocking gap in the security of foundational AI infrastructure.” The “it’s by design” defense works better when the design does not produce the same critical vulnerability in every downstream implementation.

The timing

OX researchers noted something worth sitting with. Anthropic had just launched Claude Mythos, marketed in part as a tool to find vulnerabilities in other organizations’ software, while declining to address a systemic vulnerability in its own protocol. One concession: a week after OX’s private disclosure, Anthropic quietly updated its security documentation to state that MCP STDIO adapters “should be used with caution.” No formal public statement. No blog post. No advisory explaining the decision.

“Should be used with caution” is not a security posture. It is a disclaimer.

What to do about it

As of this writing, there is no confirmed in-the-wild exploitation. None of these CVEs appear in CISA’s KEV catalog. But OX executed commands on six live production platforms during coordinated disclosure, and the 915+ open LangFlow instances are trivially fingerprinted via Shodan or Censys. The reconnaissance has been done by defenders. The same methodology is reproducible by attackers.

If you run MCP servers, audit your stack now. Identify every MCP server, its transport type, and its version. Upgrade LiteLLM to v1.83.7-stable and Flowise to v3.1.0. If your platform allows users or API callers to define new STDIO MCP servers, require admin-level authentication for that endpoint immediately. Containerize every STDIO server you cannot upgrade. Treat MCP marketplace installs as untrusted code, because nine of eleven directories will publish whatever you submit.
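The inventory step can start small. The sketch below assumes a client configuration shaped like the "mcpServers" JSON block that several MCP clients use, where each entry has either a command (STDIO) or a url (HTTP). That shape is an assumption about your stack, and the inventory function name is hypothetical; adapt the keys to whatever your platform actually stores.

```python
import json


def inventory(config_text):
    """Return (name, transport, target) rows for each configured MCP server."""
    config = json.loads(config_text)
    rows = []
    for name, entry in sorted(config.get("mcpServers", {}).items()):
        if "command" in entry:
            # STDIO servers are the ones that spawn local subprocesses.
            transport = "stdio"
            target = " ".join([entry["command"], *entry.get("args", [])])
        elif "url" in entry:
            transport = "http"
            target = entry["url"]
        else:
            transport = "unknown"
            target = ""
        rows.append((name, transport, target))
    return rows


# Illustrative config; the server names and values are made up.
SAMPLE = """
{
  "mcpServers": {
    "files": {"command": "npx", "args": ["-y", "example-files-server"]},
    "search": {"url": "https://mcp.example.internal/search"}
  }
}
"""

if __name__ == "__main__":
    for row in inventory(SAMPLE):
        print(row)
```

Every row marked stdio is a place where a command string reaches a subprocess, so those are the entries to check for version, containerization, and who is allowed to create them.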

For unpatched platforms like LangFlow, disable the MCP adapter or firewall the configuration endpoint to internal IPs only. Block inbound to MCP server ports. Enforce egress filtering on MCP server hosts. Segment them from production data stores.

For detection, MCP gateways like Lunar.dev MCPX and MintMCP produce structured audit logs. Datadog has published MCP-specific detection rules. Cisco has released an MCP Scanner for behavioral code threat analysis. Host-level process monitoring via EDR is required to catch STDIO injection that stays local and generates no network signals.

For new deployments, Streamable HTTP transport eliminates the STDIO injection class entirely, but requires OAuth 2.1 and strict Origin header validation to avoid substituting a different exposure.
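Strict Origin validation for an HTTP-based server means exact matching, not substring matching. Here is a minimal sketch of that check, assuming a hypothetical allowlist (ALLOWED_ORIGINS) and a hypothetical helper name (origin_allowed). It is framework-agnostic and does not cover the OAuth 2.1 side.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the origins your deployment really serves.
ALLOWED_ORIGINS = {"https://app.example.internal"}


def origin_allowed(origin_header, allowed=ALLOWED_ORIGINS):
    """Accept only an exact scheme://host[:port] match against the allowlist."""
    if not origin_header:
        # Policy choice in this sketch: requests without an Origin are refused.
        return False
    parts = urlsplit(origin_header.strip())
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    # A well-formed Origin carries no path, query, or fragment.
    if parts.path or parts.query or parts.fragment:
        return False
    normalized = f"{parts.scheme}://{parts.netloc.lower()}"
    return normalized in allowed
```

Exact comparison is the point of the design: a check such as "the header contains the trusted hostname" would accept https://app.example.internal.evil.example, which this version rejects. Whether to refuse requests that carry no Origin at all depends on whether non-browser clients need access.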

The MCP roadmap focuses on Streamable HTTP, governance, and enterprise readiness. It does not list STDIO security hardening. Whether STDIO transport will be formally deprecated is unclear. What is clear is that the protocol’s designer has looked at the damage and decided it is someone else’s problem.

The pattern

The closest historical analog is the npm event-stream backdoor from 2018: a supply chain attack via trusted ecosystem tooling, developers as the attack surface, no security review before publication. The difference is that event-stream was targeted and stealthy. MCP STDIO is a broad architectural flaw, publicly documented with proof-of-concept code, affecting every platform that followed the official SDK’s example.

When regulated-industry risk teams at firms like JPMorgan Chase, Citi, and BNY are being briefed on this as an unpatched flaw rather than an acceptable design tradeoff, “expected behavior” stops being a technical position and starts being a liability position. PatchDayAlert will track patch status across these platforms and flag changes in the daily digest as they ship.

The design is working exactly as intended. That is the problem.
