Copy Fail is a 732-byte root shell. Patch your Linux fleet this week.
CVE-2026-31431 is a deterministic privilege escalation in the Linux kernel affecting versions 4.14 through 6.19. A Python script gives any local user root. Every major distro is affected, containers don't help, and the mitigation is trivial.
CVE-2026-31431 is a local privilege escalation in the Linux kernel. A logic bug in the algif_aead crypto module lets any unprivileged user write four bytes into the page cache of any readable file. The public exploit is a 732-byte Python script that targets /usr/bin/su, injects shellcode into the .text section via page-cache corruption, and hands back a root shell. It is deterministic. No race condition, no timing window, no reliability problems. It works every time, on every affected kernel, on every architecture.
CISA added it to the Known Exploited Vulnerabilities catalog on May 1. Federal agencies have until May 15 to patch. Since then, CrowdStrike has reported the first confirmed in-the-wild incident: an attacker compromised a Jenkins CI/CD server, escalated from the jenkins user to root with Copy Fail, installed a coinminer, and attempted lateral movement. Go and Rust reimplementations of the exploit have joined the original Python PoC in public repositories.
If you run Linux in production, this is the vulnerability that should be driving your week.
Updated May 4 with new distro patch status, the CrowdStrike incident report, and a correction on RHEL mitigation.
Why this one is different
You’ve heard the Dirty Pipe comparison, and it’s fair: same primitive, unprivileged write into the page cache. But the earlier bugs in this family came with limits. Dirty COW required winning a race condition, which made it unreliable enough that a lot of teams treated it as a “patch when convenient” item, and Dirty Pipe only affected kernels from 5.8 onward. Copy Fail removes that excuse. The exploit is deterministic and portable. The same script runs across distros, across architectures, without modification. The page-cache corruption itself is entirely in-memory and the file on disk never changes, which means file integrity monitoring tools that watch for on-disk modifications will not flag it.
The affected range is enormous: kernel 4.14 through the last unpatched 6.19 release (6.19.12 carries the fix), plus all 7.0 release candidates. The vulnerable commit landed in August 2017, so this bug has been sitting in production kernels for nearly nine years. Taeyang Lee at Theori identified it while studying how the kernel crypto subsystem interacts with page-cache-backed data; Theori’s AI vulnerability scanner, Xint, then confirmed the exploitable path in roughly one hour of scanning. Nine years undetected, one hour to find. Ubuntu, RHEL, SUSE, Debian, Amazon Linux, Arch, Fedora, Rocky, AlmaLinux, Oracle Linux: if you run a major distribution, you’re affected unless you’ve already patched.
The root cause is a convergence of three kernel changes: the authencesn implementation in 2011, AF_ALG AEAD socket support in 2015, and an in-place optimization in algif_aead.c from 2017 that set req->src = req->dst. That optimization let page-cache pages from splice() land in a writable destination scatterlist. It’s classified as CWE-669 (Incorrect Resource Transfer Between Spheres) and scored CVSS 7.8, with high impact across confidentiality, integrity, and availability.
Containers do not save you
This is the part that should change your patching priority if you’re running Kubernetes or any multi-tenant container platform.
The page cache is shared across the host. A container running an unprivileged workload can corrupt the page cache of files on the host filesystem. That’s a container escape to the node, without needing any elevated capabilities. Microsoft’s security blog flagged “millions of Kubernetes clusters” as affected. Wiz, Sysdig, Kaspersky, and CERT-EU have all published independent analyses.
The most-vulnerable scenarios are the ones where you’re running untrusted or semi-trusted code: multi-tenant Kubernetes clusters, self-hosted CI/CD runners where pipelines execute arbitrary build scripts, and AI agent sandboxes that run LLM-generated code in containers. If your isolation model relies on Linux namespaces and cgroups, it does not stop this exploit.
What does stop it: gVisor (intercepts the syscall), Firecracker (separate kernel per microVM), V8 isolates (no kernel interaction), and Android’s SELinux policies (block AF_ALG socket creation). If your workloads are on AWS Lambda, Fargate, or Cloudflare Workers, those platforms are protected by architecture, not by patching.
Where patches stand right now
Fixed kernel versions: 5.10.254, 5.15.204, 6.1.170, 6.6.137, 6.12.85, 6.18.22, 6.19.12, and 7.0 onward. The upstream fix landed on April 1 in commit a664bf3d603d.
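As a rough triage aid, the version list above can be turned into a quick check. This is a minimal sketch, not an authoritative scanner: the function names are made up for illustration, and it assumes an upstream-style version string. Distro kernels backport fixes without changing the base version, so a "vulnerable" answer here means "go check the vendor advisory", not "confirmed exploitable".

```python
import re

# Fixed stable releases quoted in the article, keyed by (major, minor) series.
FIXED = {
    (5, 10): 254,
    (5, 15): 204,
    (6, 1): 170,
    (6, 6): 137,
    (6, 12): 85,
    (6, 18): 22,
    (6, 19): 12,
}
FIRST_AFFECTED = (4, 14)
FIRST_FIXED_MAINLINE = (7, 0)


def parse_release(release):
    """Extract (major, minor, patch) from a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def looks_vulnerable(release):
    """Heuristic: True if an upstream kernel with this version predates the fix."""
    major, minor, patch = parse_release(release)
    series = (major, minor)
    if series < FIRST_AFFECTED:
        return False
    if series >= FIRST_FIXED_MAINLINE:
        # 7.0 release candidates are affected; 7.0 final and later are not.
        return series == FIRST_FIXED_MAINLINE and "-rc" in release
    fixed_patch = FIXED.get(series)
    if fixed_patch is None:
        # No fixed release is listed for this series: assume still vulnerable.
        return True
    return patch < fixed_patch
```

On a live host you could feed it `platform.release()` from the standard library. Series with no listed fix (5.4, for example) are reported as vulnerable on purpose, since the article names no safe release for them.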
As of May 4, distro patch status has improved since the weekend, but two of the biggest fleets are still waiting:
- AlmaLinux 8, 9, 10: Patched.
- Arch: Patched.
- Debian: Patched via downstream.
- SUSE: Patched. SLES 15 SP7, SLES 16.0, SUSE Linux Micro 6.0 through 6.2, openSUSE Leap 15.6, and all LTSS releases (12 SP5, 15 SP4 through SP6) have shipped kernel fixes. Only live patching modules are still in progress.
- Ubuntu: Partial. Canonical released USN-8226-1, a kmod package update that automatically disables loading of the algif_aead module. Run sudo apt update && sudo apt upgrade, then reboot to pick it up. Full kernel patches have not shipped yet.
- Red Hat: No kernel errata released. See the mitigation section below for an important caveat about RHEL specifically.
- Amazon Linux: Still listed as “Pending Fix” across AL2 and AL2023.
If your distro hasn’t shipped the kernel fix yet, don’t wait for it to decide whether to mitigate.
One line buys you time
Almost no production software uses the AF_ALG userspace socket interface. The cryptographic subsystem this bug lives in is an optional kernel module that provides socket-based access to the kernel’s crypto API. It’s used for testing and specialized applications. dm-crypt, kTLS, and IPsec call the kernel crypto API directly from inside the kernel, while SSH, OpenSSL, GnuTLS, and NSS do their cryptography in userspace by default. None of them depends on AF_ALG.
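For modular builds (Debian, Ubuntu, Arch, SUSE, and most others), disable the module. Run this as root; the rule only blocks future loads, so if the module is already loaded, also unload it with rmmod algif_aead: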
echo "install algif_aead /bin/false" > /etc/modprobe.d/disable-algif.conf
Important: this does not work on RHEL. Red Hat kernels (RHEL 8, 9, and 10) compile algif_aead into the kernel (CONFIG_CRYPTO_USER_API_AEAD=y), not as a loadable module. The modprobe rule has no effect on a built-in. The same applies to RHCOS (Red Hat CoreOS) in OpenShift clusters.
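Before picking a mitigation, it helps to know which case a given host falls into. The sketch below is one way to check, assuming the standard kbuild `modules.builtin` file under `/lib/modules/<release>/` and the usual `/proc/modules` format; the function names are illustrative, not part of any tool.

```python
import platform


def classify_algif_aead(modules_builtin_text, proc_modules_text):
    """Return 'built-in', 'loaded', or 'not-loaded' for algif_aead."""
    for line in modules_builtin_text.splitlines():
        # modules.builtin lists paths such as kernel/crypto/algif_aead.ko
        if line.strip().endswith("/algif_aead.ko"):
            return "built-in"
    for line in proc_modules_text.splitlines():
        # Each /proc/modules line starts with the module name.
        fields = line.split()
        if fields and fields[0] == "algif_aead":
            return "loaded"
    return "not-loaded"


def recommended_mitigation(state):
    """Map the detected state to the mitigation described in the article."""
    if state == "built-in":
        return "boot parameter: initcall_blacklist=algif_aead_init (reboot required)"
    if state == "loaded":
        return "modprobe install rule, then unload the module"
    return "modprobe install rule"


def read_text(path):
    """Read a file, returning an empty string if it is missing or unreadable."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError:
        return ""


def detect():
    """Classify the running host using the two files named above."""
    builtin = read_text(f"/lib/modules/{platform.release()}/modules.builtin")
    return classify_algif_aead(builtin, read_text("/proc/modules"))
```

Calling `recommended_mitigation(detect())` on each host gives a first-pass answer. "not-loaded" covers both "module exists but is idle" and "module not shipped at all"; the modprobe rule is harmless in either case.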
For RHEL and other built-in builds, blacklist the init function via kernel boot parameter:
initcall_blacklist=algif_aead_init
This requires a reboot. If rebooting is not an option, the fallback is a seccomp profile that blocks AF_ALG socket creation at the syscall layer.
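Because the boot parameter only takes effect after a reboot, it is worth confirming that the running kernel actually booted with it. A minimal sketch, assuming you pass in the contents of `/proc/cmdline`; the helper name is hypothetical. The kernel accepts a comma-separated list for `initcall_blacklist`, so the check splits on commas rather than matching the whole value.

```python
def initcall_is_blacklisted(cmdline, function="algif_aead_init"):
    """True if the kernel command line blacklists the given initcall."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "initcall_blacklist" and sep:
            # The parameter takes a comma-separated list of function names.
            if function in value.split(","):
                return True
    return False


def running_kernel_is_mitigated(path="/proc/cmdline"):
    """Check the live command line; returns False if it cannot be read."""
    try:
        with open(path, encoding="utf-8") as handle:
            return initcall_is_blacklisted(handle.read())
    except OSError:
        return False
```

A host that has the parameter in its bootloader config but has not yet rebooted will still report `False` here, which is the point of checking the live command line.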
For container environments, add a seccomp policy that blocks AF_ALG socket creation. This stops the exploit at the syscall layer without touching the kernel at all. Note: if you run OpenShift, the default restricted-v2 SCC sets allowPrivilegeEscalation: false, which blocks the escalation step. Pods running under anyuid, privileged, or custom SCCs that allow privilege escalation are still vulnerable.
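For illustration, here is a small generator for such a policy in the JSON seccomp profile format used by Docker and OCI runtimes. Treat it as a sketch under assumptions: the function names are invented, and the profile allows everything except `socket(AF_ALG, ...)`, so it is looser than your runtime's default profile. In practice you would merge the single rule into the profile you already ship. It also does not cover the legacy `socketcall` multiplexer used by 32-bit x86 binaries.

```python
import json

AF_ALG = 38  # address family number for AF_ALG on Linux


def af_alg_block_rule():
    """Seccomp rule: fail socket() when its first argument is AF_ALG."""
    return {
        "names": ["socket"],
        "action": "SCMP_ACT_ERRNO",
        "args": [{"index": 0, "value": AF_ALG, "op": "SCMP_CMP_EQ"}],
    }


def build_profile():
    """Allow-by-default profile containing only the AF_ALG block (illustrative)."""
    return {
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [af_alg_block_rule()],
    }


def render_profile():
    """Serialise the profile as JSON, ready to write to a file."""
    return json.dumps(build_profile(), indent=2)
```

Writing `render_profile()` to a file gives you something to test with `docker run --security-opt seccomp=<file>` on a scratch host before rolling the rule into your real profiles.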
The operational impact of any of these mitigations is negligible. You are not going to break anything. If you have a change management process that requires lead time for kernel patches but allows configuration changes on a shorter cycle, the modprobe rule (or boot parameter on RHEL) is the path of least resistance. Push it today, patch the kernel on your normal schedule.
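To confirm that a mitigation has actually taken hold, one option is to probe whether an unprivileged process can still open an AEAD socket. The sketch below uses Python's standard `socket` module, which exposes `AF_ALG` on Linux. The function name and the choice of `gcm(aes)` as the test algorithm are assumptions. Be aware that on an unmitigated modular kernel the probe itself may cause the module to be auto-loaded, so run it after applying the mitigation, not instead of it.

```python
import socket


def af_alg_aead_reachable(algorithm="gcm(aes)"):
    """True if this process can bind an AF_ALG AEAD socket."""
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        # Not Linux, or Python was built without AF_ALG support.
        return False
    try:
        with socket.socket(family, socket.SOCK_SEQPACKET, 0) as sock:
            # Binding selects the algorithm type and name; it fails if the
            # aead interface is unavailable or blocked.
            sock.bind(("aead", algorithm))
    except OSError:
        return False
    return True
```

A `False` result after mitigation is what you want to see. A `False` result can also mean the chosen algorithm is simply missing, so treat it as supporting evidence alongside the module and boot-parameter checks, not as proof on its own.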
What your Monday looks like
If you run Linux in any capacity, this needs to be on Monday’s queue. Not because the sky is falling, but because the exploit is public, trivial to execute, and CISA has set a federal deadline of May 15. Your auditors will ask about it, and “we were waiting for the distro patch” is not a satisfying answer when the mitigation takes one line.
Priority one: Apply the modprobe or boot-parameter mitigation across your fleet. This buys you time and costs you nothing operationally.
Priority two: Patch the kernel on hosts where the distro fix is available. AlmaLinux, Arch, Debian, and SUSE have all shipped, so on those the kernel update should go into this week’s maintenance window (or has already landed). On Ubuntu, the kmod update disables the module automatically once you apt upgrade and reboot, but the full kernel patch is still to come.
Priority three: Audit your container environments. The CrowdStrike incident involved a Jenkins CI/CD server, which is exactly the kind of semi-trusted workload that makes this bug dangerous. If you’re running multi-tenant Kubernetes, self-hosted CI runners, or any workload that executes code from external sources in containers, the seccomp mitigation should go out alongside the host-level fix. Container boundaries are not sufficient on their own.
Priority four: If you’re on RHEL or Amazon Linux, you’re still waiting for kernel errata. RHEL requires the boot parameter (not modprobe), and Amazon Linux’s patches are pending. Track status daily; apply the mitigation now.
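If you want to script the triage, the priorities above reduce to a lookup keyed on the distro. This is a sketch only: the status table is a snapshot of what this article reports as of May 4 and will go stale, the `ID` values are the ones these distros are assumed to use in `/etc/os-release`, and the function names are made up.

```python
# Snapshot of the patch status reported in this article (May 4). Update before use.
STATUS_AS_OF_MAY_4 = {
    "almalinux": "patched",
    "arch": "patched",
    "debian": "patched",
    "sles": "patched",
    "opensuse-leap": "patched",
    "ubuntu": "partial",
    "rhel": "pending",
    "amzn": "pending",
}

ACTIONS = {
    "patched": "install the kernel update and reboot",
    "partial": "apply the kmod update, reboot, and watch for the kernel patch",
    "pending": "apply the mitigation now and track the vendor advisory",
    "unknown": "apply the mitigation and check the vendor advisory",
}


def parse_os_release(text):
    """Parse KEY=value lines from /etc/os-release into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def next_step(os_release_text):
    """Return (distro_id, status, action) for the given os-release contents."""
    distro = parse_os_release(os_release_text).get("ID", "")
    status = STATUS_AS_OF_MAY_4.get(distro, "unknown")
    return distro, status, ACTIONS[status]
```

Distros the article names without a status (Rocky, Oracle Linux, Fedora) fall through to "unknown" rather than being guessed at.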
The window
The CISA KEV deadline is May 15. The exploit is public in three languages now (Python, Go, Rust). This is not a theoretical risk that requires a sophisticated attacker. Any user with local access can escalate to root in seconds, and the page-cache corruption leaves no artifact on disk for forensics to find. CrowdStrike’s Jenkins incident shows the attack chain in practice: remote compromise of an application, then Copy Fail for root, then lateral movement.
The mitigation is fast, safe, and has no operational side effects (just make sure you use the right one for your distro). PatchDay Alert flagged this one in the daily digest on May 1, the same day it hit the KEV catalog. The work here is not hard. It’s making sure it actually gets done across your fleet before someone else tests it for you.
Sources
- Copy Fail disclosure (Theori/Xint)
- Xint blog: Copy Fail across Linux distributions
- Microsoft Security Blog: CVE-2026-31431 analysis
- NVD: CVE-2026-31431
- CISA KEV addition (The Hacker News)
- Tenable: Copy Fail FAQ
- CERT-EU Advisory 2026-005
- Sophos: PoC exploit analysis
- Bugcrowd: Copy Fail overview
- AlmaLinux: CVE-2026-31431 patches
- Ubuntu USN-8226-1 (kmod mitigation)
- SUSE CVE-2026-31431 status
- SOC Prime: Copy Fail detection rules
- GitHub PoC: Theori