Tavis Ormandy, a security researcher at Google, has published details of a hardware vulnerability in AMD CPUs. The vulnerability affects CPUs based on the Zen 2 architecture, which first appeared in 2019. Although Zen 2 is no longer AMD's latest architecture, it was still being used in CPUs at least as late as early 2021. The lineup includes CPUs for desktop computers (such as the popular Ryzen 5 3600), laptops, and, most importantly, servers (AMD EPYC “Rome” CPUs). For a full list of the CPU series susceptible to Zenbleed, see this article by Ars Technica.
The flaw stems from a combination of otherwise harmless AMD CPU features. It turns out that a certain interaction with CPU registers, combined with perfectly normal speculative code execution, can leak secret data. In theory, stealing information through this vulnerability (CVE-2023-20593) is fairly easy, and fast, too: up to 30 kB/s per CPU core. No real-world exploitation has been reported so far. On the other hand, patches (CPU microcode updates) are available for only some of the affected CPUs. AMD has promised to fix the problem completely by the end of 2023.
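To get a rough sense of scale, the quoted rate can be turned into a back-of-the-envelope estimate. This sketch assumes the 30 kB/s upper bound holds on every core at once and that 1 kB = 1000 bytes; the 16-core machine and the 1 MB target are hypothetical examples, and real-world rates may well be lower.

```python
# Back-of-the-envelope estimate of Zenbleed leak throughput.
# Assumptions: the reported upper bound of 30 kB/s applies to every core
# at the same time, and 1 kB = 1000 bytes. Real-world rates may be lower.

PER_CORE_RATE_KB_S = 30  # reported upper bound per CPU core


def total_rate_kb_s(cores: int, per_core: float = PER_CORE_RATE_KB_S) -> float:
    """Theoretical aggregate leak rate across all cores, in kB/s."""
    return cores * per_core


def seconds_to_leak(kilobytes: float, cores: int) -> float:
    """Time needed to leak `kilobytes` of data at the theoretical maximum rate."""
    return kilobytes / total_rate_kb_s(cores)


if __name__ == "__main__":
    cores = 16  # hypothetical example machine
    print(f"{cores} cores: up to {total_rate_kb_s(cores):.0f} kB/s")
    print(f"1 MB (1000 kB): about {seconds_to_leak(1000, cores):.1f} s")
```

Even as a theoretical ceiling, this shows why the rate matters: secrets such as encryption keys and passwords are only a few dozen bytes long.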
Zenbleed exploitation details
As mentioned above, Zenbleed exists because of the speculative execution system. The vulnerability is not easy to explain. In his blog post, Tavis Ormandy presents the bare technical facts, which only an experienced low-level programmer can fully follow. For reference, the post includes a short instruction sequence that can be used to trigger Zenbleed.
A GitHub write-up by Google's Information Security team sheds some light on the nature of the problem. For more than a decade, Intel and AMD CPUs have supported AVX, an extension of the instruction set. Among other things, AVX instructions work with 128-bit and 256-bit vector registers. Put simply, CPU registers serve as temporary storage for data while instructions are executed. In some cases, being able to hold large amounts of data in vector registers can improve performance considerably. The 128-bit (XMM) and 256-bit (YMM) registers are commonly used for the most routine operations, such as reading from and writing to RAM.
Mixing 128-bit and 256-bit registers brings its own set of problems. The XMM registers are actually the lower halves of the corresponding YMM registers, so when code switches between 128-bit and 256-bit instructions, the upper halves of the YMM registers are routinely zeroed. The special instruction for that is vzeroupper. The contents of all registers are kept in the so-called register file, which is used in turn by the different programs running on the computer.
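The relationship between the two register sizes can be illustrated with a simplified model. The class and method names are hypothetical, and the model deliberately ignores register renaming and the physical register file, which is where Zenbleed actually lives; it only shows that an XMM register is the lower half of a YMM register and what vzeroupper clears.

```python
# Simplified model of AVX register aliasing (illustrative only).
# Each YMM register is 256 bits wide; the matching XMM register is its lower
# 128 bits. Register renaming and the physical register file are ignored.

MASK_128 = (1 << 128) - 1
MASK_256 = (1 << 256) - 1


class VectorRegs:
    def __init__(self, count: int = 16):
        # 16 YMM registers are architecturally visible in 64-bit mode with AVX
        self.ymm = [0] * count

    def write_ymm(self, i: int, value: int) -> None:
        self.ymm[i] = value & MASK_256

    def read_xmm(self, i: int) -> int:
        """XMM i is simply the lower 128 bits of YMM i."""
        return self.ymm[i] & MASK_128

    def vzeroupper(self) -> None:
        """Zero bits 128..255 of every YMM register; the XMM halves are kept."""
        self.ymm = [v & MASK_128 for v in self.ymm]


if __name__ == "__main__":
    regs = VectorRegs()
    regs.write_ymm(0, (0xAAAA << 128) | 0x1234)  # data in both halves
    print(hex(regs.read_xmm(0)))  # lower half only
    regs.vzeroupper()
    print(hex(regs.ymm[0]))       # upper half is now zero
```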
What do Zenbleed and use after free have in common?
If conditions are created for the vzeroupper instruction to be executed speculatively, AMD Zen 2 CPUs fail to complete the operation correctly. Based on branch prediction, CPUs can execute instructions without waiting for the results of previous calculations. This speeds things up a great deal, but it also means some instructions get executed “in vain”, even though the program logic does not actually require them. When that happens, the results of those instructions must be rolled back. So if vzeroupper was executed in vain, the zeroing of the upper half of the YMM register has to be undone.
This is where a logic error in Zen 2 CPUs comes into play: after the rollback, the register is left in a so-called “undefined” state. This means it may still contain fragments of data from other programs that share the register file. Normally, no one should have access to that data. Zenbleed creates conditions under which malware can “monitor” the information passing through the vector registers.
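The failure mode can be illustrated with a deliberately simplified toy model. Everything in it (the shared pool of slots, the “zero” flag, the class and function names) is a hypothetical assumption rather than AMD's actual design; the point is only to show how a rollback that restores a reference to storage already handed to another program exposes that program's data.

```python
# Hypothetical toy model of a Zenbleed-style rollback error.
# This is NOT AMD's real microarchitecture; all names are illustrative.


class RegisterFile:
    """Shared pool of physical slots used in turn by all programs."""

    def __init__(self, size: int):
        self.slots = [0] * size
        self.free = list(range(size))

    def allocate(self) -> int:
        # LIFO: the most recently released slot is handed out first
        return self.free.pop()

    def release(self, slot: int) -> None:
        # The old contents are NOT wiped when a slot is released
        self.free.append(slot)


class UpperHalf:
    """Upper half of one YMM register, backed by a slot in the register file."""

    def __init__(self, rf: RegisterFile):
        self.rf = rf
        self.slot = rf.allocate()
        self.zero = False  # when True, the upper half reads as zero

    def read(self) -> int:
        return 0 if self.zero else self.rf.slots[self.slot]

    def speculative_vzeroupper(self) -> None:
        # Mark the half as zero and return its backing slot to the pool
        self.zero = True
        self.rf.release(self.slot)

    def faulty_rollback(self) -> None:
        # Only the flag is restored; the slot may already belong to someone else
        self.zero = False


def demo() -> int:
    rf = RegisterFile(size=4)
    attacker = UpperHalf(rf)
    attacker.speculative_vzeroupper()  # executed "in vain" after a misprediction
    victim_slot = rf.allocate()        # another program reuses the freed slot...
    rf.slots[victim_slot] = 0x5EC12E7  # ...and stores a secret value in it
    attacker.faulty_rollback()         # misprediction detected, "zeroing" undone
    return attacker.read()             # the attacker now reads the victim's value


if __name__ == "__main__":
    print(hex(demo()))
```

In this model, a correct rollback would have to keep the slot reserved until the speculation is resolved, or restore its original contents; skipping that step is what turns a harmless optimization into a leak.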
In a sense, this CPU behavior closely resembles a typical software error known as use after free. This is when a program stores its data in a certain area of RAM and then frees that area, making it available to other code. As a result, some other program may be able to read that data, which can contain secret information. In the case of Zenbleed, however, the error is in hardware, not software.
In theory, Zenbleed lets an attacker read secrets directly, and at a fairly high speed. That alone doesn't mean much: what data can be read, and whether it can be put to harmful use, depends on the specific situation. Only applications that use XMM and YMM registers at the same time are affected by this vulnerability. First and foremost, these are Linux system libraries and the Linux kernel itself, as well as cryptographic libraries and systems such as OpenSSL. Moreover, obtaining information requires the victim application to be actively processing data. For an attacker to get something really useful, some encryption process has to be running on the affected computer, or the browser has to be in active use for web surfing; otherwise, exploiting the vulnerability will yield nothing.
So far, we have only been shown demo code, a proof of concept. Demonstrating a genuinely harmful scenario was beyond the scope of the study. According to the Cloudflare team, the issue is fairly easy to exploit; it could even be done through a browser. One could imagine an attacker sending a victim a link to a specially crafted web page that steals passwords for sensitive accounts. Worse still, such a theft would leave no traces. Can it be pulled off in real life? We don't know yet.
But we do know that Zenbleed is most dangerous in corporate environments. Imagine a customer renting a virtual server being able to read data from other virtual servers, and even from the hypervisor, as long as they share the same CPU cores. This is why the very first patch to be released addressed AMD EPYC server CPUs.
Future of hardware security
In the closing part of his article, Tavis Ormandy explains that he discovered the problem through fuzzing. In software testing, fuzzing usually means feeding a program random data in search of inputs that cause abnormal behavior. Hardware fuzzing (let's call it that) is a more sophisticated challenge: it involves generating programs from random sequences of instructions in search of an abnormal CPU response. An abnormal termination of such a program is not necessarily a sign of a problem in itself. Ormandy proposes several methods for detecting anomalies, such as running the same code on different CPUs: if identical programs behave differently, that prompts an investigation to rule out a CPU logic error.
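The differential approach described above can be sketched in a few lines. This is a hypothetical illustration, not Ormandy's fuzzer: a made-up four-register instruction set replaces x86, and a simulated “buggy CPU” that mishandles the xor zeroing idiom replaces the second processor. A real hardware fuzzer would generate machine code and run it on physical CPUs.

```python
import random

# Hypothetical differential fuzzing sketch. A made-up 4-register ISA stands in
# for x86, and a simulated "buggy CPU" with an injected defect stands in for
# real hardware. Programs whose final states differ are flagged for review.

OPS = ("add", "xor", "mov", "shl")
NREGS = 4
MASK = 0xFFFF_FFFF  # 32-bit registers

Program = list[tuple[str, int, int]]


def random_program(rng: random.Random, length: int) -> Program:
    """Random sequence of (opcode, destination, source) instructions."""
    return [(rng.choice(OPS), rng.randrange(NREGS), rng.randrange(NREGS))
            for _ in range(length)]


def run(program: Program, buggy: bool = False) -> list[int]:
    regs = [1, 2, 3, 4]
    for op, dst, src in program:
        if op == "add":
            regs[dst] = (regs[dst] + regs[src]) & MASK
        elif op == "xor":
            # Injected defect: the buggy CPU ignores the "xor r, r" zeroing idiom
            if buggy and dst == src:
                continue
            regs[dst] ^= regs[src]
        elif op == "mov":
            regs[dst] = regs[src]
        elif op == "shl":
            regs[dst] = (regs[dst] << (regs[src] % 32)) & MASK
    return regs


def fuzz(iterations: int, seed: int = 0) -> list[Program]:
    """Run random programs on both implementations and collect divergences."""
    rng = random.Random(seed)
    divergent = []
    for _ in range(iterations):
        prog = random_program(rng, length=rng.randint(1, 20))
        if run(prog) != run(prog, buggy=True):
            divergent.append(prog)
    return divergent


if __name__ == "__main__":
    found = fuzz(1000)
    print(f"{len(found)} of 1000 programs behaved differently")
```

Fixing the seed makes each run reproducible, so any divergent program can be replayed and shrunk to the smallest sequence that still triggers the difference.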
The history of hardware vulnerabilities suggests that closing a single problem is usually not enough. Once a patch is applied, a way to circumvent the new defenses may be found, because the problem lies in the fundamental principles of how CPUs operate. That is why it matters that Tavis Ormandy has not only found a vulnerability in AMD Zen 2 CPUs but has also proposed some interesting strategies for finding other potential errors.
Potentially dangerous as it may be, Zenbleed is unlikely to be used to attack individual users. The server infrastructure of organizations, however, is a different story. In this particular case, you might say it was a narrow escape: the problem was found and fixed with a microcode update, at the cost of only a minor performance drop. If your infrastructure uses AMD Zen 2 CPUs, you definitely need this patch. But chances are this research will be followed by others. The whole approach to hardware security may be revised, with new comprehensive security tests (using fuzzing and other strategies) coming into the picture. Let's hope hardware vendors will be able to put them to good use. Meanwhile, organizations need to factor the risk of similar vulnerabilities emerging into their security models.