Meltdown Notes

Mike Yang

Backend Developer, passionate about distributed systems, observability and AI. Beyond programming, interested in neuroscience and sports, enjoys parkour and bouldering.

Meltdown Notes

Jul 24, 2021

operating-systems
hardware

Meltdown can let us read kernel memory from user space which breaks OS’ isolation.

That means we can use meltdown to read some privilege data such I/O or network buffer.

Before explain how it works let us to see how OS prevent such things happen.

Memory Isolation

In user mode data can be accessed only through virtual memory.

Virtual memory will map virtual address to real(physical) address.

Before hardware do the mapping will check permission first, if some memory belongs to kernel that memory can’t be read in user mode.

If program dose such thing will cause page fault exception.

The permission is recored by PTE(page table entry).

Below picture is RISC-V’s PTE.

Screen Shot 2021-07-24 at 11 25 01 PM

If the U bit is set to 1 means the memory belongs to user.

Only when program in kernel mode can create or modify the PTE.

This is how OS provide memory isolation.

Out-of-order Execution

Out-of-order execution can let us to read kernel memory from user model but the result is invisible in architectural level.

When CPU run some slow instructions such as load from memory, CPU will run the following instructions first for performance reason.

Before the slow instruction is finished, the following instructions’ changes are invisible in architectural level. This is out-of-order execution.

Then after the slow instruction has finished, then CPU will make the state changes visible or rollback if some things go wrong. They call this retirement.

In out-of-order execution hardware will not check exception such as page fault until when CPU do retirement.

let’s see one example:

r0 = <something>
r1 = valid    // r1 is a register; valid is in RAM
if(r1 == 1){
 r2 = *r0
 r3 = r2 + 1
} else {
r3 = 0
}

For line 1 needs to load value from memory to register and that may take hundreds of cycles.

CPU will not wait idle for the load and it will run lien 4 ~ 5 in parallel.

Suppose r0 is some kernel address, we know when CPU exec instructions out of order, it will not check exceptions.

So r0’s value has been loaded in some invisible place.

After r1 has been loaded, CPU will do retirement. Hardware will see current mode is user mode and the program is reading a kernel address.

So we should rollback changes and raise page fault exception.

Hardware designer thought that kind of changes are invisible to program.

Flush-Reload

In the Out-of-order execution section, we demonstrated how to read kernel memory in user mode but the result is invisible to us.

In this section will see how to make the result visible.

Cache

CPU has cache for storing recent accessed data.

The speed for CPU to read data from cache is much more faster than read data from RAM.

For example, read from cache needs a dozen cycles, read from RAM needs hundreds cycles.

If we read some address fast means the address has been used recently and slow means it has not been used recently.

Flush-Reload

Flush-reload will allow you to check a function used the memory at an address x or not.

The steps are:

ensure x is not cached
call f()
record the time takes, say t1
load a byte from address x
record the time again, say t2
if the difference between t1 and t2 is small we can infer x has been used in f()

When CPU do retirement, it will cancel register state but will not flush the cache.

So we can use flush+reload to get out-of-order execution’s internal state.

Meltdown

Let’s see how to read 1 bit at kernel address r1

char buf[8192]
2
// the Flush of Flush+Reload
clflush buf[0]
clflush buf[4096]

<some expensive instruction like divide>
8
r1 = <a kernel virtual address>
r2 = *r1         // out-of-order
r2 = r2 & 1      // out-of-order
r2 = r2 * 4096   // out-of-order
r3 = buf[r2]     // out-of-order

<handle the page fault from "r2 = *r1">

// the Reload of Flush+Reload
a = rdtsc // get time
r0 = buf[0]
b = rdtsc
r1 = buf[4096]
c = rdtsc
if b-a < c-b:
 low bit was probably a 0

As line 7 is an expensive instruction, CPU will execute lien 9 ~ 14 in parallel.

r2 = *r1 will read r1’s content.

Then get the lowest bit through r2 = r2 & 1.

r2 = r2 * 4096
r3 = buf[r2]

Will cause a read at buf[0] or buf[4096].

For line 19 ~ 25 will do reload part of the flush-reload attack.

The code will read r[0] and get time by b - a.

Then read r[4096] and get time takes by c - b.

if b - a < c - b means r[0] has been read recently and then we can imply r2 == 0, so the lowest bit of address r1 is 0.

That’s how we read one bit of kernel address in user mode.

Additional

Meltdown only affects some of Intel x86 CPU and ARM CPU and has been fixed.

Meltdown can get kernel memory relies on user address and kernel address are in same page table.

OS can fix it by “KAISER”/”KPTI”, it doesn’t map the kernel in user page table.

And L1 cache cached virtual address and L3 cache cached physical address.

When CPU load memory into cache, it loads a trunk of memory. That’s why we use ` buf[0] and buf[4096]` for checking.

This article mainly based on 6.S081 2020 Lecture 22 and the meltdown paper.

For how hardware handle user mode and kernel mode, you can read “The RISC-V Reader” chapter 10.

Retired explains in stack overflow https://stackoverflow.com/questions/22368835/what-does-intel-mean-by-retired

Others explains about Meltdown https://www.bilibili.com/video/BV1nb4y1D7ii?from=search&seid=12911120057259798460

Please let me know if I make any mistakes :pray: