Meltdown can let us read kernel memory from user space which breaks OS’ isolation.
That means we can use meltdown to read some privilege data such I/O or network buffer.
Before explain how it works let us to see how OS prevent such things happen.
Memory Isolation
In user mode data can be accessed only through virtual memory.
Virtual memory will map virtual address to real(physical) address.
Before hardware do the mapping will check permission first, if some memory belongs to kernel that memory can’t be read in user mode.
If program dose such thing will cause page fault exception.
The permission is recored by PTE(page table entry).
Below picture is RISC-V’s PTE.

If the U bit is set to 1 means the memory belongs to user.
Only when program in kernel mode can create or modify the PTE.
This is how OS provide memory isolation.
Out-of-order Execution
Out-of-order execution can let us to read kernel memory from user model but the result is invisible in architectural level.
When CPU run some slow instructions such as load from memory, CPU will run the following instructions first for performance reason.
Before the slow instruction is finished, the following instructions’ changes are invisible in architectural level. This is out-of-order execution.
Then after the slow instruction has finished, then CPU will make the state changes visible or rollback if some things go wrong. They call this retirement.
In out-of-order execution hardware will not check exception such as page fault until when CPU do retirement.
let’s see one example:
1 r0 = <something>
2 r1 = valid // r1 is a register; valid is in RAM
3 if(r1 == 1){
4 r2 = *r0
5 r3 = r2 + 1
6 } else {
7 r3 = 0
8 }
For line 1 needs to load value from memory to register and that may take hundreds of cycles.
CPU will not wait idle for the load and it will run lien 4 ~ 5 in parallel.
Suppose r0 is some kernel address, we know when CPU exec instructions out of order, it will not check exceptions.
So r0’s value has been loaded in some invisible place.
After r1 has been loaded, CPU will do retirement. Hardware will see current mode is user mode and the program is reading a kernel address.
So we should rollback changes and raise page fault exception.
Hardware designer thought that kind of changes are invisible to program.
Flush-Reload
In the Out-of-order execution section, we demonstrated how to read kernel memory in user mode but the result is invisible to us.
In this section will see how to make the result visible.
Cache
CPU has cache for storing recent accessed data.
The speed for CPU to read data from cache is much more faster than read data from RAM.
For example, read from cache needs a dozen cycles, read from RAM needs hundreds cycles.
If we read some address fast means the address has been used recently and slow means it has not been used recently.
Flush-Reload
Flush-reload will allow you to check a function used the memory at an address x or not.
The steps are:
- ensure x is not cached
- call
f() - record the time takes, say t1
- load a byte from address x
- record the time again, say t2
- if the difference between t1 and t2 is small we can infer x has been used in
f()
When CPU do retirement, it will cancel register state but will not flush the cache.
So we can use flush+reload to get out-of-order execution’s internal state.
Meltdown
Let’s see how to read 1 bit at kernel address r1
1 char buf[8192]
2
3 // the Flush of Flush+Reload
4 clflush buf[0]
5 clflush buf[4096]
6
7 <some expensive instruction like divide>
8
9 r1 = <a kernel virtual address>
11 r2 = *r1 // out-of-order
12 r2 = r2 & 1 // out-of-order
13 r2 = r2 * 4096 // out-of-order
14 r3 = buf[r2] // out-of-order
15
16 <handle the page fault from "r2 = *r1">
17
18 // the Reload of Flush+Reload
19 a = rdtsc // get time
20 r0 = buf[0]
21 b = rdtsc
22 r1 = buf[4096]
23 c = rdtsc
24 if b-a < c-b:
25 low bit was probably a 0
As line 7 is an expensive instruction, CPU will execute lien 9 ~ 14 in parallel.
r2 = *r1 will read r1’s content.
Then get the lowest bit through r2 = r2 & 1.
r2 = r2 * 4096
r3 = buf[r2]
Will cause a read at buf[0] or buf[4096].
For line 19 ~ 25 will do reload part of the flush-reload attack.
The code will read r[0] and get time by b - a.
Then read r[4096] and get time takes by c - b.
if b - a < c - b means r[0] has been read recently and then we can imply r2 == 0, so the lowest bit of address r1 is 0.
That’s how we read one bit of kernel address in user mode.
Additional
Meltdown only affects some of Intel x86 CPU and ARM CPU and has been fixed.
Meltdown can get kernel memory relies on user address and kernel address are in same page table.
OS can fix it by “KAISER”/”KPTI”, it doesn’t map the kernel in user page table.
And L1 cache cached virtual address and L3 cache cached physical address.
When CPU load memory into cache, it loads a trunk of memory. That’s why we use ` buf[0] and buf[4096]` for checking.
This article mainly based on 6.S081 2020 Lecture 22 and the meltdown paper.
For how hardware handle user mode and kernel mode, you can read “The RISC-V Reader” chapter 10.
Retired explains in stack overflow https://stackoverflow.com/questions/22368835/what-does-intel-mean-by-retired
Others explains about Meltdown https://www.bilibili.com/video/BV1nb4y1D7ii?from=search&seid=12911120057259798460
Please let me know if I make any mistakes :pray: