As all careful readers of this blog certainly know, the Bromium vSentry hypervisor (uXen) is derived from Xen. This means that parts of the codebase are shared between the two projects, and vulnerabilities found in Xen are sometimes relevant to uXen. Two recent Xen Security Advisories, XSA-105 and XSA-108, are not particularly severe (at least for vSentry), but they feature interesting details related to generic hypervisor hardening that are worth discussing. One may wish to read the original Xen advisories before proceeding.
XSA-105 is titled “Missing privilege level checks in x86 HLT, LGDT, LIDT, and LMSW emulation”. The impact (for Xen) is that unprivileged usermode code in a VM can elevate its privileges to those of the VM kernel.
In some scenarios, when the CPU cannot properly execute an instruction in a VM (e.g. because the instruction touches a memory-mapped register and has a device-specific side effect), Xen emulates the instruction. The problem is that the code responsible for emulating the above instructions did not check whether the CPU was in kernel mode. LGDT and LIDT in particular are normally available in kernel mode only, as they change crucial CPU registers. Because of the vulnerability, a user process in a VM (even one with very low privileges, e.g. the Untrusted integrity level in the case of Windows) could effectively execute LIDT or LGDT and take full control of the VM.
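The missing check can be illustrated with a toy model. This is only a sketch, not Xen's emulator: the `VCpu` class, the `emulate` function and the `check_cpl` switch are hypothetical names invented for illustration. The only facts taken from the advisory are the four instructions and the absence of a privilege level check.

```python
from dataclasses import dataclass

# Instructions named in the advisory title; all require CPL 0 on real hardware.
PRIVILEGED = {"HLT", "LGDT", "LIDT", "LMSW"}


class GeneralProtectionFault(Exception):
    """Stands in for the #GP fault a correct emulator would inject into the guest."""


@dataclass
class VCpu:
    cpl: int        # current privilege level: 0 = kernel, 3 = usermode
    idtr: int = 0   # base address of the interrupt descriptor table
    gdtr: int = 0   # base address of the global descriptor table


def emulate(vcpu: VCpu, mnemonic: str, operand: int = 0, check_cpl: bool = True) -> VCpu:
    """Emulate one instruction; check_cpl=False models the vulnerable behaviour."""
    if check_cpl and mnemonic in PRIVILEGED and vcpu.cpl != 0:
        raise GeneralProtectionFault(mnemonic)
    if mnemonic == "LIDT":
        vcpu.idtr = operand
    elif mnemonic == "LGDT":
        vcpu.gdtr = operand
    return vcpu
```

With `check_cpl=False`, a vCPU at `cpl=3` can point `idtr` at a table of its own choosing, which is the essence of the bug; with the check enabled, the same request raises `GeneralProtectionFault` and the register is left untouched.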
Exploitation is straightforward on Windows 7: one can just create a fake IDT in usermode, and the kernel will transfer control to the attacker’s code (residing in usermode pages) upon the first interrupt. On Windows 8 running on a CPU featuring SMEP, the attacker needs to do a bit more work and create a ROP chain in the kernel. Fortunately for the attacker, at the entry to the [software] interrupt handler all general-purpose registers are controllable, so it is easy to achieve a stack pivot.
It is remarkable that no sane OS actually needs these instructions to be emulated in normal circumstances. Still, the complete emulator imported into Xen is available throughout the VM’s lifetime, which results in a vulnerability. In the early days of uXen development, it was recognized that the emulator constitutes an attack vector, and a conscious effort was made to reduce the number of supported instructions. Therefore, uXen is not vulnerable: when an exploit is run in a vSentry microVM, the emulation is denied (with a message
(uXEN) c:/br/bld/uxen/xen/uxen/arch/x86/x86_emulate/x86_emulate.c:1383:d187 instruction emulation restricted for twobyte-instruction 0x1
in the logs) and the microVM is killed.
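The idea of restricting the emulator to a small set of instructions can be sketched as an allowlist lookup on the opcode. The allowlist contents below are hypothetical (a few MOV and MOVZX opcodes that a guest might plausibly use for memory-mapped I/O); the actual uXen list is not reproduced here. Instruction prefixes are ignored for brevity, and the "onebyte" message variant is an assumption modelled on the log line above.

```python
# Hypothetical allowlist, NOT the real uXen list.
ALLOWED_ONEBYTE = {0x88, 0x89, 0x8A, 0x8B}   # MOV between register and memory
ALLOWED_TWOBYTE = {0xB6, 0xB7}               # MOVZX


class EmulationRestricted(Exception):
    """Raised when the guest asks for an instruction outside the allowlist."""


def check_opcode(code: bytes) -> str:
    """Return a short description of the opcode, or raise if it is not allowlisted."""
    if code[0] == 0x0F:   # 0x0F is the escape byte for two-byte opcodes
        opcode, allowed, kind = code[1], ALLOWED_TWOBYTE, "twobyte"
    else:
        opcode, allowed, kind = code[0], ALLOWED_ONEBYTE, "onebyte"
    if opcode not in allowed:
        raise EmulationRestricted(
            f"instruction emulation restricted for {kind}-instruction {opcode:#x}"
        )
    return f"{kind} {opcode:#x}"
```

LGDT, LIDT and LMSW all share the two-byte opcode 0F 01 (they differ only in the ModRM byte), so in this model `check_opcode(bytes([0x0F, 0x01, 0x18]))`, an encoding of LIDT, is rejected with the same "twobyte-instruction 0x1" text seen in the log. A default-deny list also covers instructions nobody thought to audit.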
To sum up, Xen users should worry about this vulnerability if they run untrusted code in their VMs (think sandboxed code) and care about privilege elevation within a VM. uXen is not affected.
XSA-108 is titled “Improper MSR range used for x2APIC emulation”. The impact is that a malicious VM kernel can crash Xen or read up to 3K of its memory, from an address that is not under the attacker’s control.
The root cause is that the code responsible for emulating access to local APIC registers in x2APIC mode supported 1024 registers, but buffer space was allocated for only 256 registers. If a write access (by the wrmsr instruction) is requested by the VM, no harm is done, as only a limited number of known registers are actually emulated. On the other hand, the code implementing read access emulation just reads from the vlapic->regs buffer (which is one page long), at an offset controlled by the attacker (which must be less than 16K).
Consequently, memory located up to 12K after the vlapic->regs buffer is read and returned to the VM. More precisely, 4-byte integers located at 16-byte-aligned addresses can be read. If the virtual addresses adjacent to the vlapic->regs buffer are unmapped, this results in a Xen crash; if they are mapped, their contents leak to the VM.
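The numbers above (16K, 12K and 3K) follow from simple arithmetic, which the following sketch spells out. The constant names are ours; the values come from the description above, plus the assumption that a page is 4096 bytes.

```python
PAGE_SIZE = 4096        # vlapic->regs is one page long (assuming 4 KiB pages)
REG_STRIDE = 16         # registers sit at 16-byte-aligned offsets
REG_WIDTH = 4           # each read returns a 4-byte integer
EMULATED_REGS = 1024    # registers accepted by the buggy read path

# Number of registers that actually fit in the one-page buffer.
backed_regs = PAGE_SIZE // REG_STRIDE

# Largest byte offset the attacker can reach (index of the last register).
max_offset = (EMULATED_REGS - 1) * REG_STRIDE

# How far past the end of the buffer the reads can extend.
overread_span = EMULATED_REGS * REG_STRIDE - PAGE_SIZE

# Only 4 bytes are returned per out-of-bounds register.
leaked_bytes = (EMULATED_REGS - backed_regs) * REG_WIDTH

print(backed_regs, max_offset, overread_span, leaked_bytes)
```

So 256 registers are backed by real storage, the highest reachable offset stays just under 16K, the over-read spans 12K (three pages), and because only 4 of every 16 bytes are returned, the total leak is 3K.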
The vulnerable code is present in uXen. uXen uses a specialized memory manager (dubbed “memcache”) that preallocates a large contiguous virtual memory range for the purpose of mapping VM-related pages. As a result, a uXen crash is unlikely; it can happen only when the vlapic->regs buffer is mapped near the end of the memcache.
Similarly, the information leak is somewhat restricted: memcache stores only pages allocated for uXen purposes, so (if we neglect the unlikely “end of memcache” scenario) there is no possibility that unrelated host kernel memory can leak to the microVM. In the common case, memcache assigns consecutive virtual addresses to subsequent page allocations. During microVM setup, the order of allocation is such that the three pages allocated immediately after the vlapic->regs allocation store the VMCS, VMCS shadow and MSR bitmap pages. Therefore, in the common case, all the attacker can achieve is leaking the lower 32 bits of pointers from the VMCS, which might help to deduce the ASLR layout of the host kernel. This is not a catastrophic problem in itself, but it can aid in the exploitation of another, unrelated vulnerability. In a corner case where microVM creation races with heavy map/unmap operations on another microVM’s memory, that memory would leak to the attacker as well.
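To see why only the lower halves of pointers leak, consider a toy simulation of the over-read. Everything here is illustrative: the page following the register buffer is filled with made-up 64-bit little-endian pointer values (not a real VMCS layout), and `read_reg` is a hypothetical stand-in for the unchecked read path.

```python
import struct

PAGE_SIZE = 4096
REG_STRIDE = 16


def read_reg(memory: bytes, regs_base: int, index: int) -> int:
    """Model of the unchecked read: a 4-byte little-endian load at index * 16."""
    offset = regs_base + index * REG_STRIDE
    return struct.unpack_from("<I", memory, offset)[0]


# Made-up kernel-looking pointers, packed as 64-bit little-endian values.
pointers = [0xFFFFF80002A00000 + 8 * i for i in range(PAGE_SIZE // 8)]

# Page 0 is the (zeroed) register buffer; page 1 holds the pointers.
memory = bytes(PAGE_SIZE) + struct.pack(f"<{len(pointers)}Q", *pointers)

# Index 256 is the first register that falls outside the one-page buffer.
first_oob = PAGE_SIZE // REG_STRIDE
leaked = [read_reg(memory, 0, first_oob + i) for i in range(PAGE_SIZE // REG_STRIDE)]
```

Each element of `leaked` is the low 32 bits of every second pointer in the neighbouring page; the upper halves sit at offsets the 16-byte stride never touches. That is enough to learn something about address layout, but not to dump the page.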
To sum up, this vulnerability has the potential to crash the whole hypervisor or leak a limited amount of data from it. The impact is not very severe, although if one runs multiple VMs of different origin on the same host and is very serious about the possibility of leaking data from one VM to another (even a small amount, from a location not controlled by the attacker), prompt patching is justified. There was quite some concern in the media about this vulnerability, but it was clearly overhyped.
Interestingly, vSentry microVMs use xAPIC mode, not x2APIC mode, and the vulnerability can be exploited only in x2APIC mode. This means that an attacker needs to enable x2APIC mode first. However, this leaves the microVM OS unable to use the APIC, and it hangs in IPI processing. In order to exploit this vulnerability repeatedly for more than a few seconds, the attacker would need to patch the VM OS to use the APIC in x2APIC mode, which is far from trivial, yet imaginable.
It also means we missed a generic hardening opportunity: we should support only a single APIC mode. There is still room for improvement, but considering that since the release of the first vSentry version no Xen vulnerability allowing escape from a VM has affected us, it looks like we have done a fairly decent job.
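Supporting a single APIC mode could look roughly like the following. This is a sketch of the idea only, not uXen code: `validate_apic_base_write` and `GuestFault` are hypothetical names. The bit positions are the architectural ones for the IA32_APIC_BASE MSR (bit 10 enables x2APIC mode, bit 11 is the global APIC enable).

```python
MSR_IA32_APIC_BASE = 0x1B
APIC_BASE_EXTD = 1 << 10     # x2APIC mode enable
APIC_BASE_ENABLE = 1 << 11   # APIC global enable


class GuestFault(Exception):
    """Stands in for a fault injected into the guest on a rejected MSR write."""


def validate_apic_base_write(value: int) -> int:
    """Accept an IA32_APIC_BASE write only if it keeps the APIC in xAPIC mode."""
    if value & APIC_BASE_EXTD:
        raise GuestFault("x2APIC mode is not supported for this VM")
    return value
```

If a guest can never switch the APIC into x2APIC mode, the x2APIC MSR read path is unreachable for it, whatever bugs that path may contain.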