Local Types and the Heap

Introduction

In this tutorial, you will learn how to handle struct data types and allocate heap memory in SmallWorld. To demonstrate, we will examine the AMD64 assembly below. This code iterates over a doubly-linked list, stops on the first node marked as empty, and overwrites its data with a given input value. We will use SmallWorld to construct a doubly-linked list on the heap as an input to this code. Then, we will observe the output state to confirm that it worked as expected.

BITS 64;
;typedef struct node {
;    int data;
;    node* next;
;    node* prev;
;    int empty;
;};

;void function(node* n, int a) {
;    node* curr = n;
;    while (!(curr -> empty)) {
;        if (curr -> data % 2 == 0){
;            curr = curr -> next;
;        } else {
;            curr = curr -> prev;
;        }
;    }
    
;    curr -> data = a;
;    return;
;}


function:
        jmp     .L7
.L4:
        mov     rax, QWORD [rdi+8]
        test    BYTE [rdi], 1
        cmovne  rax, QWORD [rdi+16]
        mov     rdi, rax
.L7:
        mov     eax, DWORD [rdi+24]
        test    eax, eax
        je      .L4
        mov     DWORD [rdi], esi

Local Types

The example code defines a struct type node which may be chained together to form a doubly-linked list. Each node contains some integer data, a pointer to the next element in the list, a pointer to the prev element in the list, and an integer flag to mark the node as empty.

When analyzing this program, we may want to read or manipulate the contents of nodes using their type interface. However, when the code is compiled, this type information is replaced with hard-coded offsets from the base address of a given node. In order to interact with structured data, we need to define its shape in our harness.

In SmallWorld, we accomplish this using the ctypes library. We begin by creating a new class, StructNode, which extends ctypes’ LittleEndianStructure. We set the _pack_ field to 8 to indicate that struct fields should be aligned to 8 bytes. Note that the endianness and struct packing size are platform-dependent.

class StructNode(ctypes.LittleEndianStructure):
    _pack_ = 8

Next, we define the fields of the node structure. Each field is defined as a tuple consisting of the field’s name and type. We include a "padding_0x4" field after the 4-byte "data" field to ensure that the "next" field is properly 8-byte aligned. For pointer types ("next" and "prev"), we use the create_typed_pointer helper function in SmallWorld.

StructNode._fields_ = [
    ("data", ctypes.c_int32),
    ("padding_0x4", ctypes.c_int32),
    ("next", smallworld.extern.ctypes.create_typed_pointer(StructNode)),
    ("prev", smallworld.extern.ctypes.create_typed_pointer(StructNode)),
    ("empty", ctypes.c_int32),
]

Now that we have our structure’s memory layout defined, we can create StructNode objects and set their fields like any other Python object. In the next section, we will learn how to allocate these objects on the heap in SmallWorld.

Heap Allocators

When harnessing a binary, it may be expected that some memory has already been allocated on the heap. In the case of our AMD64 example above, each node of the doubly-linked list must already be allocated on the heap with the prev and next fields pointing to its neighbors. Additionally, the rdi register must point to the head of the list.

In SmallWorld, a Heap is a Memory object with an interface for dynamic allocation. See Heaps to learn more about heaps in SmallWorld.

For this example, we will use a BumpAllocator heap. The snippet below constructs a new BumpAllocator with an address of 0x4000 and a size of 0x4000.

heap = smallworld.state.memory.heap.BumpAllocator(0x4000, 0x4000)

We can use our heap to allocate StructNode objects in SmallWorld. First, we convert our StructNode into a Value by using Value.from_ctypes(). Then, we use heap.allocate() to place our Value on the heap. This method returns the memory address of the allocated node, which can then be used to link the nodes together through their next and prev pointers.

node_a = StructNode()
node_a.data = 2
node_a.empty = 0
node_a.prev = 0
node_a_addr = heap.allocate(smallworld.state.Value.from_ctypes(node_a, "node a"))

node_b = StructNode()
node_b.data = 1
node_b.empty = 1
node_b.next = 0
node_b_addr = heap.allocate(smallworld.state.Value.from_ctypes(node_b, "node b"))

node_a.next = node_b_addr
node_b.prev = node_a_addr

Completing Our Harness

We have now defined the memory layout of our struct type, created a few linked node objects, and allocated those objects on our heap. The final step is to set the inputs of the function and observe the outputs.

First, we set the rdi register to the address of node A, the head of the linked list. Then, we set rsi to an arbitrary value, in this case 42, and expect it to appear in the data of the final node in our output state.

cpu.rdi.set_content(node_a_addr)
cpu.rsi.set_content(42)

We add the lines below to step through our harness and observe the final CPU state. We expect rdi to iterate through the list and point to node B at the end. We also expect that the data field of node B will contain 42. To verify this, we extract the heap memory from our emulator and use ctypes to create a new StructNode from the bytes at node B’s address.

*_, final_machine = machine.step(emulator)

final_cpu = final_machine.get_cpu()
print(f"curr = {hex(final_cpu.rdi.get())}")

heap.extract(emulator)
final_node_b_bytes = heap.read_bytes(node_b_addr, ctypes.sizeof(StructNode))
final_node_b = StructNode.from_buffer_copy(final_node_b_bytes)
print(f"node_b->data = {final_node_b.data}")

Now, running our harness, we can see that rdi has been advanced to point to node B and its data field has been set to 42. Our harness successfully operated on our structured data.

$ python3 struct.amd64.py
emulation complete; encountered exit point or went out of bounds
curr = 0x4020
node_b->data = 42