Loading a core dump

One of the big limitations of micro-execution is that it is blind to any kind of state built up by the program before your start point. Normally, the harness has to build up that state explicitly.

However, there is another option, and that’s to load a memory dump from a live program.

Linux and similar platforms can automatically produce “core dumps” on certain kinds of program failures, and a debugger can be commanded to produce a similar file on demand.

This file is a modified ELF that contains the memory image of the faulting process, as well as the values of registers at the time of the fault. Perfect for setting up a harness!

Note

Windows can produce similar memory dumps. SmallWorld does not yet support loading them.

Consider the program elf_core.elf.c:

#include <stdio.h>
#include <stdlib.h>
#include <sys/procfs.h>

char input_word[64];
int x = 1;

int main() {

    if(fscanf(stdin, "%63s", input_word) != 1) {
        return 1;
    }
    for(int i = 0; i < 2; i++) {
        if (i == 1) {
            __builtin_trap();
        }
    }
    puts(input_word);
}

The __builtin_trap() statement puts some kind of illegal or faulting instruction between the fscanf and the puts. We want to use that as our starting point for emulation, and just explore the results of the puts, without having to deal with the machinations of fscanf.

Note

In practice, a program won’t put a nice __builtin_trap() right where you need it. You will need to modify the binary to inject a faulting instruction yourself.

You can turn this program into a core dump as follows:

# Build the test program
cd smallworld/tests
make elf_core/elf_core.amd64.elf

# Enable core dumps
ulimit -c unlimited

# Create the core dump
cd elf_core
make elf_core.amd64.elf.core

In order to build a harness around this file, we need to do the following:

  • Follow the metadata in the core file to unpack the memory image inside.

  • Load the registers from the core dump.

  • Avoid or replace the trapping instruction so we can keep emulating.

Warning

Generating these core dumps requires a good bit of setup.

The Makefile assumes the default GCC targets amd64 linux. It also needs user-mode QEMU and multiarch gdb to actuate the program.

If you are running SmallWorld in a container, ulimit -c will probably fail; Docker and similar environments do not let containers modify ulimits internally. Most platforms have some mechanism for setting ulimits when you launch a container. See your specific container solution’s documentation for details.

If you’re running on an IT-managed system, core dumps may be redirected to a reporting tool. You will need to alter /proc/sys/kernel/core_pattern to contain the following:

core

By default, Linux does not dump file-backed pages, such as the .text sections of a binary. In theory, setting /proc/${pid}/coredump_filter to “0x3f” should fix this problem, but it didn’t work in testing.

The SmallWorld team advocates consulting your IT team before elevating container privileges or altering auditing facilities. Please harness responsibly.
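The environment checks described in this warning can be scripted before attempting a dump. The following is a minimal, Linux-only sketch using only the Python standard library. The helper names (core_limit_allows_dumps, pattern_is_piped, describe_filter) are invented for illustration and are not part of SmallWorld; the bit meanings are taken from the core(5) manual page.

```python
import resource

# Bit meanings for /proc/<pid>/coredump_filter, per core(5).
FILTER_BITS = {
    0: "anonymous private",
    1: "anonymous shared",
    2: "file-backed private",
    3: "file-backed shared",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
    7: "private DAX pages",
    8: "shared DAX pages",
}


def core_limit_allows_dumps():
    """True if the soft RLIMIT_CORE limit is nonzero (what `ulimit -c` reports)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft != 0


def pattern_is_piped(core_pattern):
    """A core_pattern starting with '|' sends dumps to a program, not a file."""
    return core_pattern.strip().startswith("|")


def describe_filter(mask):
    """List the mapping types selected by a coredump_filter mask."""
    return [name for bit, name in FILTER_BITS.items() if mask & (1 << bit)]


if __name__ == "__main__":
    print("core limit allows dumps:", core_limit_allows_dumps())
    try:
        with open("/proc/sys/kernel/core_pattern") as f:
            print("core_pattern is piped:", pattern_is_piped(f.read()))
    except OSError as err:
        print("could not read core_pattern:", err)
    print("0x3f selects:", describe_filter(0x3F))
```

Compared with the kernel default of 0x33, the mask 0x3f additionally selects the two file-backed mapping types, which is why it is the value suggested for capturing .text.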

Loading a core dump

Core files are just slightly extended ELFs. We can take a look at our core dump with readelf -l:

$ readelf -l elf_core.amd64.elf.core

Elf file type is CORE (Core file)
Entry point 0x0
There are 23 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  NOTE           0x0000000000000548 0x0000000000000000 0x0000000000000000
                 0x0000000000000344 0x0000000000000000         0x0
  LOAD           0x0000000000001000 0x00007fffedf19000 0x0000000000000000
                 0x0000000000103000 0x0000000000103000  RW     0x1000
  LOAD           0x0000000000104000 0x00007fffee01c000 0x0000000000000000
                 0x0000000000028000 0x0000000000028000  R      0x1000
  LOAD           0x000000000012c000 0x00007fffee044000 0x0000000000000000
                 0x0000000000181000 0x0000000000181000  R E    0x1000
  LOAD           0x00000000002ad000 0x00007fffee1c5000 0x0000000000000000
                 0x000000000004f000 0x000000000004f000  R      0x1000
  LOAD           0x00000000002fc000 0x00007fffee214000 0x0000000000000000
                 0x0000000000004000 0x0000000000004000  R      0x1000
  LOAD           0x0000000000300000 0x00007fffee218000 0x0000000000000000
                 0x0000000000002000 0x0000000000002000  RW     0x1000
  LOAD           0x0000000000302000 0x00007fffee21a000 0x0000000000000000
                 0x000000000000d000 0x000000000000d000  RW     0x1000
  LOAD           0x000000000030f000 0x00007fffee227000 0x0000000000000000
                 0x0000000000004000 0x0000000000004000  RW     0x1000
  LOAD           0x0000000000313000 0x00007fffee22b000 0x0000000000000000
                 0x0000000000000000 0x0000000000001000  R E    0x1000
  LOAD           0x0000000000313000 0x00007fffee22c000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000  R      0x1000
  LOAD           0x0000000000314000 0x00007fffee22d000 0x0000000000000000
                 0x000000000002b000 0x000000000002b000  R E    0x1000
  LOAD           0x000000000033f000 0x00007fffee258000 0x0000000000000000
                 0x000000000000b000 0x000000000000b000  R      0x1000
  LOAD           0x000000000034a000 0x00007fffee263000 0x0000000000000000
                 0x0000000000002000 0x0000000000002000  R      0x1000
  LOAD           0x000000000034c000 0x00007fffee265000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000  RW     0x1000
  LOAD           0x000000000034d000 0x00007fffee266000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000  RW     0x1000
  LOAD           0x000000000034e000 0x00007fffee267000 0x0000000000000000
                 0x0000000000000000 0x0000000000001000         0x1000
  LOAD           0x000000000034e000 0x00007fffee268000 0x0000000000000000
                 0x0000000000800000 0x0000000000800000  RW     0x1000
  LOAD           0x0000000000b4e000 0x00007fffeea68000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000  R      0x1000
  LOAD           0x0000000000b4f000 0x00007fffeea69000 0x0000000000000000
                 0x0000000000000000 0x0000000000001000  R E    0x1000
  LOAD           0x0000000000b4f000 0x00007fffeea6a000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000  R      0x1000
  LOAD           0x0000000000b50000 0x00007fffeea6b000 0x0000000000000000
                 0x0000000000001000 0x0000000000001000  RW     0x1000
  LOAD           0x0000000000b51000 0xffffffffff600000 0x0000000000000000
                 0x0000000000000000 0x0000000000001000    E    0x1000

The NOTE segment will contain the register map. The LOAD segments define blocks of memory from the core file that must get loaded at specific addresses in memory to rebuild the original program’s memory image.
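For intuition about what a loader has to do with these headers, here is a minimal sketch that reads the program header table of a little-endian ELF64 image using only the standard library. It is not SmallWorld's loader; Segment, program_headers, and missing_loads are hypothetical names chosen for this example, and features such as extended program header counts are ignored.

```python
import struct
from collections import namedtuple

PT_LOAD, PT_NOTE = 1, 4
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header, little-endian
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header, little-endian

Segment = namedtuple("Segment", "type flags offset vaddr filesz memsz")


def program_headers(data):
    """Return the program headers of a little-endian ELF64 image as Segments."""
    fields = EHDR.unpack_from(data, 0)
    ident, phoff, phentsize, phnum = fields[0], fields[5], fields[9], fields[10]
    # ident[4] == 2 means 64-bit; ident[5] == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    segments = []
    for i in range(phnum):
        p_type, p_flags, p_offset, p_vaddr, _paddr, p_filesz, p_memsz, _align = (
            PHDR.unpack_from(data, phoff + i * phentsize)
        )
        segments.append(Segment(p_type, p_flags, p_offset, p_vaddr, p_filesz, p_memsz))
    return segments


def missing_loads(segments):
    """LOAD segments that occupy memory but carry no bytes in the file."""
    return [s for s in segments if s.type == PT_LOAD and s.filesz == 0 and s.memsz > 0]
```

Calling missing_loads(program_headers(open(path, "rb").read())) on a core file lists the segments whose contents were not dumped, such as the ones with a FileSiz of zero in the listing above.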

SmallWorld includes an extended version of its ELF loader that will also extract the register map. You can access this via Executable.from_elf_core():

with open(filename, "rb") as f:
    # Load the core dump
    code = smallworld.state.memory.code.Executable.from_elf_core(f, platform=platform)
    machine.add(code)

As with normal ELFs, the loader uses the platform argument to verify that the expected platform is being loaded. The harness can leave that argument blank, and the platform property of the ElfCoreExecutable object will contain the platform derived by the loader.

Unlike normal ELFs, core files are always fixed-position.

Applying registers

Actually applying register state from a core dump to a CPU is extremely straightforward, but must be done explicitly in a harness:

code.populate_cpu(cpu)
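To illustrate where these register values come from, the sketch below decodes general-purpose registers from the raw bytes of a NOTE segment. It is not how SmallWorld implements populate_cpu(); it assumes an x86-64 Linux core, where the first CORE/NT_PRSTATUS note holds a struct elf_prstatus with its 27-entry register array at byte offset 112. The function names are hypothetical.

```python
import struct

NT_PRSTATUS = 1
PR_REG_OFFSET = 112  # assumed offset of pr_reg in struct elf_prstatus (x86-64 Linux)
# Field order of struct user_regs_struct on x86-64 Linux.
REG_NAMES = (
    "r15 r14 r13 r12 rbp rbx r11 r10 r9 r8 rax rcx rdx rsi rdi orig_rax "
    "rip cs eflags rsp ss fs_base gs_base ds es fs gs"
).split()


def _align4(n):
    """Round n up to the next multiple of four."""
    return (n + 3) & ~3


def iter_notes(blob):
    """Yield (name, type, desc) for each entry in a NOTE segment's bytes."""
    pos = 0
    while pos + 12 <= len(blob):
        namesz, descsz, ntype = struct.unpack_from("<III", blob, pos)
        pos += 12
        name = blob[pos:pos + namesz].rstrip(b"\0")
        pos += _align4(namesz)
        desc = blob[pos:pos + descsz]
        pos += _align4(descsz)
        yield name, ntype, desc


def first_thread_registers(blob):
    """Return {register: value} from the first CORE/NT_PRSTATUS note."""
    for name, ntype, desc in iter_notes(blob):
        if name == b"CORE" and ntype == NT_PRSTATUS:
            values = struct.unpack_from("<27Q", desc, PR_REG_OFFSET)
            return dict(zip(REG_NAMES, values))
    raise ValueError("no NT_PRSTATUS note found")
```

A multi-threaded process produces one NT_PRSTATUS note per thread; this sketch only looks at the first.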

Avoiding the trap

If we were to start emulating now, we’d get the same illegal instruction trap that killed the program to begin with.

There are two possible approaches to solve this.

Stepping past the trap

If the trapping instruction was part of the original program, and we know we will never pass this point again, we can simply advance the program counter past it.

SmallWorld doesn’t include a facility for disassembling code directly, but there are already very good tools to get us the information we need.

For this demo, the Makefile produced a second file elf_core.amd64.elf.registers that contains the register values at the time of the original crash:

rax            0x1                 1
rbx            0x0                 0
rcx            0x0                 0
rdx            0x0                 0
rsi            0x0                 0
rdi            0x0                 0
rbp            0x7fffeea64db0      0x7fffeea64db0
rsp            0x7fffeea64da0      0x7fffeea64da0
r8             0x0                 0
r9             0x0                 0
r10            0x0                 0
r11            0x0                 0
r12            0x7fffeea64ed8      140737197264600
r13            0x1                 1
r14            0x7fffee265000      140737188876288
r15            0x7fffee2662f0      140737188881136
rip            0x7fffeea695bf      0x7fffeea695bf <main+79>
eflags         0x246               [ PF ZF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
fs_base        0x7fffee019740      140737186469696
gs_base        0x0                 0

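Here, we see that rip is 0x7fffeea695bf. Recall from the output of readelf -l that the segment containing this address (the LOAD segment at 0x7fffeea69000) has a file size of zero. By default, core dumps don’t include the executable segments of code, so the core file contains no bytes to disassemble at that address: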

$ objdump -d elf_core.amd64.elf.core | grep '5b8:'
    7fffee04a5b8:	31 ff                	xor    %edi,%edi
    7fffee0505b8:	4c 8b 6d b0          	mov    -0x50(%rbp),%r13
    7fffee0545b8:	80 f9 ff             	cmp    $0xff,%cl
    7fffee0565b8:	48 63 ff             	movslq %edi,%rdi
    7fffee0605b8:	4d 39 e7             	cmp    %r12,%r15
    7fffee0615b8:	40 0f 95 c6          	setne  %sil
    7fffee0625b8:	75 7d                	jne    0x7fffee062637
    7fffee0645b8:	48 c1 f8 02          	sar    $0x2,%rax
    7fffee0675b8:	41 b9 01 00 00 00    	mov    $0x1,%r9d
    7fffee06b5b8:	49 8d 5a 01          	lea    0x1(%r10),%rbx
    7fffee0705b8:	41 8b 04 84          	mov    (%r12,%rax,4),%eax
    7fffee0755b8:	4a 8d 44 02 ff       	lea    -0x1(%rdx,%r8,1),%rax
    7fffee07a5b8:	c3                   	ret
    7fffee07b5b8:	0f 29 55 80          	movaps %xmm2,-0x80(%rbp)
    7fffee07c5b8:	c3                   	ret
    7fffee0855b8:	44 8b 8d 48 fb ff ff 	mov    -0x4b8(%rbp),%r9d
    7fffee08c5b8:	48 8b bd 88 f9 ff ff 	mov    -0x678(%rbp),%rdi
    7fffee0945b8:	0f 87 32 02 00 00    	ja     0x7fffee0947f0
    7fffee0965b8:	0f 84 77 08 00 00    	je     0x7fffee096e35
    7fffee0975b8:	45 31 c0             	xor    %r8d,%r8d
    7fffee09f5b8:	48 89 d0             	mov    %rdx,%rax
    7fffee0a25b8:	c7 07 01 00 00 00    	movl   $0x1,(%rdi)
    7fffee0a35b8:	48 8b 05 21 48 17 00 	mov    0x174821(%rip),%rax        # 0x7fffee217de0
    7fffee0a85b8:	48 39 ca             	cmp    %rcx,%rdx
    7fffee0aa5b8:	c9                   	leave
    7fffee0ac5b8:	41 56                	push   %r14
    7fffee0ae5b8:	31 d2                	xor    %edx,%edx
    7fffee0b25b8:	81 e6 00 80 00 00    	and    $0x8000,%esi
    7fffee0b45b8:	75 ae                	jne    0x7fffee0b4568
    7fffee0bd5b8:	83 7f 08 00          	cmpl   $0x0,0x8(%rdi)
    7fffee0c05b8:	a8 02                	test   $0x2,%al
    7fffee0c35b8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
    7fffee0c95b8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
    7fffee0cc5b8:	48 21 cf             	and    %rcx,%rdi
    7fffee0cf5b8:	c3                   	ret
    7fffee0d35b8:	89 c8                	mov    %ecx,%eax
    7fffee0d45b8:	45 31 c0             	xor    %r8d,%r8d
    7fffee0d75b8:	48 83 ef c0          	sub    $0xffffffffffffffc0,%rdi
    7fffee0d95b8:	48 01 cf             	add    %rcx,%rdi
    7fffee0dc5b8:	66 0f 6f 0e          	movdqa (%rsi),%xmm1
    7fffee0dd5b8:	f7 c2 00 e0 00 00    	test   $0xe000,%edx
    7fffee0e25b8:	f7 c2 00 fe 00 00    	test   $0xfe00,%edx
    7fffee0e35b8:	c3                   	ret
    7fffee0e45b8:	75 16                	jne    0x7fffee0e45d0
    7fffee0e55b8:	74 26                	je     0x7fffee0e55e0
    7fffee0eb5b8:	49 83 c1 18          	add    $0x18,%r9
    7fffee0ee5b8:	48 89 c7             	mov    %rax,%rdi
    7fffee0f85b8:	0f 84 03 e3 ff ff    	je     0x7fffee0f68c1
    7fffee1015b8:	48 89 95 78 ff ff ff 	mov    %rdx,-0x88(%rbp)
    7fffee1035b8:	41 0f b6 41 01       	movzbl 0x1(%r9),%eax
    7fffee1075b8:	4c 89 db             	mov    %r11,%rbx
    7fffee1085b8:	4c 89 f7             	mov    %r14,%rdi
    7fffee10c5b8:	5d                   	pop    %rbp
    7fffee10f5b8:	48 63 f6             	movslq %esi,%rsi
    7fffee1125b8:	48 83 c2 02          	add    $0x2,%rdx
    7fffee1135b8:	48 8d 7b 02          	lea    0x2(%rbx),%rdi
    7fffee1155b8:	48 8d 35 28 e4 0c 00 	lea    0xce428(%rip),%rsi        # 0x7fffee1e39e7
    7fffee11c5b8:	89 c3                	mov    %eax,%ebx
    7fffee1265b8:	48 83 c4 20          	add    $0x20,%rsp
    7fffee12c5b8:	49 89 06             	mov    %rax,(%r14)
    7fffee12f5b8:	85 c0                	test   %eax,%eax
    7fffee1335b8:	48 83 ec 50          	sub    $0x50,%rsp
    7fffee1355b8:	79 16                	jns    0x7fffee1355d0
    7fffee1365b8:	31 c0                	xor    %eax,%eax
    7fffee13b5b8:	49 83 fc 02          	cmp    $0x2,%r12
    7fffee13c5b8:	0f 84 32 e2 ff ff    	je     0x7fffee13a7f0
    7fffee13d5b8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
    7fffee1415b8:	31 ff                	xor    %edi,%edi
    7fffee1435b8:	48 63 ff             	movslq %edi,%rdi
    7fffee1445b8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
    7fffee1455b8:	89 f8                	mov    %edi,%eax
    7fffee1475b8:	74 1f                	je     0x7fffee1475d9
    7fffee1485b8:	41 89 c1             	mov    %eax,%r9d
    7fffee1495b8:	e8 33 ae ef ff       	call   0x7fffee0443f0
    7fffee14a5b8:	72 15                	jb     0x7fffee14a5cf
    7fffee14c5b8:	ba 05 00 00 00       	mov    $0x5,%edx
    7fffee14e5b8:	48 8d 15 e1 8b f8 ff 	lea    -0x7741f(%rip),%rdx        # 0x7fffee0d71a0
    7fffee15b5b8:	41 83 f8 04          	cmp    $0x4,%r8d
    7fffee15c5b8:	0f 82 82 00 00 00    	jb     0x7fffee15c640
    7fffee15e5b8:	0f 84 df 00 00 00    	je     0x7fffee15e69d
    7fffee15f5b8:	48 8d bd 10 fe ff ff 	lea    -0x1f0(%rbp),%rdi
    7fffee1615b8:	31 f6                	xor    %esi,%esi
    7fffee1745b8:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
    7fffee1755b8:	4c 89 95 e8 f6 ff ff 	mov    %r10,-0x918(%rbp)
    7fffee1775b8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
    7fffee17f5b8:	c3                   	ret
    7fffee1845b8:	b8 01 00 00 00       	mov    $0x1,%eax
    7fffee1875b8:	45 85 e4             	test   %r12d,%r12d
    7fffee18c5b8:	48 8b bb a0 00 00 00 	mov    0xa0(%rbx),%rdi
    7fffee18d5b8:	4a 8b 3c e0          	mov    (%rax,%r12,8),%rdi
    7fffee1905b8:	b8 01 00 00 00       	mov    $0x1,%eax
    7fffee1955b8:	bf 05 00 00 00       	mov    $0x5,%edi
    7fffee1965b8:	ba 20 00 00 00       	mov    $0x20,%edx
    7fffee1975b8:	48 85 c0             	test   %rax,%rax
    7fffee19f5b8:	c4 41 7d fc c2       	vpaddb %ymm10,%ymm0,%ymm8
    7fffee1a05b8:	41 89 c1             	mov    %eax,%r9d
    7fffee1a25b8:	c5 fe 7f 4f 01       	vmovdqu %ymm1,0x1(%rdi)
    7fffee1a45b8:	8d b4 0f 81 00 00 00 	lea    0x81(%rdi,%rcx,1),%esi
    7fffee1a65b8:	c4 c1 7d fc d0       	vpaddb %ymm8,%ymm0,%ymm2
    7fffee1ab5b8:	48 01 f8             	add    %rdi,%rax
    7fffee1ad5b8:	01 00 00 00 
    7fffee1ae5b8:	74 cd                	je     0x7fffee1ae587
    7fffee1b25b8:	48 8d 04 87          	lea    (%rdi,%rax,4),%rax
    7fffee1b55b8:	00 00 00 00 
    7fffee1ba5b8:	83 f9 06             	cmp    $0x6,%ecx
    7fffee1c15b8:	4d 89 cf             	mov    %r9,%r15
    7fffee1c35b8:	31 db                	xor    %ebx,%ebx
    7fffee22d5b8:	ba 9c 01 00 00       	mov    $0x19c,%edx
    7fffee22e5b8:	41 55                	push   %r13
    7fffee2335b8:	31 ff                	xor    %edi,%edi
    7fffee2355b8:	49 8b 84 24 e8 02 00 	mov    0x2e8(%r12),%rax
    7fffee2365b8:	44 8b ad 2c fc ff ff 	mov    -0x3d4(%rbp),%r13d
    7fffee23a5b8:	00 00 00 00 
    7fffee23e5b8:	4c 8d 71 0b          	lea    0xb(%rcx),%r14
    7fffee2415b8:	4c 8b ad 78 ff ff ff 	mov    -0x88(%rbp),%r13
    7fffee2485b8:	8b 1d 82 c5 01 00    	mov    0x1c582(%rip),%ebx        # 0x7fffee264b40
    7fffee24f5b8:	00 
    7fffee2525b8:	45 31 d2             	xor    %r10d,%r10d
    7fffee2545b8:	48 c7 c1 10 00 00 00 	mov    $0x10,%rcx
    7fffee2555b8:	66 0f da 40 10       	pminub 0x10(%rax),%xmm0
    7fffee2565b8:	00 00 00 00
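To check which segment the faulting address falls into, a small lookup is enough. The sketch below hardcodes three rows copied from the readelf -l output earlier in this section, plus the rip and rsp values from the register dump; segment_for is a hypothetical helper, not a SmallWorld API.

```python
# (vaddr, filesz, memsz, flags) for three LOAD segments from the readelf -l listing.
LOADS = [
    (0x00007FFFEE268000, 0x800000, 0x800000, "RW"),
    (0x00007FFFEEA68000, 0x1000, 0x1000, "R"),
    (0x00007FFFEEA69000, 0x0, 0x1000, "R E"),
]


def segment_for(address, loads=LOADS):
    """Return the (vaddr, filesz, memsz, flags) row covering address, or None."""
    for vaddr, filesz, memsz, flags in loads:
        if vaddr <= address < vaddr + memsz:
            return vaddr, filesz, memsz, flags
    return None


rip = 0x7FFFEEA695BF  # from the register dump
vaddr, filesz, memsz, flags = segment_for(rip)
print(
    f"rip {rip:#x} is at offset {rip - vaddr:#x} of segment {vaddr:#x} ({flags}); "
    f"{filesz:#x} of {memsz:#x} bytes are present in the core"
)
```

The executable segment holding rip reports zero bytes in the file, while the stack pointer lands in the fully dumped RW segment just below it.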

Our trap comes from a ud2 instruction, which is two bytes long. We can repair the program counter as follows:

entrypoint = cpu.rip.get() + 2
cpu.rip.set(entrypoint)
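Blindly adding two is only safe if the bytes at rip really are a ud2. A defensive variant, sketched here on a plain bytes object rather than on SmallWorld's memory classes, checks the encoding (0f 0b) first. The function name and the idea of passing the page contents and base address explicitly are assumptions made for this example.

```python
UD2 = b"\x0f\x0b"  # x86 encoding of the ud2 instruction


def address_after_trap(memory, base, pc):
    """Return pc advanced past a ud2 located at pc.

    memory holds the bytes mapped at address base. Raises ValueError if the
    bytes at pc are not a ud2, so a wrong guess fails loudly.
    """
    offset = pc - base
    if offset < 0 or memory[offset:offset + len(UD2)] != UD2:
        raise ValueError(f"no ud2 at {pc:#x}")
    return pc + len(UD2)
```

In a harness, the returned value would be what gets written back to the program counter.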

Removing the trap

If we expect to encounter this code again, or if we had to replace an existing instruction in the program in order to add our trap, simply advancing the program counter won’t help. We will need to rewrite the instruction bytes to remove the trap entirely.

If we replaced an existing instruction, we will know the specific bytes we want to write back. Otherwise, we will need to use the disassembler as above to learn the number of bytes to rewrite.

In this example, we know the ud2 instruction is two bytes long, and it did not replace an existing instruction, so we want to replace it with a two-byte NOP.

In SmallWorld, Executables are just memory, so we can use the bytes accessors from the Memory class to perform our modification quickly:

# 2-byte x86 NOP.
nop = b'\x66\x90'
code.write_bytes(cpu.rip.get(), nop)
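The same idea generalizes to traps of other lengths. The sketch below works on a plain bytearray rather than a SmallWorld Executable, and both helper names are hypothetical. It pads with the two-byte (66 90) and one-byte (90) x86 NOP forms and returns the original bytes so the patch can be undone.

```python
def nop_fill(length):
    """Build `length` bytes of x86 NOPs from 2-byte (66 90) and 1-byte (90) forms."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return b"\x66\x90" * (length // 2) + b"\x90" * (length % 2)


def patch_out(memory, offset, length):
    """Overwrite `length` bytes of a bytearray with NOPs; return the old bytes."""
    original = bytes(memory[offset:offset + length])
    if len(original) != length:
        raise ValueError("patch runs past the end of the buffer")
    memory[offset:offset + length] = nop_fill(length)
    return original
```

Keeping the replacement exactly as long as the original instruction matters: a shorter or longer patch would shift or clobber the instructions that follow.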

Note

Some ISAs present instructions in native byte order, so an instruction on a little-endian system will appear backwards in memory. Be careful of this when manually rewriting code.
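As a concrete illustration, MIPS is one ISA whose instruction words follow the system's byte order. The sketch below packs the MIPS32 encoding of jr $ra (0x03e00008) both ways; the choice of instruction is only an example and does not come from the SmallWorld test suite.

```python
import struct

MIPS_JR_RA = 0x03E00008  # MIPS32 encoding of `jr $ra` as a 32-bit instruction word

little = struct.pack("<I", MIPS_JR_RA)  # bytes as stored on a little-endian system
big = struct.pack(">I", MIPS_JR_RA)     # bytes as stored on a big-endian system

print(little.hex(" "))  # 08 00 e0 03
print(big.hex(" "))     # 03 e0 00 08
```

When patching by hand, the byte string written into memory must match the first form on a little-endian target, even though disassemblers usually print the instruction word in the second form.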

Putting it all together

Combined, this can be found in the script actuate/elf_core.amd64.py:

import logging
import pathlib

import smallworld

# Set up logging and hinting
smallworld.logging.setup_logging(level=logging.INFO)

# Define the platform
platform = smallworld.platforms.Platform(
    smallworld.platforms.Architecture.X86_64, smallworld.platforms.Byteorder.LITTLE
)

# Create a machine
machine = smallworld.state.Machine()

# Create a CPU
cpu = smallworld.state.cpus.CPU.for_platform(platform)
machine.add(cpu)

# Load and add core file into the state
filepath = pathlib.Path(__file__).resolve()
filename = (
    filepath.name.replace(".py", ".elf.core")
    .replace(".angr", "")
    .replace(".panda", "")
    .replace(".pcode", "")
)
filename = (filepath.parent.parent / filename).as_posix()
with open(filename, "rb") as f:
    code = smallworld.state.memory.code.Executable.from_elf_core(f, platform=platform)
    machine.add(code)
    code.populate_cpu(cpu)

# Load the original binary so we can copy .text
# I can't get my system to dump the executable segments.
origname = filename.replace(".core", "")
with open(origname, "rb") as f:
    orig = smallworld.state.memory.code.Executable.from_elf(
        f, platform=platform, address=code.address
    )

# The core file reserves space before the true load address for its metadata.
code_offset = (cpu.pc.get() - code.address) & 0xFFFFFFFFFFFFF000
code[code_offset] = orig[0x0]

# Replace the instruction bytes at rip with a nop
nop = b"\x66\x90"
code.write_bytes(cpu.rip.get(), nop)

# Set up a puts handler
# puts address recovered from manual RE
puts_addr = (cpu.rip.get() & 0xFFFFFFFFFFFFF000) | 0x610
puts = smallworld.state.models.Model.lookup(
    "puts", platform, smallworld.platforms.ABI.SYSTEMV, puts_addr
)
machine.add(puts)

# Add an exit point
machine.add_exit_point(cpu.rip.get() + 0x23)

# Emulate
emulator = smallworld.emulators.UnicornEmulator(platform)
machine.emulate(emulator)

Aside from loading the core dump and patching bytes, the only other step is to hook puts; we may have the process memory, but we don’t have a system call model.
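The address arithmetic in the script can be checked by hand. The sketch below plugs in the rip value from the register dump. The value used for code.address is an assumption (the lowest LOAD address from the readelf listing); the real value comes from the loader. The 0x610 and 0x23 constants are taken from the script as-is.

```python
PAGE_MASK = 0xFFFFFFFFFFFFF000  # clears the low 12 bits (4 KiB pages)

rip = 0x7FFFEEA695BF           # from the register dump
code_address = 0x7FFFEDF19000  # ASSUMPTION: lowest LOAD address in the core

page = rip & PAGE_MASK                           # start of the page holding main
puts_addr = page | 0x610                         # puts at page offset 0x610
exit_point = rip + 0x23                          # stop 0x23 bytes after the trap
code_offset = (rip - code_address) & PAGE_MASK   # page-aligned offset into `code`

print(f"page        {page:#x}")         # 0x7fffeea69000
print(f"puts_addr   {puts_addr:#x}")    # 0x7fffeea69610
print(f"exit_point  {exit_point:#x}")   # 0x7fffeea695e2
print(f"code_offset {code_offset:#x}")  # 0xb50000
```

Using | to add the page offset works only because the low 12 bits of the page address are zero and 0x610 fits within one page.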

Here is what running the harness looks like:

$ python3 elf_core.amd64.py
The current binary doesn't have a section header
[+] starting emulation at 0x7fffeea695bf
[+] emulation complete
foobar

Handling missing segments

As mentioned above, Linux won’t dump file-backed read-only segments; it assumes you already have the data.

(There is theoretically a way to enable these segments in dumps, but it may be blocked by other security policies.)

If this happens, we can replace the data by loading the executable segment out of our original binary:

origname = filename.replace(".core", "")
with open(origname, "rb") as f:
    orig = smallworld.state.memory.code.Executable.from_elf(f, platform=platform, address=code.address)

# The core file reserves space before the true load address for its metadata.
#
# NOTE: You may need to adjust the masking depending on the segment's offset.
code_offset = (cpu.pc.get() - code.address) & 0xfffffffffffff000
code[code_offset] = orig[0x0]

When tested, this wasn’t the case for amd64 binaries, but it was the case for all other architectures. See any other example elf_core_actuate.$ARCH.py for examples of this working.

Known Limitations

On some platforms, core dumps exercise features that not all emulators support.

  • amd64 core dumps include the rflags register, which angr doesn’t represent directly

  • i386 core dumps include the eflags register, which angr doesn’t represent directly

  • i386 core dumps include segment registers, which require complex setup for Panda and Unicorn

Additionally, extracting the full machine state from Ghidra or angr can take a while. If you don’t need the final machine state, use Machine.apply() paired with emulator.run().