Loading a core dump
One of the big limitations of micro-execution is that it is blind to any kind of state built up by the program before your start point. Normally, the harness has to build up that state explicitly.
However, there is another option: loading a memory dump from a live program.
Linux and similar platforms can automatically produce “core dumps” on certain kinds of program failures, and a debugger can be commanded to produce a similar file.
This file is a modified ELF that contains the memory image of the faulting process, as well as the values of registers at the time of the fault. Perfect for setting up a harness!
Note
Windows can produce similar memory dumps. SmallWorld does not yet support loading them.
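Before pointing a harness at a file, it can help to confirm that it really is an ELF core. The following is a minimal sketch using only the Python standard library; the function names `elf_type` and `is_core_dump` are illustrative and not part of SmallWorld. It relies on the ELF specification's layout, where the 16-bit `e_type` field follows the 16-byte `e_ident` array and the value 4 (`ET_CORE`) marks a core file.

```python
import struct

ET_CORE = 4  # e_type value the ELF specification assigns to core files


def elf_type(header: bytes) -> int:
    """Return e_type from the first 18 (or more) bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field immediately after the 16-byte e_ident
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return e_type


def is_core_dump(path: str) -> bool:
    """Hypothetical helper: True if the file at `path` is an ELF core."""
    with open(path, "rb") as f:
        return elf_type(f.read(18)) == ET_CORE
```

Calling `is_core_dump("elf_core.amd64.elf.core")` on the file produced later in this tutorial should return True if the dump was written correctly.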
Consider the program elf_core.elf.c:
#include <stdio.h>
#include <stdlib.h>
#include <sys/procfs.h>
char input_word[64];
int x = 1;
int main() {
if(fscanf(stdin, "%63s", input_word) != 1) {
return 1;
}
for(int i = 0; i < 2; i++) {
if (i == 1) {
__builtin_trap();
}
}
puts(input_word);
}
The __builtin_trap() statement
puts some kind of illegal or faulting instruction
between the fscanf and the puts.
We want to use that as our starting point for emulation,
and just explore the results of the puts,
without having to deal with the machinations of fscanf.
Note
In practice, a program won’t put a nice __builtin_trap()
right where you need it.
You will need to modify the binary
to inject a faulting instruction yourself.
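As a rough illustration of what such an injection involves, the sketch below overwrites bytes in an in-memory copy of a binary with the two-byte x86 `ud2` opcode and returns the bytes it displaced so they can be restored later. The helper name `inject_trap` is hypothetical, and the sketch assumes you have already translated the target address into a file offset and that the overwritten bytes cover whole instructions.

```python
UD2 = b"\x0f\x0b"  # x86 "undefined instruction" opcode; always faults


def inject_trap(image: bytearray, offset: int, trap: bytes = UD2) -> bytes:
    """Overwrite image at `offset` with a trap; return the bytes it replaced."""
    end = offset + len(trap)
    if offset < 0 or end > len(image):
        raise ValueError("trap does not fit inside the image")
    original = bytes(image[offset:end])
    image[offset:end] = trap
    return original


# Example on a toy buffer of four one-byte NOPs
image = bytearray(b"\x90\x90\x90\x90")
saved = inject_trap(image, 1)
```

Keeping the returned bytes matters: they are exactly what a harness needs to write back if the trap replaced a real instruction.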
You can turn this program into a core dump as follows:
# Build the test program
cd smallworld/tests
make elf_core/elf_core.amd64.elf
# Enable core dumps
ulimit -c unlimited
# Create the core dump
cd elf_core
make elf_core.amd64.elf.core
In order to build a harness around this file, we need to do the following:
Follow the metadata in the core file to unpack the memory image inside.
Load the registers from the core dump.
Avoid or replace the trapping instruction so we can keep emulating.
Warning
Generating these core dumps requires a good bit of setup.
The Makefile assumes that the default GCC targets amd64 Linux. It also needs user-mode QEMU and multiarch gdb to actuate the program.
If you are running SmallWorld in a container,
ulimit -c will probably fail; Docker and similar environments
do not let containers modify ulimits internally.
Most platforms have some mechanism for setting ulimits
when you launch a container.
See your specific container solution’s documentation for details.
If you’re running on an IT-managed system,
core dumps may be redirected to a reporting tool.
You will need to alter /proc/sys/kernel/core_pattern
to contain the following:
core
By default, Linux does not dump file-backed pages,
such as the .text sections of a binary.
In theory, setting /proc/${pid}/coredump_filter
to “0x3f” should fix this problem, but it didn’t work in testing.
The SmallWorld team advocates consulting your IT team before elevating container privileges or altering auditing facilities. Please harness responsibly.
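To see which kinds of mappings a given coredump_filter value asks the kernel to dump, the mask can be decoded bit by bit. This is a small sketch with hypothetical helper names; the bit meanings are taken from the Linux core(5) manual page, and the /proc file is assumed to hold a bare hexadecimal number.

```python
# Bit meanings as documented in the Linux core(5) manual page.
COREDUMP_FILTER_BITS = {
    0: "anonymous private mappings",
    1: "anonymous shared mappings",
    2: "file-backed private mappings",
    3: "file-backed shared mappings",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
    7: "private DAX pages",
    8: "shared DAX pages",
}


def decode_coredump_filter(mask: int) -> list[str]:
    """List the mapping types selected by a coredump_filter mask."""
    return [name for bit, name in COREDUMP_FILTER_BITS.items() if mask & (1 << bit)]


def read_coredump_filter(pid: str = "self") -> int:
    """Read the current mask for a process (Linux only)."""
    with open(f"/proc/{pid}/coredump_filter") as f:
        return int(f.read().strip(), 16)
```

Decoding 0x3f shows that it adds the two file-backed mapping types to the usual default of 0x33, which is why it is the value suggested above.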
Loading a core dump
Core files are just slightly extended ELFs.
We can take a look at our core dump with readelf -l:
$ readelf -l elf_core.amd64.elf.core
Elf file type is CORE (Core file)
Entry point 0x0
There are 23 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
NOTE 0x0000000000000548 0x0000000000000000 0x0000000000000000
0x0000000000000344 0x0000000000000000 0x0
LOAD 0x0000000000001000 0x00007fffedf19000 0x0000000000000000
0x0000000000103000 0x0000000000103000 RW 0x1000
LOAD 0x0000000000104000 0x00007fffee01c000 0x0000000000000000
0x0000000000028000 0x0000000000028000 R 0x1000
LOAD 0x000000000012c000 0x00007fffee044000 0x0000000000000000
0x0000000000181000 0x0000000000181000 R E 0x1000
LOAD 0x00000000002ad000 0x00007fffee1c5000 0x0000000000000000
0x000000000004f000 0x000000000004f000 R 0x1000
LOAD 0x00000000002fc000 0x00007fffee214000 0x0000000000000000
0x0000000000004000 0x0000000000004000 R 0x1000
LOAD 0x0000000000300000 0x00007fffee218000 0x0000000000000000
0x0000000000002000 0x0000000000002000 RW 0x1000
LOAD 0x0000000000302000 0x00007fffee21a000 0x0000000000000000
0x000000000000d000 0x000000000000d000 RW 0x1000
LOAD 0x000000000030f000 0x00007fffee227000 0x0000000000000000
0x0000000000004000 0x0000000000004000 RW 0x1000
LOAD 0x0000000000313000 0x00007fffee22b000 0x0000000000000000
0x0000000000000000 0x0000000000001000 R E 0x1000
LOAD 0x0000000000313000 0x00007fffee22c000 0x0000000000000000
0x0000000000001000 0x0000000000001000 R 0x1000
LOAD 0x0000000000314000 0x00007fffee22d000 0x0000000000000000
0x000000000002b000 0x000000000002b000 R E 0x1000
LOAD 0x000000000033f000 0x00007fffee258000 0x0000000000000000
0x000000000000b000 0x000000000000b000 R 0x1000
LOAD 0x000000000034a000 0x00007fffee263000 0x0000000000000000
0x0000000000002000 0x0000000000002000 R 0x1000
LOAD 0x000000000034c000 0x00007fffee265000 0x0000000000000000
0x0000000000001000 0x0000000000001000 RW 0x1000
LOAD 0x000000000034d000 0x00007fffee266000 0x0000000000000000
0x0000000000001000 0x0000000000001000 RW 0x1000
LOAD 0x000000000034e000 0x00007fffee267000 0x0000000000000000
0x0000000000000000 0x0000000000001000 0x1000
LOAD 0x000000000034e000 0x00007fffee268000 0x0000000000000000
0x0000000000800000 0x0000000000800000 RW 0x1000
LOAD 0x0000000000b4e000 0x00007fffeea68000 0x0000000000000000
0x0000000000001000 0x0000000000001000 R 0x1000
LOAD 0x0000000000b4f000 0x00007fffeea69000 0x0000000000000000
0x0000000000000000 0x0000000000001000 R E 0x1000
LOAD 0x0000000000b4f000 0x00007fffeea6a000 0x0000000000000000
0x0000000000001000 0x0000000000001000 R 0x1000
LOAD 0x0000000000b50000 0x00007fffeea6b000 0x0000000000000000
0x0000000000001000 0x0000000000001000 RW 0x1000
LOAD 0x0000000000b51000 0xffffffffff600000 0x0000000000000000
0x0000000000000000 0x0000000000001000 E 0x1000
The NOTE segment will contain the register map.
The LOAD segments define blocks of memory from the core file
that must get loaded at specific addresses in memory to rebuild the
original program’s memory image.
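The same information that readelf prints can be recovered with a few lines of standard-library Python. The sketch below is not how SmallWorld's loader works internally; it is an independent illustration, limited to little-endian ELF64 files, with hypothetical names (`Segment`, `program_headers`, `missing_loads`). Field offsets follow the ELF64 header and program header layouts.

```python
import struct
from typing import NamedTuple

PT_LOAD = 1
PT_NOTE = 4


class Segment(NamedTuple):
    kind: int
    flags: int
    offset: int
    vaddr: int
    filesz: int
    memsz: int


def program_headers(data: bytes) -> list[Segment]:
    """Parse the program header table of a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    segments = []
    for i in range(e_phnum):
        p_type, p_flags, p_offset, p_vaddr, _paddr, p_filesz, p_memsz, _align = (
            struct.unpack_from("<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        )
        segments.append(Segment(p_type, p_flags, p_offset, p_vaddr, p_filesz, p_memsz))
    return segments


def missing_loads(segments: list[Segment]) -> list[Segment]:
    """LOAD segments that occupy memory but have no bytes in the file."""
    return [s for s in segments if s.kind == PT_LOAD and s.filesz == 0 and s.memsz > 0]
```

Running `missing_loads(program_headers(open(path, "rb").read()))` on a core file highlights segments such as the ones above whose FileSiz is zero, which a harness will have to fill in from somewhere else.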
SmallWorld includes an extended version of its ELF loader
that will also extract the register map.
You can access this via Executable.from_elf_core():
with open(filename, "rb") as f:
# Load the core dump
code = smallworld.state.memory.code.Executable.from_elf_core(f, platform=platform)
machine.add(code)
As with normal ELFs, the loader uses the platform argument
to verify that the expected platform is being loaded.
The harness can leave that argument blank,
and the platform property of the ElfCoreExecutable object
will contain the platform derived by the loader.
Unlike normal ELFs, core files are always fixed-position.
Applying registers
Actually applying register state from a core dump to a CPU
is extremely straightforward, but must be done explicitly in a harness:
code.populate_cpu(cpu)
Avoiding the trap
If we were to start emulating now, we’d get the same illegal instruction trap that killed the program to begin with.
There are two possible approaches to solve this.
Stepping past the trap
If the trapping instruction was part of the original program, and we know we will never pass this point again, we can simply advance the program counter past it.
SmallWorld doesn’t include a facility for disassembling code directly, but there are already very good tools to get us the information we need.
For this demo, the Makefile produced a second file elf_core.amd64.elf.registers
that contains the register values at the time of the original crash:
rax 0x1 1
rbx 0x0 0
rcx 0x0 0
rdx 0x0 0
rsi 0x0 0
rdi 0x0 0
rbp 0x7fffeea64db0 0x7fffeea64db0
rsp 0x7fffeea64da0 0x7fffeea64da0
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0x7fffeea64ed8 140737197264600
r13 0x1 1
r14 0x7fffee265000 140737188876288
r15 0x7fffee2662f0 140737188881136
rip 0x7fffeea695bf 0x7fffeea695bf <main+79>
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
fs_base 0x7fffee019740 140737186469696
gs_base 0x0 0
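Here, we see that rip is 0x7fffeea695bf.
Recall from the output of readelf -l that the segment containing this address,
the R E segment at 0x7fffeea69000, has a file size of zero.
By default, core dumps don’t include the file-backed executable segments of code,
so searching a disassembly of the core dump for that page offset turns up only code from other segments: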
$ objdump -d elf_core.amd64.elf.core | grep '5b8:'
7fffee04a5b8: 31 ff xor %edi,%edi
7fffee0505b8: 4c 8b 6d b0 mov -0x50(%rbp),%r13
7fffee0545b8: 80 f9 ff cmp $0xff,%cl
7fffee0565b8: 48 63 ff movslq %edi,%rdi
7fffee0605b8: 4d 39 e7 cmp %r12,%r15
7fffee0615b8: 40 0f 95 c6 setne %sil
7fffee0625b8: 75 7d jne 0x7fffee062637
7fffee0645b8: 48 c1 f8 02 sar $0x2,%rax
7fffee0675b8: 41 b9 01 00 00 00 mov $0x1,%r9d
7fffee06b5b8: 49 8d 5a 01 lea 0x1(%r10),%rbx
7fffee0705b8: 41 8b 04 84 mov (%r12,%rax,4),%eax
7fffee0755b8: 4a 8d 44 02 ff lea -0x1(%rdx,%r8,1),%rax
7fffee07a5b8: c3 ret
7fffee07b5b8: 0f 29 55 80 movaps %xmm2,-0x80(%rbp)
7fffee07c5b8: c3 ret
7fffee0855b8: 44 8b 8d 48 fb ff ff mov -0x4b8(%rbp),%r9d
7fffee08c5b8: 48 8b bd 88 f9 ff ff mov -0x678(%rbp),%rdi
7fffee0945b8: 0f 87 32 02 00 00 ja 0x7fffee0947f0
7fffee0965b8: 0f 84 77 08 00 00 je 0x7fffee096e35
7fffee0975b8: 45 31 c0 xor %r8d,%r8d
7fffee09f5b8: 48 89 d0 mov %rdx,%rax
7fffee0a25b8: c7 07 01 00 00 00 movl $0x1,(%rdi)
7fffee0a35b8: 48 8b 05 21 48 17 00 mov 0x174821(%rip),%rax # 0x7fffee217de0
7fffee0a85b8: 48 39 ca cmp %rcx,%rdx
7fffee0aa5b8: c9 leave
7fffee0ac5b8: 41 56 push %r14
7fffee0ae5b8: 31 d2 xor %edx,%edx
7fffee0b25b8: 81 e6 00 80 00 00 and $0x8000,%esi
7fffee0b45b8: 75 ae jne 0x7fffee0b4568
7fffee0bd5b8: 83 7f 08 00 cmpl $0x0,0x8(%rdi)
7fffee0c05b8: a8 02 test $0x2,%al
7fffee0c35b8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7fffee0c95b8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7fffee0cc5b8: 48 21 cf and %rcx,%rdi
7fffee0cf5b8: c3 ret
7fffee0d35b8: 89 c8 mov %ecx,%eax
7fffee0d45b8: 45 31 c0 xor %r8d,%r8d
7fffee0d75b8: 48 83 ef c0 sub $0xffffffffffffffc0,%rdi
7fffee0d95b8: 48 01 cf add %rcx,%rdi
7fffee0dc5b8: 66 0f 6f 0e movdqa (%rsi),%xmm1
7fffee0dd5b8: f7 c2 00 e0 00 00 test $0xe000,%edx
7fffee0e25b8: f7 c2 00 fe 00 00 test $0xfe00,%edx
7fffee0e35b8: c3 ret
7fffee0e45b8: 75 16 jne 0x7fffee0e45d0
7fffee0e55b8: 74 26 je 0x7fffee0e55e0
7fffee0eb5b8: 49 83 c1 18 add $0x18,%r9
7fffee0ee5b8: 48 89 c7 mov %rax,%rdi
7fffee0f85b8: 0f 84 03 e3 ff ff je 0x7fffee0f68c1
7fffee1015b8: 48 89 95 78 ff ff ff mov %rdx,-0x88(%rbp)
7fffee1035b8: 41 0f b6 41 01 movzbl 0x1(%r9),%eax
7fffee1075b8: 4c 89 db mov %r11,%rbx
7fffee1085b8: 4c 89 f7 mov %r14,%rdi
7fffee10c5b8: 5d pop %rbp
7fffee10f5b8: 48 63 f6 movslq %esi,%rsi
7fffee1125b8: 48 83 c2 02 add $0x2,%rdx
7fffee1135b8: 48 8d 7b 02 lea 0x2(%rbx),%rdi
7fffee1155b8: 48 8d 35 28 e4 0c 00 lea 0xce428(%rip),%rsi # 0x7fffee1e39e7
7fffee11c5b8: 89 c3 mov %eax,%ebx
7fffee1265b8: 48 83 c4 20 add $0x20,%rsp
7fffee12c5b8: 49 89 06 mov %rax,(%r14)
7fffee12f5b8: 85 c0 test %eax,%eax
7fffee1335b8: 48 83 ec 50 sub $0x50,%rsp
7fffee1355b8: 79 16 jns 0x7fffee1355d0
7fffee1365b8: 31 c0 xor %eax,%eax
7fffee13b5b8: 49 83 fc 02 cmp $0x2,%r12
7fffee13c5b8: 0f 84 32 e2 ff ff je 0x7fffee13a7f0
7fffee13d5b8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7fffee1415b8: 31 ff xor %edi,%edi
7fffee1435b8: 48 63 ff movslq %edi,%rdi
7fffee1445b8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7fffee1455b8: 89 f8 mov %edi,%eax
7fffee1475b8: 74 1f je 0x7fffee1475d9
7fffee1485b8: 41 89 c1 mov %eax,%r9d
7fffee1495b8: e8 33 ae ef ff call 0x7fffee0443f0
7fffee14a5b8: 72 15 jb 0x7fffee14a5cf
7fffee14c5b8: ba 05 00 00 00 mov $0x5,%edx
7fffee14e5b8: 48 8d 15 e1 8b f8 ff lea -0x7741f(%rip),%rdx # 0x7fffee0d71a0
7fffee15b5b8: 41 83 f8 04 cmp $0x4,%r8d
7fffee15c5b8: 0f 82 82 00 00 00 jb 0x7fffee15c640
7fffee15e5b8: 0f 84 df 00 00 00 je 0x7fffee15e69d
7fffee15f5b8: 48 8d bd 10 fe ff ff lea -0x1f0(%rbp),%rdi
7fffee1615b8: 31 f6 xor %esi,%esi
7fffee1745b8: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
7fffee1755b8: 4c 89 95 e8 f6 ff ff mov %r10,-0x918(%rbp)
7fffee1775b8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7fffee17f5b8: c3 ret
7fffee1845b8: b8 01 00 00 00 mov $0x1,%eax
7fffee1875b8: 45 85 e4 test %r12d,%r12d
7fffee18c5b8: 48 8b bb a0 00 00 00 mov 0xa0(%rbx),%rdi
7fffee18d5b8: 4a 8b 3c e0 mov (%rax,%r12,8),%rdi
7fffee1905b8: b8 01 00 00 00 mov $0x1,%eax
7fffee1955b8: bf 05 00 00 00 mov $0x5,%edi
7fffee1965b8: ba 20 00 00 00 mov $0x20,%edx
7fffee1975b8: 48 85 c0 test %rax,%rax
7fffee19f5b8: c4 41 7d fc c2 vpaddb %ymm10,%ymm0,%ymm8
7fffee1a05b8: 41 89 c1 mov %eax,%r9d
7fffee1a25b8: c5 fe 7f 4f 01 vmovdqu %ymm1,0x1(%rdi)
7fffee1a45b8: 8d b4 0f 81 00 00 00 lea 0x81(%rdi,%rcx,1),%esi
7fffee1a65b8: c4 c1 7d fc d0 vpaddb %ymm8,%ymm0,%ymm2
7fffee1ab5b8: 48 01 f8 add %rdi,%rax
7fffee1ad5b8: 01 00 00 00
7fffee1ae5b8: 74 cd je 0x7fffee1ae587
7fffee1b25b8: 48 8d 04 87 lea (%rdi,%rax,4),%rax
7fffee1b55b8: 00 00 00 00
7fffee1ba5b8: 83 f9 06 cmp $0x6,%ecx
7fffee1c15b8: 4d 89 cf mov %r9,%r15
7fffee1c35b8: 31 db xor %ebx,%ebx
7fffee22d5b8: ba 9c 01 00 00 mov $0x19c,%edx
7fffee22e5b8: 41 55 push %r13
7fffee2335b8: 31 ff xor %edi,%edi
7fffee2355b8: 49 8b 84 24 e8 02 00 mov 0x2e8(%r12),%rax
7fffee2365b8: 44 8b ad 2c fc ff ff mov -0x3d4(%rbp),%r13d
7fffee23a5b8: 00 00 00 00
7fffee23e5b8: 4c 8d 71 0b lea 0xb(%rcx),%r14
7fffee2415b8: 4c 8b ad 78 ff ff ff mov -0x88(%rbp),%r13
7fffee2485b8: 8b 1d 82 c5 01 00 mov 0x1c582(%rip),%ebx # 0x7fffee264b40
7fffee24f5b8: 00
7fffee2525b8: 45 31 d2 xor %r10d,%r10d
7fffee2545b8: 48 c7 c1 10 00 00 00 mov $0x10,%rcx
7fffee2555b8: 66 0f da 40 10 pminub 0x10(%rax),%xmm0
7fffee2565b8: 00 00 00 00
Our trap comes from a ud2 instruction (the instruction GCC emits for __builtin_trap() on x86), which is two bytes long.
We can repair the program counter as follows:
entrypoint = cpu.rip.get() + 2
cpu.rip.set(entrypoint)
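If you would rather not copy the address by hand, the register file shown earlier is easy to parse. The following sketch assumes the file uses the same "name, hex value, pretty-printed value" layout as the listing above; `parse_registers` is a hypothetical helper, not a SmallWorld function.

```python
def parse_registers(text: str) -> dict[str, int]:
    """Map register name -> value from `info registers`-style lines."""
    registers = {}
    for line in text.splitlines():
        fields = line.split()
        # Each line is "<name> <hex value> <pretty-printed value...>"
        if len(fields) >= 2 and fields[1].startswith("0x"):
            registers[fields[0]] = int(fields[1], 16)
    return registers


UD2_LENGTH = 2  # ud2 encodes as the two bytes 0f 0b

sample = """\
rsp 0x7fffeea64da0 0x7fffeea64da0
rip 0x7fffeea695bf 0x7fffeea695bf <main+79>
eflags 0x246 [ PF ZF IF ]
"""
regs = parse_registers(sample)
entrypoint = regs["rip"] + UD2_LENGTH
print(hex(entrypoint))  # 0x7fffeea695c1
```

The resulting value is the same address the harness computes with cpu.rip.get() + 2, which makes it a convenient cross-check.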
Removing the trap
If we expect to encounter this code again, or if we had to replace an existing instruction in the program in order to add our trap, simply advancing the program counter won’t help. We will need to rewrite the instruction bytes to remove the trap entirely.
If we replaced an existing instruction, we will know the specific bytes we want to write back. Otherwise, we will need to use the disassembler as above to learn the number of bytes to rewrite.
In this example, we know the ud2 instruction is two bytes long,
and it did not replace an existing instruction,
so we want to replace it with a two-byte NOP.
In SmallWorld, Executables are just memory, so we can use the
bytes accessors from the Memory class to perform our modification quickly:
# 2-byte x86 NOP.
nop = b'\x66\x90'
code.write_bytes(cpu.rip.get(), nop)
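Because a blind overwrite at the wrong address silently corrupts code, it can be worth checking that the bytes being replaced really are the trap. The sketch below shows that check on a plain bytearray rather than on a SmallWorld object; `replace_trap` is a hypothetical helper, and how you obtain the bytes at rip is left to your harness.

```python
UD2 = b"\x0f\x0b"   # the trap emitted for __builtin_trap() on x86
NOP2 = b"\x66\x90"  # two-byte NOP: operand-size prefix followed by nop


def replace_trap(memory: bytearray, offset: int) -> None:
    """Swap a ud2 at memory[offset] for a NOP of the same length."""
    found = bytes(memory[offset : offset + len(UD2)])
    if found != UD2:
        raise ValueError(f"expected ud2 at offset {offset:#x}, found {found.hex()}")
    memory[offset : offset + len(NOP2)] = NOP2


# Example: a toy code buffer with a ud2 between two one-byte NOPs
code_bytes = bytearray(b"\x90\x0f\x0b\x90")
replace_trap(code_bytes, 1)
print(code_bytes.hex())  # 90669090
```

Matching the replacement length to the trap length keeps every later instruction at its original address.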
Note
Some ISAs present instructions in native byte order, so an instruction on a little-endian system will appear backwards in memory. Be careful of this when manually rewriting code.
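To make the byte-order point concrete, here is a short sketch that serializes one 32-bit instruction word both ways. It uses the MIPS encoding of `jr $ra` (0x03E00008) purely as an example of an ISA whose instruction byte order follows the platform's endianness; nothing here is specific to SmallWorld.

```python
JR_RA = 0x03E00008  # MIPS `jr $ra` as a 32-bit instruction word

big = JR_RA.to_bytes(4, "big")        # layout on a big-endian MIPS system
little = JR_RA.to_bytes(4, "little")  # layout on a little-endian MIPS system

print(big.hex())     # 03e00008
print(little.hex())  # 0800e003
```

When patching memory by hand, write the byte string that matches the target's byte order, not the word as a disassembler prints it.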
Putting it all together
Combined, this can be found in the script actuate/elf_core.amd64.py:
import logging
import pathlib
import smallworld
# Set up logging and hinting
smallworld.logging.setup_logging(level=logging.INFO)
# Define the platform
platform = smallworld.platforms.Platform(
smallworld.platforms.Architecture.X86_64, smallworld.platforms.Byteorder.LITTLE
)
# Create a machine
machine = smallworld.state.Machine()
# Create a CPU
cpu = smallworld.state.cpus.CPU.for_platform(platform)
machine.add(cpu)
# Load and add core file into the state
filepath = pathlib.Path(__file__).resolve()
filename = (
filepath.name.replace(".py", ".elf.core")
.replace(".angr", "")
.replace(".panda", "")
.replace(".pcode", "")
)
filename = (filepath.parent.parent / filename).as_posix()
with open(filename, "rb") as f:
code = smallworld.state.memory.code.Executable.from_elf_core(f, platform=platform)
machine.add(code)
code.populate_cpu(cpu)
# Load the original binary so we can copy .text
# I can't get my system to dump the executable segments.
origname = filename.replace(".core", "")
with open(origname, "rb") as f:
orig = smallworld.state.memory.code.Executable.from_elf(
f, platform=platform, address=code.address
)
# The core file reserves space before the true load address for its metadata.
code_offset = (cpu.pc.get() - code.address) & 0xFFFFFFFFFFFFF000
code[code_offset] = orig[0x0]
# Replace the instruction bytes at rip with a nop
nop = b"\x66\x90"
code.write_bytes(cpu.rip.get(), nop)
# Set up a puts handler
# puts address recovered from manual RE
puts_addr = (cpu.rip.get() & 0xFFFFFFFFFFFFF000) | 0x610
puts = smallworld.state.models.Model.lookup(
"puts", platform, smallworld.platforms.ABI.SYSTEMV, puts_addr
)
machine.add(puts)
# Add an exit point
machine.add_exit_point(cpu.rip.get() + 0x23)
# Emulate
emulator = smallworld.emulators.UnicornEmulator(platform)
machine.emulate(emulator)
Aside from loading the core dump and patching bytes,
the only other step is to hook puts;
we may have the process memory, but we don’t have a system call model.
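The two address computations in the script are plain page arithmetic on rip. Worked through with the rip value from the register dump, and taking the offsets 0x610 and 0x23 from the script as given, they come out as follows:

```python
PAGE_MASK = 0xFFFFFFFFFFFFF000  # clears the low 12 bits (4 KiB pages)

rip = 0x7FFFEEA695BF  # from the register dump

page_base = rip & PAGE_MASK
puts_addr = page_base | 0x610  # page offset of puts used by the script
exit_point = rip + 0x23        # exit point distance used by the script

print(hex(page_base))   # 0x7fffeea69000
print(hex(puts_addr))   # 0x7fffeea69610
print(hex(exit_point))  # 0x7fffeea695e2
```

Note that the puts address only works because puts sits on the same 4 KiB page as main in this particular build; a different binary would need its own offsets.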
Here is what running the harness looks like:
$ python3 elf_core.amd64.py
The current binary doesn't have a section header
[+] starting emulation at 0x7fffeea695bf
[+] emulation complete
foobar
Handling missing segments
As mentioned above, Linux won’t dump file-backed read-only segments; it assumes you already have the data.
(There is theoretically a way to enable these segments in dumps, but it may be blocked by other security policies.)
If this happens, we can replace the data by loading the executable segment out of our original binary:
origname = filename.replace(".core", "")
with open(origname, "rb") as f:
orig = smallworld.state.memory.code.Executable.from_elf(f, platform=platform, address=code.address)
# The core file reserves space before the true load address for its metadata.
#
# NOTE: You may need to adjust the masking depending on the segment's offset.
code_offset = (cpu.pc.get() - code.address) & 0xfffffffffffff000
code[code_offset] = orig[0x0]
When tested, this wasn’t the case for amd64 binaries,
but it was the case for all other architectures.
See any of the other elf_core_actuate.$ARCH.py scripts for working examples.
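The masking step in the snippet above rounds the program counter's distance from the start of the loaded image down to a page boundary. Here is that arithmetic worked through on its own. The base address is an assumption made for illustration: it is the lowest LOAD address from the readelf output, and the actual value of code.address in a harness may differ.

```python
PAGE_MASK = 0xFFFFFFFFFFFFF000  # clears the low 12 bits (4 KiB pages)


def page_offset(pc: int, base: int) -> int:
    """Offset from `base` to the start of the page containing `pc`."""
    return (pc - base) & PAGE_MASK


pc = 0x7FFFEEA695BF
base = 0x7FFFEDF19000  # ASSUMPTION: lowest LOAD address from readelf -l
print(hex(page_offset(pc, base)))  # 0xb50000
```

If the executable segment of the original binary does not start on the same page as the program counter, this mask has to be adjusted so the offset points at the segment's first page instead.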
Known Limitations
On some platforms, core dumps exercise features that not all emulators support.
amd64 core dumps include the rflags register, which angr doesn’t represent directly
i386 core dumps include the eflags register, which angr doesn’t represent directly
i386 core dumps include segment registers, which require complex setup for Panda and Unicorn
Additionally, extracting the full machine state from Ghidra or angr can take a while.
If you don’t need the final machine state,
use Machine.apply() paired with emulator.run().