Loading an ELF
In this tutorial, you will be guided through loading an ELF object file.
Executable and Linkable Format (ELF) is the file format Linux and most other Unix-descended operating systems use to store native code. It contains the code and data for one executable or library, as well as metadata detailing what the program loaders need to do to set up that code and data within a process.
Consider the example tests/elf/elf.amd64.elf.s:
.text
.globl _start
.type _start, @function
_start:
# Load argc
mov 0x8(%rsp),%rdi
# If argc != 2, leave.
cmp $2,%rdi
jne .L2
# Load argv
mov 0x10(%rsp),%rdi
# Load argv[1]
mov 0x8(%rdi),%rdi
mov $0,%rax
.L3:
# for(i = 0; argv[1][i] != '\0'; i++);
cmpb $0,(%rdi,%rax)
je .L1
add $1,%rax
jmp .L3
.L2:
# Failure; return -1
mov $-1,%rax
.L1:
# Leave, by any means necessary
ret
.size _start, .-_start
This is a very simple, not totally correct program that will perform
the equivalent of strlen on argv[1].
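To make the intended behavior concrete, here is a rough Python model of what the assembly computes. The function name start_model is made up for illustration, and the model glosses over details such as register widths; it is a sketch of the logic, not of the emulated program.

```python
def start_model(argv):
    """Model of _start: length of argv[1], or -1 if argc != 2."""
    if len(argv) != 2:
        return -1
    data = argv[1] + b"\0"  # C strings are NUL-terminated
    i = 0
    # Same loop as .L3: count bytes until the first NUL
    while data[i] != 0:
        i += 1
    return i


print(start_model([b"elf.amd64.elf", b"foobar"]))  # prints 6
print(start_model([b"elf.amd64.elf"]))             # prints -1
```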
You can build it into elf.amd64.elf by running the following:
cd smallworld/tests
make elf/elf.amd64.elf
Warning
Unlike previous tests, this requires the GNU assembler (as) to build.
This will only work correctly on an amd64 host;
on another platform, as will target the wrong architecture.
In order to harness code contained in an ELF, we need to do at least the following:
Follow the metadata in the ELF to unpack the memory image inside
Set execution to start at the correct place
Using the ELF Loader
SmallWorld includes a model of the basic features of the Linux kernel’s ELF loader.
To exercise it, you need to use Executable.from_elf(), described in
Memory Objects.
ELFs can contain code that’s intended to be loaded at a specific position, or that can be loaded at any address (position-independent). If our example is position independent, we will need to specify a load address.
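One way to tell the two cases apart without external tools is to read the e_type field of the ELF header, which sits at byte offset 16. The sketch below uses only the standard library; the helper name elf_type and the synthetic demo header are illustrative assumptions, and nothing here is part of SmallWorld's API.

```python
import struct

ET_EXEC = 2  # fixed-position executable
ET_DYN = 3   # shared object or position-independent executable


def elf_type(header: bytes) -> str:
    """Classify an ELF by the e_type field of its header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA) gives the byte order: 1 = little, 2 = big
    order = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(order + "H", header, 16)
    if e_type == ET_EXEC:
        return "fixed-position"
    if e_type == ET_DYN:
        return "position-independent"
    return "other"


# Synthetic 18-byte header prefix: magic, 64-bit, little-endian, ET_EXEC
demo = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<H", ET_EXEC)
print(elf_type(demo))  # prints fixed-position
```

On a real file, passing the first 18 bytes (for example f.read(18) on a handle opened in binary mode) is enough for this check.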
Let’s take a look at our example, using the command readelf -l elf.amd64.elf:
$ readelf -l elf.amd64.elf
Elf file type is EXEC (Executable file)
Entry point 0x1001120
There are 4 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000001000040 0x0000000001000040
0x00000000000000e0 0x00000000000000e0 R 0x8
LOAD 0x0000000000000000 0x0000000001000000 0x0000000001000000
0x0000000000000120 0x0000000000000120 R 0x1000
LOAD 0x0000000000000120 0x0000000001001120 0x0000000001001120
0x000000000000002f 0x000000000000002f R E 0x1000
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000001000000 RW 0x0
Section to Segment mapping:
Segment Sections...
00
01
02 .text
03
This lists the different “program headers”, or instructions to the OS
describing how to load the program. The LOAD headers
define blocks of memory allocated in the process.
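The same LOAD headers can be read directly in Python. The following is a minimal sketch that handles only little-endian ELF64 images, which is what this example uses; load_segments is a hypothetical helper, not a SmallWorld function, and the field offsets follow the standard ELF64 header and program header layouts.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def load_segments(elf: bytes):
    """Yield (offset, vaddr, filesz, memsz, flags) for each PT_LOAD header."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    # e_phoff is at 0x20; e_phentsize and e_phnum are at 0x36 and 0x38
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    for i in range(e_phnum):
        fields = struct.unpack_from("<IIQQQQQQ", elf, e_phoff + i * e_phentsize)
        p_type, p_flags, p_offset, p_vaddr, _paddr, p_filesz, p_memsz, _align = fields
        if p_type == PT_LOAD:
            yield p_offset, p_vaddr, p_filesz, p_memsz, p_flags
```

Calling list(load_segments(data)) on the bytes of elf.amd64.elf should report the two LOAD entries shown by readelf, assuming the file was built as described.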
Together with the EXEC file type, the fact that the segment for file offset zero is at address 0x1000000
tells us that this is a fixed-position ELF.
We do not need to provide a load address when calling Executable.from_elf(),
but we do need to avoid placing anything else around address 0x1000000, or risk clobbering our ELF.
binpath = "elf.amd64.elf"
with open(binpath, "rb") as f:
    # from_elf() needs to take the file handle as an argument.
    #
    # The "platform" argument is optional;
    # if absent, the elf loader will infer the platform from the ELF header.
    # if present, the elf loader will error if the ELF is for a different platform.
    code = smallworld.state.memory.code.Executable.from_elf(
        f, platform=platform
    )
    machine.add(code)
Finding the entrypoint
We need to find the start of our code within the ELF.
In our assembly, the code is contained in the _start function symbol.
This is a special symbol for Linux programs; it defines the program entrypoint,
which is exposed in the ELF metadata, and can be accessed via ElfExecutable.entrypoint.
entrypoint = code.entrypoint
cpu.rip.set(entrypoint)
Adding Bounds
We can use an ELF’s metadata to identify executable regions of memory, and put them “in-bounds” for emulation.
This does not happen automatically, since a harness may want to restrict execution to a narrower subset of memory than “everything executable in the ELF.”
Here, we are fine defining all code in the ELF as in-bounds:
for bound in code.bounds:
    machine.add_bound(bound[0], bound[1])
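How SmallWorld computes code.bounds internally is not shown here. As a rough illustration of the idea, one plausible derivation keeps only the LOAD segments whose flags include the executable bit. The helper executable_bounds and the (start, end) tuple shape with an exclusive end are assumptions made for this sketch; the segment values are copied from the readelf output for this example.

```python
PF_X = 0x1  # segment is executable
PF_R = 0x4  # segment is readable


def executable_bounds(segments):
    """Return (start, end) pairs, end exclusive, for executable segments."""
    return [
        (vaddr, vaddr + memsz)
        for vaddr, memsz, flags in segments
        if flags & PF_X
    ]


# LOAD headers from the readelf output: (VirtAddr, MemSiz, Flags)
segments = [
    (0x1000000, 0x120, PF_R),        # R
    (0x1001120, 0x2F, PF_R | PF_X),  # R E
]
for start, end in executable_bounds(segments):
    print(hex(start), hex(end))  # prints 0x1001120 0x100114f
```

Only the segment holding .text survives the filter, which matches the intent of putting all code in the ELF in-bounds.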
Putting it all together
Combined, this can be found in the script tests/elf/elf.amd64.py:
import logging
import sys
import smallworld
# Set up logging and hinting
smallworld.logging.setup_logging(level=logging.INFO)
# Define the platform
platform = smallworld.platforms.Platform(
    smallworld.platforms.Architecture.X86_64, smallworld.platforms.Byteorder.LITTLE
)
# Create a machine
machine = smallworld.state.Machine()
# Create a CPU
cpu = smallworld.state.cpus.CPU.for_platform(platform)
machine.add(cpu)
# Load and add code into the state
filename = (
    __file__.replace(".py", ".elf")
    .replace(".angr", "")
    .replace(".panda", "")
    .replace(".pcode", "")
)
with open(filename, "rb") as f:
    code = smallworld.state.memory.code.Executable.from_elf(f, platform=platform)
    machine.add(code)
# Set entrypoint from the ELF
if code.entrypoint is None:
    raise ValueError("ELF has no entrypoint")
cpu.rip.set(code.entrypoint)
# Create a stack and add it to the state
stack = smallworld.state.memory.stack.Stack.for_platform(platform, 0x2000, 0x4000)
machine.add(stack)
# Push a string onto the stack
string = sys.argv[1].encode("utf-8")
string += b"\0"
string += b"\0" * (16 - (len(string) % 16))
stack.push_bytes(string, None)
str_addr = stack.get_pointer()
# Push argv
stack.push_integer(0, 8, None) # NULL terminator
stack.push_integer(str_addr, 8, None) # pointer to string
stack.push_integer(0x10101010, 8, None) # Bogus pointer to argv[0]
# Push address of argv
argv = stack.get_pointer()
stack.push_integer(argv, 8, None)
# Push argc
stack.push_integer(2, 8, None)
# Push a fake return address
# This should be an exit point
exitpoint = code.entrypoint + code.get_symbol_size("_start") - 4
machine.add_exit_point(exitpoint)
stack.push_integer(exitpoint, 8, None)
# Configure the stack pointer
sp = stack.get_pointer()
cpu.rsp.set(sp)
# Emulate
emulator = smallworld.emulators.UnicornEmulator(platform)
emulator.add_exit_point(0)
# Use code bounds from the ELF
for bound in code.bounds:
    machine.add_bound(bound[0], bound[1])
machine.emulate(emulator)
Here, we load the code from our ELF and set the program counter to the entrypoint.
We also configure a stack with the expected argc/argv layout;
_start reads argc and argv directly from the stack, so no argument registers need to be set.
We halt execution before the final return (which won’t work),
and read out the result from rax.
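The stack layout the harness builds can be checked with a small stand-alone model. ToyStack below is a made-up stand-in, not SmallWorld's Stack class; it assumes the stack pointer starts at the top of the region and moves down on each push, and the return address 0xdeadbeef is a placeholder. The offsets it prints correspond to the loads in _start: argc at 0x8(%rsp), the pointer to argv at 0x10(%rsp), and argv[1] at 0x8 past that pointer.

```python
import struct


class ToyStack:
    """Minimal downward-growing stack; a stand-in, not SmallWorld's Stack."""

    def __init__(self, base, size):
        self.base = base
        self.mem = bytearray(size)
        self.sp = base + size  # assumed: pointer starts at the top

    def push_bytes(self, data):
        if self.sp - len(data) < self.base:
            raise OverflowError("stack exhausted")
        self.sp -= len(data)
        off = self.sp - self.base
        self.mem[off:off + len(data)] = data
        return self.sp

    def push_u64(self, value):
        return self.push_bytes(struct.pack("<Q", value))

    def read_u64(self, addr):
        return struct.unpack_from("<Q", self.mem, addr - self.base)[0]


stack = ToyStack(0x2000, 0x4000)
string = b"foobar" + b"\0"
string += b"\0" * (16 - (len(string) % 16))  # pad to a multiple of 16 bytes
str_addr = stack.push_bytes(string)
stack.push_u64(0)                  # argv NULL terminator
stack.push_u64(str_addr)           # argv[1]
argv = stack.push_u64(0x10101010)  # bogus argv[0]
stack.push_u64(argv)               # pointer to argv
stack.push_u64(2)                  # argc
rsp = stack.push_u64(0xDEADBEEF)   # placeholder return address

print(stack.read_u64(rsp + 0x8))                      # prints 2
print(stack.read_u64(rsp + 0x10) == argv)             # prints True
print(stack.read_u64(argv + 0x8) == str_addr)         # prints True
```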
Here is what running the harness looks like:
$ python3 elf.amd64.py foobar
[+] starting emulation at 0x1001120
[+] emulation complete
Since “foobar” is length six, we have harnessed elf.amd64.elf completely.