Linking an ELF
In this tutorial, you will be guided through loading and linking multiple ELF object files.
Note
This tutorial builds on Loading an ELF; it’s highly recommended you read it first.
On desktop or server Linux systems, the majority of ELF executables are dynamically linked. This means they reference code or data that lives in other ELF files, known as “shared objects”.
The kernel’s ELF loader alone can’t load a dynamically linked program. That requires a utility called the Runtime Link Editor (RTLD), which loads the required shared objects and resolves any cross-references between them.
SmallWorld doesn’t require that you fully link a dynamic executable, but you may want to harness code that calls a function or references data from a shared object.
Consider the example sources tests/link_elf/link_elf.elf.c (the main program, first listing) and tests/link_elf/link_elf.so.c (the shared object, second listing):
#include <stdlib.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        return -1;
    }
    return atoi(argv[1]);
}
#include <stddef.h>

long strtol(const char *arg, char **endptr, int base) {
    int sign = 1;
    long res = 0;
    // NOTE: This is not a complete implementation.
    // The real one will handle endptr and base.
    if(arg == NULL) {
        return 0;
    }
    if(*arg == '-') {
        sign = -1l;
        arg++;
    }
    while(*arg >= '0' && *arg <= '9') {
        res *= 10;
        res += (long)((*arg) - '0');
        arg++;
    }
    return res * sign;
}

int atoi(const char *arg) {
    return (int)strtol(arg, NULL, 10);
}
This is an artificial example, with link_elf.so.c providing atoi,
which is usually provided by glibc.
The glibc version of atoi is significantly more complicated,
and is actually pretty difficult to micro-execute.
(Note that we need to provide both atoi and strtol;
with later versions of glibc, a call to atoi ends up calling strtol.
This will come back to bite us later.)
You can build these into link_elf.amd64.elf and link_elf.amd64.so
using the following commands:
cd smallworld/tests
make link_elf/link_elf.amd64.elf link_elf/link_elf.amd64.so
Warning
Unlike previous tests, this requires gcc to compile and assemble.
This will only work correctly on an amd64 host;
on another platform, gcc will target the wrong architecture.
Specifying dynamic dependencies
The first part of the dynamic linking process is identifying necessary shared objects.
SmallWorld leaves this process up to the harness author, allowing them to provide specific versions of any required shared objects, or to leave out shared objects that won’t be relevant to the harness.
We can get a list of the shared objects required by an executable
or shared object using readelf -d. Let’s try this with link_elf.amd64.elf:
$ readelf -d link_elf.amd64.elf

Dynamic section at offset 0x520 contains 21 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000001e (FLAGS)              BIND_NOW
 0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000007 (RELA)               0x3e0
 0x0000000000000008 (RELASZ)             24 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x0000000000000017 (JMPREL)             0x3f8
 0x0000000000000002 (PLTRELSZ)           24 (bytes)
 0x0000000000000003 (PLTGOT)             0x2678
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000006 (SYMTAB)             0x2e8
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000005 (STRTAB)             0x3a4
 0x000000000000000a (STRSZ)              57 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x368
 0x0000000000000004 (HASH)               0x384
 0x000000006ffffff0 (VERSYM)             0x330
 0x000000006ffffffe (VERNEED)            0x338
 0x000000006fffffff (VERNEEDNUM)         1
 0x0000000000000000 (NULL)               0x0
The NEEDED tag specifies the names of the required shared objects.
In this case, we only need libc.so.6, which we will substitute
with link_elf.amd64.so for ease of harnessing.
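If a harness depends on several shared objects, it can help to write the substitution down explicitly. The sketch below is illustrative only. It parses `readelf -d` text output, like the listing above, with a regular expression. It then maps each NEEDED name to a replacement file. The helpers `needed_libraries` and `plan_loads` and the `SUBSTITUTES` table are hypothetical and not part of SmallWorld. A name mapped to `None` is deliberately left out of the harness.

```python
import re
from typing import Dict, List, Optional

# Matches readelf -d lines such as:
#  0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")

# Hypothetical table chosen by the harness author: NEEDED name -> file to load.
# A value of None means "deliberately leave this shared object out".
SUBSTITUTES = {
    "libc.so.6": "link_elf.amd64.so",
}


def needed_libraries(readelf_output: str) -> List[str]:
    """Return the NEEDED names in the order readelf lists them."""
    return NEEDED_RE.findall(readelf_output)


def plan_loads(readelf_output: str, substitutes: Dict[str, Optional[str]]) -> List[str]:
    """Map each NEEDED name to the file the harness should load instead."""
    plan = []
    for name in needed_libraries(readelf_output):
        if name not in substitutes:
            # Fail loudly rather than silently skipping a dependency.
            raise KeyError(f"no substitute chosen for {name}")
        if substitutes[name] is not None:
            plan.append(substitutes[name])
    return plan


SAMPLE = """\
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000001e (FLAGS)              BIND_NOW
"""

print(plan_loads(SAMPLE, SUBSTITUTES))  # ['link_elf.amd64.so']
```

In practice you could feed it `subprocess.run(["readelf", "-d", path], capture_output=True, text=True).stdout` instead of a hard-coded sample.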
Loading multiple ELFs
Loading multiple ELF files is as simple as calling the ELF loader multiple times
and adding multiple objects to the Machine.
Shared objects are almost always position-independent, so the harness will need to specify a load address.
# Assumes `platform` and `machine` were set up as in Loading an ELF.
filename = "link_elf.amd64.elf"
with open(filename, "rb") as f:
    code = smallworld.state.memory.code.Executable.from_elf(
        f, platform=platform, address=0x400000
    )
machine.add(code)

libname = "link_elf.amd64.so"
with open(libname, "rb") as f:
    lib = smallworld.state.memory.code.Executable.from_elf(
        f, platform=platform, address=0x800000
    )
machine.add(lib)
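The addresses 0x400000 and 0x800000 are arbitrary choices. The only hard requirement is that the loaded images don't overlap. If you load more objects, a helper like `assign_load_bases` can pick aligned, non-overlapping base addresses. This helper is hypothetical, not a SmallWorld function. It assumes you already know each image's mapped size, for example the extent of its PT_LOAD segments.

```python
def assign_load_bases(sizes, start=0x400000, align=0x400000):
    """Return a non-overlapping, `align`-aligned base address for each image.

    `sizes` lists each image's mapped size in bytes, in load order.
    """
    bases = []
    next_free = start
    for size in sizes:
        # Round the next free address up to a multiple of `align`.
        base = (next_free + align - 1) // align * align
        bases.append(base)
        next_free = base + size
    return bases


# Two images of a few pages each (hypothetical sizes).
print([hex(b) for b in assign_load_bases([0x3000, 0x3000])])  # ['0x400000', '0x800000']
```

A large alignment wastes address space but keeps each image easy to spot when reading addresses in a trace.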
Linking ELFs
The final step is to resolve the cross-references between our binaries, a process called “linking”.
The ELF linker is an extremely simple algorithm with a lot of extremely complicated details.
Each dynamically-linkable ELF defines a set of symbols, essentially named variables. Some of these are left undefined, representing references to code or data in another ELF file.
Each ELF also defines a list (two lists, actually) of relocations. These are internal cross-references describing how assigning a value to an undefined symbol should change data in the loaded image:
How many bits should the relocated value overwrite?
Should the symbol value itself be written, or do we want to write the result of a computation?
Will this value need space allocated in a dynamic data structure?
On Linux, the rtld runs before your application code, finds matching defined and undefined symbols, and performs the necessary fixups defined in the relocations.
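The fixups themselves are mostly small arithmetic. The sketch below shows how a few common x86-64 relocation types answer the first two questions above: how many bits to overwrite, and what value to compute. It uses the notation of the System V x86-64 psABI: S is the symbol value, A the addend, B the load base, and P the address being patched. It patches a `bytearray` standing in for a loaded image. It is illustrative only and not how SmallWorld implements relocation. Types that need extra dynamic data structures, such as TLS or copy relocations, are left out.

```python
import struct

MASK64 = (1 << 64) - 1

# A few relocation type numbers from the System V x86-64 psABI.
R_X86_64_64 = 1         # write S + A, 64 bits
R_X86_64_PC32 = 2       # write S + A - P, 32 bits, signed
R_X86_64_GLOB_DAT = 6   # write S, 64 bits (GOT entry)
R_X86_64_JUMP_SLOT = 7  # write S, 64 bits (GOT entry used by the PLT)
R_X86_64_RELATIVE = 8   # write B + A, 64 bits (no symbol needed)


def apply_relocation(image, base, offset, rtype, sym_value=0, addend=0):
    """Patch one relocation in `image`, a bytearray mapped at address `base`."""
    place = base + offset  # P: run-time address of the patched field
    if rtype == R_X86_64_64:
        struct.pack_into("<Q", image, offset, (sym_value + addend) & MASK64)
    elif rtype in (R_X86_64_GLOB_DAT, R_X86_64_JUMP_SLOT):
        struct.pack_into("<Q", image, offset, sym_value & MASK64)
    elif rtype == R_X86_64_RELATIVE:
        struct.pack_into("<Q", image, offset, (base + addend) & MASK64)
    elif rtype == R_X86_64_PC32:
        value = sym_value + addend - place
        if not -(1 << 31) <= value < (1 << 31):
            raise OverflowError("PC32 target is out of 32-bit range")
        struct.pack_into("<i", image, offset, value)
    else:
        raise NotImplementedError(f"relocation type {rtype} not modeled")


image = bytearray(16)
apply_relocation(image, 0x800000, 0, R_X86_64_JUMP_SLOT, sym_value=0x801234)
apply_relocation(image, 0x800000, 8, R_X86_64_RELATIVE, addend=0x40)
print([hex(v) for v in struct.unpack("<QQ", image)])  # ['0x801234', '0x800040']
```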
SmallWorld offers a few interfaces for simulating this process.
A harness can link a specific symbol by updating its value; the ELF model will automatically perform all the necessary relocations behind the scenes:
atoi = lib.get_symbol_value("atoi")
code.update_symbol_value("atoi", atoi)
This method is also useful if a harness needs to provide its own address, say for a function hook (for more information, see Modeling and Hooking):
atoi = smallworld.state.models.Model.lookup(
"atoi", platform, smallworld.platforms.ABI.SYSTEMV, 0x10000
)
machine.add(atoi)
code.update_symbol_value("atoi", atoi.address)
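In both snippets, updating a symbol's value conceptually means re-applying every relocation that names that symbol. The toy model below shows that bookkeeping. Relocations are indexed by symbol name, so one call patches every site that references it. `ToyElf` is a hypothetical class, not SmallWorld's implementation, and it only handles 64-bit S + A relocations.

```python
import struct
from collections import defaultdict

MASK64 = (1 << 64) - 1


class ToyElf:
    """Hypothetical stand-in for an ELF model; only 64-bit S + A relocations."""

    def __init__(self, size):
        self.image = bytearray(size)
        self.symbols = {}                # name -> value, None while undefined
        self.relocs = defaultdict(list)  # name -> [(offset, addend), ...]

    def add_relocation(self, name, offset, addend=0):
        self.symbols.setdefault(name, None)
        self.relocs[name].append((offset, addend))

    def update_symbol_value(self, name, value):
        self.symbols[name] = value
        # Re-apply every relocation that references this symbol.
        for offset, addend in self.relocs[name]:
            struct.pack_into("<Q", self.image, offset, (value + addend) & MASK64)

    def read_u64(self, offset):
        return struct.unpack_from("<Q", self.image, offset)[0]


toy = ToyElf(32)
toy.add_relocation("atoi", offset=0)   # e.g. a GOT slot
toy.add_relocation("atoi", offset=16)  # another reference to the same symbol
toy.update_symbol_value("atoi", 0x10000)
print(hex(toy.read_u64(0)), hex(toy.read_u64(16)))  # 0x10000 0x10000
```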
Beware that the link-level interface
of some binaries may not match their public API.
Case in point: in the version of glibc this example was written against,
atoi was a true function. In a recent update,
it’s become a macro that actually calls strtol,
so a freshly compiled executable imports strtol rather than atoi.
A harness that only links atoi by name would not be resilient to this change.
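One way to make hooking more robust is to check which candidate names the executable actually imports, and link a hook for each one that is present. To stay self-contained, the sketch uses a hypothetical `FakeElf` with an `undefined_symbols` set. SmallWorld's real objects may expose undefined symbols differently, so treat that attribute as an assumption. The hook addresses are also made up. A strtol hook would need a model with strtol's signature, not atoi's.

```python
class FakeElf:
    """Hypothetical stand-in for a loaded ELF, for illustration only."""

    def __init__(self, undefined):
        self.undefined_symbols = set(undefined)
        self.values = {}

    def update_symbol_value(self, name, value):
        self.values[name] = value


def link_hooks(elf, hooks):
    """Link each hook whose name the ELF imports; return the names linked."""
    linked = []
    for name, address in hooks.items():
        if name in elf.undefined_symbols:
            elf.update_symbol_value(name, address)
            linked.append(name)
    if not linked:
        # Fail loudly instead of silently running without any hook.
        raise LookupError("none of the hooked symbols are imported")
    return linked


# Hypothetical addresses for an atoi model and a strtol model.
hooks = {"atoi": 0x10000, "strtol": 0x10010}

old_glibc = FakeElf({"atoi"})
new_glibc = FakeElf({"strtol"})
print(link_hooks(old_glibc, hooks))  # ['atoi']
print(link_hooks(new_glibc, hooks))  # ['strtol']
```

Raising when nothing matches is a deliberate choice. An unhooked call into a missing function tends to fail much later and much less clearly.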
SmallWorld also offers a method ElfExecutable.link_elf()
to perform this operation for every undefined symbol
in a destination ELF which has a corresponding defined symbol
in a source ELF:
# Link code and lib against themselves.
#
# By default, the Linux compiler will always call a dynamic function
# indirectly using the platform's dynamic call mechanism,
# even when calling a function in the same ELF.
code.link_elf(code, all_syms=True)
lib.link_elf(lib, all_syms=True)
# Link code against lib to resolve actual dependencies.
code.link_elf(lib)
Note that this requires a bit of care from the harness when loading multiple ELFs. If a symbol is defined in more than one shared object, it will take the first version it finds.
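The following sketch shows first-definition-wins resolution over plain dictionaries. It assumes link_elf() behaves as described above; it is not SmallWorld's code. Sources are searched in the order given, and a symbol that has been resolved is never overwritten by a later definition. The library names and addresses are made up.

```python
def resolve(undefined, sources):
    """Resolve undefined symbol names against (name, definitions) sources.

    The first source defining a symbol wins; later definitions are ignored.
    Returns (resolved mapping, set of names left unresolved).
    """
    resolved = {}
    for _lib_name, definitions in sources:
        for sym in undefined:
            if sym not in resolved and sym in definitions:
                resolved[sym] = definitions[sym]
    return resolved, set(undefined) - resolved.keys()


sources = [
    ("libfirst.so", {"atoi": 0x800100}),
    ("libsecond.so", {"atoi": 0x900100, "strtol": 0x900200}),
]
resolved, missing = resolve({"atoi", "strtol", "printf"}, sources)
print({k: hex(v) for k, v in sorted(resolved.items())})  # {'atoi': '0x800100', 'strtol': '0x900200'}
print(missing)  # {'printf'}
```

Under this model, the order in which you link against your shared objects decides which duplicate wins.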
Finding main()
A harness can take advantage of the symbol information in an ELF to start execution at places other than the file’s entrypoint.
Even if you want to run the program from the beginning, the entrypoint of a Linux application ELF points not to application code, but to the C runtime initializer. This is a) insanely complicated to harness, and b) completely uninteresting for most analyses.
In this example, we want to start executing from main().
We can look up the address of main() like so:
entrypoint = code.get_symbol_value("main")
cpu.rip.set(entrypoint)
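In a position-independent ELF (e_type ET_DYN), a symbol's st_value is relative to where the image's virtual address 0 is placed. The run-time address is therefore that load base plus st_value. In a fixed-address executable (ET_EXEC), st_value is already absolute. The minimal sketch below uses a hypothetical helper, `runtime_address`. It assumes the address passed to the loader is that base. This is an assumption about how SmallWorld interprets `address`, though it holds naturally when the first PT_LOAD segment starts at virtual address 0, as is typical for PIE files.

```python
ET_EXEC = 2  # e_type of a fixed-address executable
ET_DYN = 3   # e_type of a PIE executable or shared object


def runtime_address(st_value, e_type, load_base):
    """Translate a symbol's st_value into a run-time address."""
    if e_type == ET_DYN:
        # Position-independent: st_value is relative to the load base.
        return load_base + st_value
    if e_type == ET_EXEC:
        # Fixed-address: st_value is already absolute.
        return st_value
    raise ValueError(f"unsupported e_type {e_type}")


# A PIE symbol with st_value 0x14c0, loaded at 0x400000:
print(hex(runtime_address(0x14c0, ET_DYN, 0x400000)))  # 0x4014c0
```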
Caution
Many ELF files are “stripped”; they have all metadata not absolutely essential for the kernel program loader and rtld removed in order to save space, or to bamboozle analysts.
The symbols defining internal functions like main()
fall into the category of “non-essential”,
so this technique won’t always work.
Putting it all together
Using what we’ve learned about the ELF loader and linker model,
we can build link_elf.amd64.py:
import logging
import sys
import smallworld
# Set up logging and hinting
smallworld.logging.setup_logging(level=logging.INFO)
# Define the platform
platform = smallworld.platforms.Platform(
smallworld.platforms.Architecture.X86_64, smallworld.platforms.Byteorder.LITTLE
)
# Create a machine
machine = smallworld.state.Machine()
# Create a CPU
cpu = smallworld.state.cpus.CPU.for_platform(platform)
machine.add(cpu)
# Load and add code into the state
filename = (
    __file__.replace(".py", ".elf")
    .replace(".angr", "")
    .replace(".panda", "")
    .replace(".pcode", "")
)
with open(filename, "rb") as f:
    code = smallworld.state.memory.code.Executable.from_elf(
        f, platform=platform, address=0x400000
    )
machine.add(code)
# Load and add code from lib
libname = (
    __file__.replace(".py", ".so")
    .replace(".angr", "")
    .replace(".panda", "")
    .replace(".pcode", "")
)
with open(libname, "rb") as f:
    lib = smallworld.state.memory.code.Executable.from_elf(
        f, platform=platform, address=0x800000
    )
machine.add(lib)
# Link code and lib against themselves, then link code against lib
lib.link_elf(lib)
code.link_elf(code)
lib.link_elf(lib, all_syms=True)
code.link_elf(code, all_syms=True)
code.link_elf(lib)
# Set entrypoint from the ELF
entrypoint = code.get_symbol_value("main")
cpu.rip.set(entrypoint)
# Create a stack and add it to the state
stack = smallworld.state.memory.stack.Stack.for_platform(platform, 0x2000, 0x4000)
machine.add(stack)
# Push a string onto the stack
string = sys.argv[1].encode("utf-8")
string += b"\0"
string += b"\0" * (16 - (len(string) % 16))
stack.push_bytes(string, None)
str_addr = stack.get_pointer()
# Push argv
stack.push_integer(0, 8, None) # NULL terminator
stack.push_integer(str_addr, 8, None) # pointer to string
stack.push_integer(0x10101010, 8, None) # Bogus pointer to argv[0]
# Push address of argv
argv = stack.get_pointer()
stack.push_integer(argv, 8, None)
# Push argc
stack.push_integer(2, 8, None)
# Push fake return
# Make it an exit point
exitpoint = entrypoint + code.get_symbol_size("main")
stack.push_integer(exitpoint, 8, None)
machine.add_exit_point(exitpoint)
# Configure the stack pointer
sp = stack.get_pointer()
cpu.rsp.set(sp)
# Set argument registers
cpu.rdi.set(2)
cpu.rsi.set(argv)
# Emulate
emulator = smallworld.emulators.UnicornEmulator(platform)
emulator.add_exit_point(0)
# Use code bounds from both ELFs
for bound in code.bounds:
    machine.add_bound(bound[0], bound[1])
for bound in lib.bounds:
    machine.add_bound(bound[0], bound[1])
final_machine = machine.emulate(emulator)
final_cpu = final_machine.get_cpu()
print(final_cpu.rax)
This includes code for linking an ELF,
as well as setting up the stack and registers
to provide argc and argv to main().
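Because the stack grows downward, the harness pushes the argv entries in reverse order: NULL, then the string pointer, then the bogus argv[0]. This leaves argv[0] at the lowest address, which is exactly where argv points. The sketch below replays that sequence and reads the resulting array back. It uses a hypothetical `ToyStack` backed by a `bytearray`, which is a simplification and not SmallWorld's Stack class. The 0x6000 top address is made up. The sketch also repeats the harness's padding arithmetic, which keeps the pushed string a multiple of 16 bytes.

```python
import struct


class ToyStack:
    """Hypothetical downward-growing stack backed by a bytearray."""

    def __init__(self, top, size):
        self.base = top - size
        self.mem = bytearray(size)
        self.sp = top

    def push_bytes(self, data):
        self.sp -= len(data)
        start = self.sp - self.base
        self.mem[start:start + len(data)] = data

    def push_u64(self, value):
        self.push_bytes(struct.pack("<Q", value))

    def read_u64(self, address):
        return struct.unpack_from("<Q", self.mem, address - self.base)[0]


stack = ToyStack(top=0x6000, size=0x4000)

string = b"42\0"
string += b"\0" * (16 - (len(string) % 16))  # same padding as the harness
stack.push_bytes(string)
str_addr = stack.sp

stack.push_u64(0)           # argv[2] = NULL
stack.push_u64(str_addr)    # argv[1] -> "42"
stack.push_u64(0x10101010)  # argv[0], a bogus pointer
argv = stack.sp

print([hex(stack.read_u64(argv + 8 * i)) for i in range(3)])  # ['0x10101010', '0x5ff0', '0x0']
```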
Here is what running this harness looks like:
$ python3 link_elf.amd64.py 42
[+] starting emulation at 0x4014c0
[+] emulation complete
Reg(rax,8)=0x2a
Since 0x2a is 42 in hexadecimal,
we have successfully harnessed link_elf.amd64.elf
and link_elf.amd64.so.