Harnessing a Simple Program

In this tutorial you will be guided through the steps to harness a very simple snippet of binary code. Here it is, weighing in at only two lines of x86 assembly.

BITS 64;
; This is a function that just squares a number. 
; It takes a 32-bit argument in edi and returns the 32-bit product in eax.
; If we can't run this, we can't run anything.
        imul    edi, edi
        mov     eax, edi
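
For reference, the effect of these two instructions can be modeled in a few lines of Python. This is only a sketch of the arithmetic, a 32-bit multiply that wraps on overflow; the function name square32 is made up for illustration and is not part of SmallWorld.

```python
def square32(x: int) -> int:
    """Model `imul edi, edi` followed by `mov eax, edi`."""
    # The two-operand imul keeps only the low 32 bits of the product.
    edi = (x * x) & 0xFFFFFFFF
    # mov copies the result into the return register.
    eax = edi
    return eax


print(hex(square32(7)))        # 0x31
print(hex(square32(0x10000)))  # 0x0, because the product overflows 32 bits
```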

The source for this example can be found in the tests directory of the repository, in the file square.amd64.s. There are a number of other small assembly examples there along with it. There is also a Makefile in that directory and you will have to run make there in order to generate the binary code used in this and other tutorials involving those tests. Once you have run make, the corresponding binary, which we will harness in this tutorial, will be in the file square.amd64.bin.

A reasonable first step in harnessing is to run SmallWorld’s basic harness script, which assumes nothing about the code and runs it a number of times to see what it can deduce. That script, basic_harness.py, lives in the examples directory. It is fairly simple, so let’s have a look at it.

import logging
import sys
import typing

import smallworld
from smallworld import hinting
from smallworld.analyses import Colorizer, ColorizerReadWrite, ColorizerSummary
from smallworld.analyses.colorizer import randomize_uninitialized

# setup logging and hinting
smallworld.logging.setup_logging(level=logging.INFO)
logger = logging.getLogger(__name__)

# configure the platform for emulation
platform = smallworld.platforms.Platform(
    smallworld.platforms.Architecture.X86_64, smallworld.platforms.Byteorder.LITTLE
)

# create a machine
machine = smallworld.state.Machine()

# create a CPU and add it to the machine
cpu = smallworld.state.cpus.CPU.for_platform(platform)

# create an executable and add it to the machine
code = smallworld.state.memory.code.Executable.from_filepath(
    sys.argv[1], address=0x1000
)

# set start instruction for analysis and an "exit point" at which to
# stop
cpu.pc.set(code.address)
machine.add_exit_point(code.address + code.get_capacity())

# add code and cpu to the machine
machine.add(code)
machine.add(cpu)

hinter = hinting.Hinter()

analyses: typing.List[smallworld.analyses.Analysis] = [
    ColorizerSummary(hinter),
    ColorizerReadWrite(hinter),
]

seed = 123456
cs = ColorizerSummary(hinter)
for i in range(10):
    c = Colorizer(hinter, num_insns=10, exec_id=i)
    perturbed_machine = randomize_uninitialized(machine, seed + i, [])
    c.run(perturbed_machine)
# Technically, an analysis takes a `machine` arg but this one doesn't
# actually use it for anything.  This is because it just listens for
# colorizer hints.
cs.run(None)

After the imports, the script sets up logging. Logging is probably self-explanatory (change level to logging.DEBUG to get lots of low-level output).

Next, the script specifies an appropriate platform. This is a SmallWorld concept which, essentially, establishes various aspects of the CPU that will be used to execute instructions. In this script, the platform is used to create the virtual CPU that SmallWorld will use with this code: cpu = smallworld.state.cpus.CPU.for_platform(platform). Note that platform is also used by other parts of SmallWorld, such as its analyses, to determine endianness, register widths and names, etc.

The cpu state is yet another SmallWorld concept (it is explained in Machine State). You can think of it as a place to set up registers and memory (which comes with convenient stack and heap abstractions) with specific initial values in a way that is agnostic to details about any particular dynamic analysis employing a particular emulator or engine. Thus, the same state could be applied to a Unicorn emulator or an angr engine or a PANDA engine or …

The script next loads the code from the file square.amd64.bin and sets its base address to 0x1000, with the lines

code = smallworld.state.memory.code.Executable.from_filepath(
    sys.argv[1], address=0x1000
)

Then, the script sets the instruction address at which to start dynamic analysis with cpu.pc.set(code.address) and also defines an exit point for an emulator to know when to stop with machine.add_exit_point(code.address + code.get_capacity()).
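
The exit point arithmetic can be checked by hand. The sketch below assumes that the assembled snippet consists of the five bytes shown (one common encoding of the two instructions; the actual contents of square.amd64.bin are not reproduced in this tutorial) and that get_capacity() reports the size of the loaded code in bytes, which is also an assumption.

```python
# Assumed machine code for the two instructions (not taken from the tutorial):
#   0f af ff   imul edi, edi
#   89 f8      mov  eax, edi
code_bytes = bytes.fromhex("0fafff89f8")

base_address = 0x1000                        # where the script loads the code
exit_point = base_address + len(code_bytes)  # first address past the code

print(hex(exit_point))  # 0x1005
```

Under these assumptions, emulation would stop as soon as the program counter reaches the first address after the last instruction.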

Once created, code and cpu are added to the SmallWorld machine. This is another SmallWorld construct; the machine represents all initial analysis state including register contents specific to the included cpu and any memory contents.

Next, hinting is set up. Hinting is also a SmallWorld concept. Hints and analyses are described in detail in Analyses but the basic idea is that SmallWorld includes various analyses that are intended to provide hints that can guide the creation of a code harness. Hints can be read by a human or consumed by other, higher-level analyses. Here, we create a hinter and arrange for the method collect_hints to be called when either a DynamicRegisterValueHint or a DynamicRegisterValueSummaryHint is generated by some analysis. That method just prints out the hint.

The basic_harness.py script employs an analysis called the colorizer which is a kind of poor person’s dynamic taint analysis. Registers are initialized with large random values and data flows are inferred when those values are observed being used by subsequent instructions. This can tell us lots of things, but, here, it will tell us simply what register(s) are inputs for this code.
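
The idea can be illustrated with a toy model in plain Python. This is not SmallWorld's implementation: the program representation, the register list, and the function find_inputs are all invented for this sketch. Each register starts with a random 32-bit "color", and any read that observes one of those initial values marks the originating register as an input.

```python
import random

# Toy program: (operation, registers read, register written).
PROGRAM = [
    ("imul", ("edi", "edi"), "edi"),  # edi = edi * edi
    ("mov", ("edi",), "eax"),         # eax = edi
]
REGISTERS = ("eax", "edi", "esi")


def find_inputs(seed: int) -> list:
    """Run one toy micro execution; return registers whose initial color was read."""
    rng = random.Random(seed)
    # Give every register a large random value: its "color".
    regs = {name: rng.getrandbits(32) for name in REGISTERS}
    color_origin = {value: name for name, value in regs.items()}

    inputs = []
    for op, reads, write in PROGRAM:
        for reg in reads:
            # A read that observes an initial color reveals an input register.
            origin = color_origin.get(regs[reg])
            if origin is not None and origin not in inputs:
                inputs.append(origin)
        if op == "imul":
            regs[write] = (regs[reads[0]] * regs[reads[1]]) & 0xFFFFFFFF
        else:  # mov
            regs[write] = regs[reads[0]]
    return inputs


print(find_inputs(123456))
```

Because the inference is based on matching values rather than tracking data flow directly, it relies on the random values being large enough that accidental matches are very unlikely.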

The colorizer runs code with random initial state. This means the code can run differently (follow a different path) each time. Each run is a micro execution (inspired by the paper “Micro execution” [1]). This script performs ten micro executions, randomizing uninitialized registers before each with the code

seed = 123456
cs = ColorizerSummary(hinter)
for i in range(10):
    c = Colorizer(hinter, num_insns=10, exec_id=i)
    perturbed_machine = randomize_uninitialized(machine, seed + i, [])
    c.run(perturbed_machine)
cs.run(None)

The ColorizerSummary analysis summarizes hints across micro executions. It “runs” after all the colorizer micro executions and catches all hints output by other analyses connected to the same hinter.
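
As a rough illustration of what summarizing across micro executions means, the sketch below counts how often each register shows up as an input. The per-run data is hypothetical and the counting is not how ColorizerSummary is implemented; it only shows why aggregating helps separate consistent inputs from ones that appear on a single path.

```python
from collections import Counter

# Hypothetical per-run results: for each micro execution, the registers
# whose initial random value was seen being read.
runs = [["edi"], ["edi", "esi"], ["edi"]]

counts = Counter(reg for run in runs for reg in run)
for reg, count in counts.items():
    print(f"{reg}: read in {count} of {len(runs)} micro executions")
```

In this made-up data, edi is read in every run while esi appears in only one, which would suggest that esi matters only on some paths.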

Note that all of the setup performed is entirely generic; this script assumes nothing about the binary to be harnessed. To run the script we just provide a single argument, the binary, to get the command line python3 basic_harness.py square.amd64.bin.

Here is what that outputs:

$ python3 examples/basic_harness.py tests/square/square.amd64.bin
[+] seed=123456 digest of changes made to machine: 9fb43f6a9e58f4de95300b1d9bd9192a
[+] captured trace of 2 instructions, res=TraceRes.ER_BOUNDS trace_digest=8d4c60d453cccdb02972044161367778
[+] seed=123457 digest of changes made to machine: 75958496f96ec41a114200a663eb7ea5
[+] captured trace of 2 instructions, res=TraceRes.ER_BOUNDS trace_digest=8d4c60d453cccdb02972044161367778
[+] seed=123458 digest of changes made to machine: 9114b586cab688e3e2eaac7764d5ff3d
[+] captured trace of 2 instructions, res=TraceRes.ER_BOUNDS trace_digest=8d4c60d453cccdb02972044161367778
[+] seed=123459 digest of changes made to machine: d8f683125ac66437da87e2a4147d92ef
[+] captured trace of 2 instructions, res=TraceRes.ER_BOUNDS trace_digest=8d4c60d453cccdb02972044161367778
[+] seed=123460 digest of changes made to machine: d3ec6c1e68474fcf8988c62594e636a8
[+] captured trace of 2 instructions, res=TraceRes.ER_BOUNDS trace_digest=8d4c60d453cccdb02972044161367778
[+] seed=123461 digest of changes made to machine: 9c8309440489f444ed253841cf9505d1
[+] captured trace of 2 instructions, res=TraceRes.ER_BOUNDS trace_digest=8d4c60d453cccdb02972044161367778
[+] seed=123462 digest of changes made to machine: 8626e15c4f93a52b02b6fc07588ca5a9
[+] captured trace of 2 instructions, res=TraceRes.ER_BOUNDS trace_digest=8d4c60d453cccdb02972044161367778
[+] seed=123463 digest of changes made to machine: 7378c46011e4c55bae506b6a74f819b5
[+] captured trace of 2 instructions, res=TraceRes.ER_BOUNDS trace_digest=8d4c60d453cccdb02972044161367778
[+] seed=123464 digest of changes made to machine: 5accb97bcc9569d3998eedf946120964
[+] captured trace of 2 instructions, res=TraceRes.ER_BOUNDS trace_digest=8d4c60d453cccdb02972044161367778
[+] seed=123465 digest of changes made to machine: af522c74f58a93272e6d4dcd4de77271
[+] captured trace of 2 instructions, res=TraceRes.ER_BOUNDS trace_digest=8d4c60d453cccdb02972044161367778

The output contains a lot of hints, but we only need to look at the summary ones, which are hints that were emitted by multiple micro executions. These are the final three lines of the output. A read-def-summary hint corresponds to a “color” (a random value in a register) that is read by an instruction without having been seen before. This means the register holding it is an input to this block of code. There is only one such hint output by basic_harness.py:

[+] DynamicRegisterValueSummaryHint(message='read-def-summary', pc=4096, color=1, size=4, use=True, new=True, count=10, num_micro_executions=10, reg_name='edi')
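
Some fields of this hint can be cross-checked against the harness script. The snippet below only does arithmetic on the printed values; reading size as a byte count and count as the number of micro executions in which the read was observed is an assumption based on the field names.

```python
# Selected fields copied from the hint printed above.
hint = {
    "pc": 4096,
    "size": 4,
    "count": 10,
    "num_micro_executions": 10,
    "reg_name": "edi",
}

print(hex(hint["pc"]))   # 0x1000, the address at which the code was loaded
print(hint["size"] * 8)  # 32, the width of edi in bits if size is in bytes
print(hint["count"] == hint["num_micro_executions"])  # True
```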

The output tells us that the register edi is an input to this snippet of code and should really be set explicitly. We can now create a new script square.amd64.py which harnesses square.amd64.bin perfectly, exposing edi as a command-line argument.

import logging
import sys

import smallworld

# Set up logging and hinting
smallworld.logging.setup_logging(level=logging.INFO)

# Define the platform
platform = smallworld.platforms.Platform(
    smallworld.platforms.Architecture.X86_64, smallworld.platforms.Byteorder.LITTLE
)

# Create a machine
machine = smallworld.state.Machine()

# Create a CPU
cpu = smallworld.state.cpus.CPU.for_platform(platform)
machine.add(cpu)

# Load and add code into the state
code = smallworld.state.memory.code.Executable.from_filepath(
    __file__.replace(".py", ".bin")
    .replace(".angr", "")
    .replace(".panda", "")
    .replace(".pcode", ""),
    address=0x1000,
)
machine.add(code)

# Set the instruction pointer to the code entrypoint
cpu.rip.set(code.address)

# Initialize argument registers
cpu.rdi.set(int(sys.argv[1]))

# Emulate
emulator = smallworld.emulators.UnicornEmulator(platform)
emulator.add_exit_point(cpu.rip.get() + code.get_capacity())
final_machine = machine.emulate(emulator)

# read out the final state
cpu = final_machine.get_cpu()
print(hex(cpu.eax.get()))

And here is what it looks like to run that script, setting edi to 42 initially.

$ python3 ../../tests/square/square.amd64.py 42
[+] starting emulation at 0x1000
[+] emulation complete
0x6e4

Since 42*42=1764, which is 0x6e4, we have harnessed square.amd64.bin completely.
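
The harness parses its argument with int(sys.argv[1]), which accepts decimal input only. One possible variation, sketched here with a hypothetical helper named parse_u32, is to let Python infer the base from the prefix and to mask the value to 32 bits so that it matches the width of edi.

```python
def parse_u32(text: str) -> int:
    """Parse a decimal, hex (0x...), octal (0o...), or binary (0b...) literal."""
    value = int(text, 0)       # base 0 lets Python infer the base from the prefix
    return value & 0xFFFFFFFF  # keep the low 32 bits, matching the width of edi


print(parse_u32("42"))    # 42
print(parse_u32("0x2a"))  # 42
print(parse_u32("-1"))    # 4294967295
```

With such a helper, the register could be set from parse_u32(sys.argv[1]) instead of int(sys.argv[1]).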

Footnotes
