Common Emulator Interface¶
Note
This interface very closely mirrors the Machine State interface, to the point it may not be obvious why SmallWorld has a separate machine state representation at all.
There are two benefits of having this separation:
Analyses may perform many emulation runs with slight modifications between each. It’s much easier to store a single Machine object than to have to capture and re-run code to populate an Emulator for every run.
There are a number of little gotchas and backend-specific pieces of boilerplate necessary when populating or accessing an Emulator directly. The state interface captures many of them.
Available Emulators¶
All emulators are subclasses of Emulator.
These all take a single Platform object as an argument to their constructor.
While this interface works hard to unify behavior across emulators, these are all very different tools with different sets of supported features and “gotchas”. Please see the docs for the specific backends for details.
Note
Not all backends support all platforms, or all features of all platforms.
See Supported Platforms for exact details of which backends support which platforms.
Execution Control¶
Emulators support three modes of execution:
Emulator.run() emulates code continuously until one of the following happens:
Execution encounters a code boundary (exit point or bounds).
An event handler raises
EmulationStop, or one of its subclasses.
Any other exceptions will get passed up to the caller.
Emulator.step() emulates a single instruction at the current program counter.
All exceptions are raised to the caller.
This includes exceptions indicating a clean exit,
namely EmulationStop and its subclasses.
Emulator.step_block() emulates a single basic block at the current program counter.
All exceptions are raised to the caller.
This includes exceptions indicating a clean exit,
namely EmulationStop and its subclasses.
Note
The different backends have slightly different
basic block identification algorithms;
Emulator.step_block() is not guaranteed to
advance through the same code in the same way for all backends.
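The difference between run() and step() in how a clean exit is reported can be illustrated with a toy model. This is only a sketch: ToyEmulator and the local EmulationStop class are hypothetical stand-ins defined here so the example runs on its own, and they are not SmallWorld's implementation.

```python
class EmulationStop(Exception):
    """Local stand-in for SmallWorld's EmulationStop (assumption)."""


class ToyEmulator:
    """Hypothetical emulator that 'executes' by advancing a counter."""

    def __init__(self, exit_point):
        self.pc = 0
        self.exit_point = exit_point

    def step(self):
        # Mirrors the documented step(): a clean exit surfaces as an exception.
        if self.pc == self.exit_point:
            raise EmulationStop(f"exit point at {self.pc:#x}")
        self.pc += 4

    def run(self):
        # Mirrors the documented run(): a clean exit ends the loop quietly,
        # while any other exception would propagate to the caller.
        try:
            while True:
                self.step()
        except EmulationStop:
            pass


def step_until_stop(emu, max_steps):
    """Single-step, treating EmulationStop as a normal end of execution."""
    for count in range(max_steps):
        try:
            emu.step()
        except EmulationStop:
            return count
    return max_steps
```

A harness that single-steps therefore needs its own try/except around each step, whereas a harness that calls run() only needs to handle genuine errors.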
Exit Points and Bounds¶
Emulators provide two mechanisms for bounding execution: exit points and bounds.
An exit point is an address which, if executed, will cause execution to stop. Note that execution stops before the instruction at the exit point is executed. There is currently no concept of an “exit-after point” in SmallWorld.
Exit points can be on any address, including one outside valid memory. This is useful for exiting on return from the top-level function, or detecting a call to a non-existent library.
Caution
The Panda backend does not support exit points on unmapped memory. This will likely be fixed with the next major update.
Harnesses or analyses may manually add exit points to the emulator
using Emulator.add_exit_point(). They may inspect the currently-registered
exit points using Emulator.get_exit_points().
Bounds are ranges of addresses that the emulator is allowed to execute. If the current program counter is not in any of the ranges specified, emulation will stop.
If no bounds are specified, all mapped memory is considered “in bounds”. Note that, for emulators that treat unmapped memory accesses as errors, executing an unmapped instruction will be reported as an unmapped memory access, not as out-of-bounds execution.
Harnesses or analyses may manually add a memory range to the execution bounds
using Emulator.add_bound(). They may inspect the current bounds using
Emulator.get_bounds(), or remove a region from the execution bounds
using Emulator.remove_bound() (the range does not need to match a specific added bound).
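The stopping rules described above can be summarized as a small decision function. This is a sketch of the documented behavior, not SmallWorld code: stop_reason is a hypothetical helper, bounds are modeled as Python range objects, and checking exit points before bounds is an assumption. The sketch also ignores the memory map, so "no bounds" simply means "never out of bounds" here.

```python
from typing import Iterable, Optional, Set


def stop_reason(pc: int, exit_points: Set[int], bounds: Iterable[range]) -> Optional[str]:
    """Return why execution would stop before executing `pc`, or None."""
    # An exit point stops execution before the instruction at that address runs.
    if pc in exit_points:
        return "exit point"
    # With at least one bound registered, the program counter must fall
    # inside one of the ranges; with none, nothing is out of bounds.
    bounds = list(bounds)
    if bounds and not any(pc in r for r in bounds):
        return "out of bounds"
    return None
```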
Accessing Registers¶
Harnesses and analyses may access an emulator’s register state using
Emulator.read_register() and Emulator.write_register().
Register state is represented as an integer in host byte order,
and is converted to and from guest byte order automatically.
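To make the byte-order point concrete, the following sketch shows the kind of conversion that happens on the harness's behalf. The two helpers are hypothetical and exist only for illustration; a harness using the register interface never needs to call anything like them.

```python
def guest_bytes_to_int(raw: bytes, guest_byteorder: str) -> int:
    """Interpret a register's raw guest bytes as a plain Python integer."""
    return int.from_bytes(raw, guest_byteorder)


def int_to_guest_bytes(value: int, width: int, guest_byteorder: str) -> bytes:
    """Lay an integer out as `width` bytes in the guest's byte order."""
    return value.to_bytes(width, guest_byteorder)
```

The same integer, 0x12345678, corresponds to different byte sequences on little-endian and big-endian guests, but the harness only ever sees the integer.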
Note
Registers with more complicated data types (floating-point or structured data) are not represented specially. It’s up to the harness or analysis to convert the integral value into a more appropriate format.
There are a very small number of exceptions; see the docs for specific backends for details.
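As an example of such a conversion, a 64-bit floating-point register value can be reinterpreted with the standard struct module. This is a minimal sketch assuming the register holds an IEEE 754 double-precision bit pattern; the helper names are hypothetical.

```python
import struct


def register_bits_to_double(value: int) -> float:
    """Reinterpret a 64-bit integer bit pattern as an IEEE 754 double."""
    # Pack and unpack with the same byte order, so only the type changes.
    return struct.unpack("<d", struct.pack("<Q", value))[0]


def double_to_register_bits(number: float) -> int:
    """Produce the 64-bit integer bit pattern for a double."""
    return struct.unpack("<Q", struct.pack("<d", number))[0]
```

The integer returned by double_to_register_bits() is the kind of value a harness would pass when writing a floating-point register.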
Note
Symbolic values produced by angr can’t be represented as integers, and need special handling.
This special handling is performed automatically
by the Machine State interface. It’s only a concern
when interacting directly with an Emulator.
See the docs for the angr backend for details.
Emulators may also support a concept of labels (see Values in Smallworld for details),
which may be accessed via Emulator.read_register_label()
and Emulator.write_register_label().
See the docs for specific backends for details on label support.
Note
There may be more than one popular naming convention or style for the registers of a specific platform.
Consult the PlatformDefinition class for a specific platform
(see Describing a Platform) for the convention SmallWorld uses.
Mapping Memory¶
SmallWorld emulators maintain a crude model of a memory map.
This allows emulators to detect out-of-bounds memory accesses; exactly how this is handled is backend-specific. See the docs for the specific backends.
Note
SmallWorld does not currently support a notion of permissions on mapped memory regions. All mapped regions are assumed to allow read, write, and execute.
This is handled opaquely by the Machine State interface; any part of the machine state that needs mapped memory will request it when the machine state is applied to the emulator.
Harnesses and analyses may manually add additional memory to the memory map using
Emulator.map_memory(), which takes an address and a size.
Addresses and sizes do not need to be page-aligned.
Harnesses and analyses may fetch the current memory map
from an emulator using Emulator.get_memory_map().
This will return a list of Tuple[int, int]
of the form (start, end).
Caution
Modifying the memory map of a running emulator is untested, and may produce undefined behavior.
Note
The resolution of the memory map differs between backends. If a backend maps memory in pages, it will automatically page-align requested regions.
See the docs for the specific backends for details.
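Page alignment of a requested region can be sketched as follows. The 4 KiB page size, the exclusive end address, and the page_align name are all assumptions made for illustration; the actual granularity depends on the backend.

```python
from typing import Tuple


def page_align(address: int, size: int, page_size: int = 0x1000) -> Tuple[int, int]:
    """Expand [address, address + size) to page boundaries.

    Returns (start, end) with `end` exclusive. `page_size` must be a power of two.
    """
    mask = ~(page_size - 1)
    start = address & mask                          # round the start down
    end = (address + size + page_size - 1) & mask   # round the end up
    return (start, end)
```

A request for 16 bytes at 0x1ff8 straddles a page boundary, so under these assumptions it would occupy two full pages in the resulting map.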
Note
SmallWorld emulators don’t readily support separate interactions with physical and virtual memory:
Most backends don’t support the necessary privileged features to manage virtual-to-physical memory mapping.
Managing virtual memory would usually require adding a large chunk of a live OS to your harness, defeating the purpose of micro execution.
By default, SmallWorld emulators will support as close to a full virtual address space as possible, although some emulators may present slightly different memory layouts for some platforms. See the docs for the specific backends for details.
Accessing Memory¶
Harnesses and analyses may access an emulator’s memory state using
Emulator.read_memory() and Emulator.write_memory().
Memory state is represented as bytes objects.
Note
Symbolic values produced by angr can’t be represented as bytes, and need special handling.
This special handling is performed automatically
by the Machine State interface; it’s only a concern
when interacting directly with an Emulator.
See the docs for Angr Backend for details.
This interface will obey the current memory map, as supported by the backend.
Accessing memory not mapped via Emulator.map_memory() may raise an exception.
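A harness that wants to avoid such an exception could check an access against the list returned by Emulator.get_memory_map() first. The sketch below assumes the (start, end) tuples use an exclusive end, which this page does not state; is_mapped is a hypothetical helper, not part of SmallWorld.

```python
from typing import List, Tuple


def is_mapped(memory_map: List[Tuple[int, int]], address: int, size: int) -> bool:
    """Check whether [address, address + size) is fully covered by the map."""
    cursor = address
    end = address + size
    # Walk the regions in address order, advancing the cursor through
    # each region that contains it; adjacent regions chain together.
    for start, stop in sorted(memory_map):
        if start <= cursor < stop:
            cursor = stop
            if cursor >= end:
                return True
    return cursor >= end
```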
This interface will obey platform-specific configuration, as supported by the backend. All backends default to a “safe” state; a harness does not need to provide any platform-specific configuration to perform basic memory accesses. If a harness does provide such configuration, memory accesses may raise an exception if they violate the configured parameters, or if the configuration is invalid.
There is a separate function for writing
executable instruction information, Emulator.write_code().
For all currently-supported platforms,
this has the exact same effect on machine state as Emulator.write_memory().
However, some emulators, namely angr, load code differently from data. See the docs for specific backends.
Event Handlers¶
SmallWorld emulators accept handlers for a variety of events:
Instruction execution
Function calls
Memory accesses
System calls
Interrupts
Not every backend supports all event types. See the docs for specific backends for details.
All callback functions receive an Emulator
as their first parameter. The callback
can use that Emulator to modify machine state,
including the current program counter.
Attempting to modify the emulator’s memory map or to start execution from within a callback will result in undefined behavior.
A callback can raise an EmulationStop exception
to halt emulation gracefully.
Instruction Execution¶
Harnesses or analyses can register callbacks to trigger
before a specific instruction gets executed using
InstructionHookable.hook_instruction().
This takes the address of the instruction,
and a callback of the form callback(Emulator) -> None.
Aside from any explicit modifications made by the callback, an instruction event handler will not modify emulator state, or the bound instruction’s execution.
The callback will only trigger if the program counter specifically equals the bound address. (This is rarely a problem, but can happen with certain ISAs that allow optional prefixes.)
Only one callback can be registered for a specific program counter. Attempting to register more than one callback will raise an exception.
A callback can be removed from an instruction
using InstructionHookable.unhook_instruction().
Harnesses or analyses can also use
InstructionHookable.hook_instructions() to register a callback
that will trigger on every instruction.
Only one such callback can be applied to an Emulator.
Attempting to apply a second callback will raise an exception.
A global instruction callback can be removed
using InstructionHookable.unhook_instructions().
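One use for a global instruction callback is to cap the number of instructions a run may execute. The sketch below is a callable of the form callback(Emulator) -> None; InstructionBudget is a hypothetical name, and EmulationStop is redefined locally as a stand-in so the example runs without SmallWorld installed.

```python
class EmulationStop(Exception):
    """Local stand-in for SmallWorld's EmulationStop (assumption)."""


class InstructionBudget:
    """Callable that halts emulation once `limit` instructions have run."""

    def __init__(self, limit: int):
        self.limit = limit
        self.executed = 0

    def __call__(self, emu) -> None:
        # The callback fires before each instruction executes, so raising
        # here prevents the next instruction from running.
        if self.executed >= self.limit:
            raise EmulationStop(f"budget of {self.limit} instructions used up")
        self.executed += 1
```

Because the object keeps its own counter, the harness can read budget.executed after the run to see how far execution got.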
Function Calls¶
Harnesses or analyses can register callbacks
that replace an instruction with a function model
using FunctionHookable.hook_function().
This takes the address of the instruction,
and a callback of the form callback(Emulator) -> None.
The emulator will not execute the replaced instruction; instead, it will execute the callback, and then mimic a platform-appropriate “return” operation, as if the instruction were the start of a larger function that then returned.
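A function model is just a callback that produces the function's effect on machine state. The sketch below models a two-argument add function. It assumes the x86-64 System V calling convention and lowercase register names, and it assumes read_register(name) and write_register(name, value) signatures; StubEmulator is a hypothetical stand-in used only to exercise the callback.

```python
class StubEmulator:
    """Hypothetical stand-in exposing only register access."""

    def __init__(self, registers):
        self.registers = dict(registers)

    def read_register(self, name):
        return self.registers[name]

    def write_register(self, name, value):
        self.registers[name] = value


def model_add(emu) -> None:
    """Model of `uint64_t add(uint64_t a, uint64_t b)`."""
    a = emu.read_register("rdi")  # first integer argument (assumed ABI)
    b = emu.read_register("rsi")  # second integer argument (assumed ABI)
    # Wrap to 64 bits, as the real function would in hardware.
    emu.write_register("rax", (a + b) & 0xFFFFFFFFFFFFFFFF)
```

The model only sets the return value; it does not touch the stack or program counter, since the "return" is mimicked after the callback finishes.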
Caution
Even though the replaced instruction gets skipped, it must still be in mapped memory.
Only one function model can be registered for a specific instruction. Attempting to register more than one model will raise an exception.
A function model can be removed using FunctionHookable.unhook_function().
Memory Reads¶
Harnesses or analyses can use MemoryReadHookable.hook_memory_read()
to register a callback that triggers when a specific address range is read.
This takes the starting address of the memory region,
the size in bytes of the memory region,
and a callback of the form callback(Emulator, int, int, bytes) -> Optional[bytes].
The callback receives the start address of the read, the size of the read,
and the data that was read as a bytes object.
The callback can return None to allow the read to proceed normally,
or return a bytes object of the same size as the original read
to override the result of the read operation.
A read callback can be removed from the emulator using
MemoryReadHookable.unhook_memory_read().
Harnesses or analyses can use MemoryReadHookable.hook_memory_reads()
to register a callback that triggers when any memory is read.
The callback has the same semantics as the one for MemoryReadHookable.hook_memory_read().
Only one global read callback can exist at once. Attempting to register a second will raise an exception.
Multiple global or specific read callbacks can apply to a particular read operation. The order they are invoked is not guaranteed. If a callback modifies the data read, it will override the data passed to any pending callbacks.
Note
Memory read callbacks will trigger if any part of a read overlaps any part of the hooked memory region.
The whole access will be reported to the callback, including bytes outside the hooked region.
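Because the callback sees the whole access, a callback that only wants to override the hooked bytes has to compute the overlap itself. The following sketch does that for a hypothetical 4-byte hooked region; HOOK_START, HOOK_SIZE, and FILL are illustrative values, and the emulator argument is unused.

```python
from typing import Optional

HOOK_START = 0x1000  # hypothetical start of the hooked region
HOOK_SIZE = 4        # hypothetical size of the hooked region
FILL = 0xAA          # byte value returned for hooked addresses


def override_hooked_bytes(emu, address: int, size: int, data: bytes) -> Optional[bytes]:
    """Replace only the bytes of the read that fall inside the hooked region."""
    lo = max(address, HOOK_START)
    hi = min(address + size, HOOK_START + HOOK_SIZE)
    if lo >= hi:
        return None  # no overlap: let the read proceed normally
    patched = bytearray(data)
    for addr in range(lo, hi):
        patched[addr - address] = FILL
    # The result has the same length as the original read.
    return bytes(patched)
```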
Caution
A hooked memory region must be mapped, or the access will trigger an exception before it triggers the callback.
Memory Writes¶
Harnesses or analyses can use MemoryWriteHookable.hook_memory_write()
to register a callback that triggers when a specific address range is written.
This takes the starting address of the memory region,
the size in bytes of the memory region,
and a callback of the form callback(Emulator, int, int, bytes) -> None.
The callback receives the start address of the write, the size of the write,
and the data that will be written as a bytes object.
A write callback can be removed from the emulator using
MemoryWriteHookable.unhook_memory_write().
Harnesses or analyses can use MemoryWriteHookable.hook_memory_writes()
to register a callback that triggers when any memory is written.
The callback has the same semantics as the one for MemoryWriteHookable.hook_memory_write().
Only one global write callback can exist at once. Attempting to register a second will raise an exception.
Multiple global or specific write callbacks can apply to a particular write operation. The order they are invoked is not guaranteed.
Write callbacks can’t modify the data written, but they can store their own internal state and make it available to a corresponding read callback.
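The pairing of a write callback with a read callback can be sketched as a small device model that keeps its own copy of a memory region. ScratchRegisterModel is a hypothetical class written for illustration; its two methods match the callback forms described for memory writes and memory reads, and the emulator argument is unused.

```python
from typing import Optional, Tuple


class ScratchRegisterModel:
    """Remembers writes to a region and replays them on reads."""

    def __init__(self, base: int, size: int):
        self.base = base
        self.contents = bytearray(size)

    def _overlap(self, address: int, size: int) -> Tuple[int, int]:
        # Accesses are reported whole, so clip them to the modeled region.
        lo = max(address, self.base)
        hi = min(address + size, self.base + len(self.contents))
        return lo, hi

    def on_write(self, emu, address: int, size: int, data: bytes) -> None:
        lo, hi = self._overlap(address, size)
        for addr in range(lo, hi):
            self.contents[addr - self.base] = data[addr - address]

    def on_read(self, emu, address: int, size: int, data: bytes) -> Optional[bytes]:
        lo, hi = self._overlap(address, size)
        if lo >= hi:
            return None  # nothing of ours was read
        result = bytearray(data)
        for addr in range(lo, hi):
            result[addr - address] = self.contents[addr - self.base]
        return bytes(result)
```

The bound methods on_write and on_read would be the callbacks registered over the same address range, one as the write hook and one as the read hook.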
Note
Memory write callbacks will trigger if any part of a write overlaps any part of the hooked memory region.
The whole access will be reported to the callback, including bytes outside the hooked region.
Caution
A hooked memory region must be mapped, or the access will trigger an exception before it triggers the callback.
System Calls¶
Harnesses and models can use SyscallHookable.hook_syscall()
to hook specific system calls encountered by the emulator.
This takes a system call number,
and a callback of the form callback(Emulator) -> None.
(It is assumed the callback was written with a specific syscall in mind.)
Only one callback can be registered for a specific system call number. Attempting to add another will raise an exception.
A callback for a specific system call can be removed using
SyscallHookable.unhook_syscall().
Harnesses and models can use SyscallHookable.hook_syscalls()
to hook all system calls.
This takes a callback of the form callback(Emulator, int) -> None.
The callback receives the system call number as its second parameter.
Only one callback can be registered for all system calls. Attempting to add another will raise an exception.
Global system call hooks can be removed using
SyscallHookable.unhook_syscalls().
Emulating an unhooked system call will cause the emulator to raise an exception.
By default, emulating a hooked system call will have no effect; the emulator will continue at the instruction immediately following the system call.
The callback can modify machine state freely, and cause emulation to resume at a different instruction by setting the program counter.
Both global and specific system call handlers can be registered at the same time. The order in which the callbacks fire is not guaranteed.
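A global system call callback of the form callback(Emulator, int) -> None can act as a dispatcher over a table of per-call models. The sketch below uses x86-64 Linux system call numbers (39 for getpid, 60 for exit) and reports -ENOSYS for anything it does not model. The register name, the register access signatures, StubEmulator, and the local EmulationStop class are assumptions made so the example runs on its own.

```python
class EmulationStop(Exception):
    """Local stand-in for SmallWorld's EmulationStop (assumption)."""


class StubEmulator:
    """Hypothetical stand-in exposing only register access."""

    def __init__(self, registers):
        self.registers = dict(registers)

    def read_register(self, name):
        return self.registers[name]

    def write_register(self, name, value):
        self.registers[name] = value


def sys_getpid(emu) -> None:
    emu.write_register("rax", 1234)  # arbitrary fake process ID


def sys_exit(emu) -> None:
    raise EmulationStop("guest called exit")


# x86-64 Linux system call numbers mapped to their models.
SYSCALL_MODELS = {39: sys_getpid, 60: sys_exit}

ENOSYS = 38


def dispatch_syscall(emu, number: int) -> None:
    model = SYSCALL_MODELS.get(number)
    if model is None:
        # Report "function not implemented" as a 64-bit two's-complement value.
        emu.write_register("rax", (-ENOSYS) & 0xFFFFFFFFFFFFFFFF)
        return
    model(emu)
```

Routing everything through one global callback sidesteps the unspecified firing order between global and specific handlers, at the cost of keeping the dispatch table in the harness.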
Interrupts¶
Warning
SmallWorld has an interface for interrupt hooking, but it appears to be non-functional right now.