smallworld.analyses

class smallworld.analyses.Analysis(hinter: Hinter, *args, **kwargs)

An analysis that emits some information about some code, possibly to help with harnessing.

abstractmethod run(machine: Machine) None

Run the analysis.

This function should not modify the provided Machine. Instead, the Machine should be copied before modification.

Parameters:

machine – A machine state object on which this analysis should run.
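A minimal sketch of the copy-before-modify rule described above. ToyMachine is a hypothetical stand-in, not the smallworld Machine class, and using copy.deepcopy on a real Machine is an assumption rather than something this reference confirms.

```python
import copy


class ToyMachine:
    """Hypothetical stand-in for a machine state object (not the smallworld class)."""

    def __init__(self, registers):
        self.registers = dict(registers)


def run(machine):
    """Work on a private copy so the caller's machine stays untouched."""
    scratch = copy.deepcopy(machine)  # all changes go to the copy
    scratch.registers["rax"] = 0xDEADBEEF  # mutation is confined to the copy
    return scratch


original = ToyMachine({"rax": 0})
modified = run(original)
print(hex(original.registers["rax"]), hex(modified.registers["rax"]))
```

The caller can reuse the same machine for several analyses because none of them sees another's changes.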

abstract property description: str

A description of this analysis.

Descriptions should be a single sentence, lowercase, with no final punctuation for proper formatting.

abstract property name: str

The name of this analysis.

Names should be kebab-case, all lowercase, no whitespace for proper formatting.

abstract property version: str

The version string for this analysis.

We recommend using Semantic Versioning.
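The formatting conventions for name, description, and version can be checked mechanically. The helper below is a hypothetical sketch, not part of smallworld; its regular expressions are one reasonable reading of "kebab-case" and of the core MAJOR.MINOR.PATCH form of Semantic Versioning.

```python
import re

# Lowercase words or digits joined by single hyphens, no whitespace.
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
# Core MAJOR.MINOR.PATCH only; pre-release and build metadata are left out for brevity.
SEMVER_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def check_metadata(name, description, version):
    """Return a list of convention problems (an empty list means all checks pass)."""
    problems = []
    if not KEBAB_CASE.fullmatch(name):
        problems.append("name should be kebab-case")
    if (
        not description
        or description != description.lower()
        or description[-1] in ".!?"
    ):
        problems.append("description should be lowercase with no final punctuation")
    if not SEMVER_CORE.fullmatch(version):
        problems.append("version should look like MAJOR.MINOR.PATCH")
    return problems


print(check_metadata("loop-detection", "detects loops in an execution trace", "0.1.0"))
print(check_metadata("Loop Detection", "Detects loops.", "v1"))
```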

class smallworld.analyses.FieldDetectionAnalysis(platform: Platform)

Detect fields on full path exploration

property emulator: Emulator

The emulator to run. Underlays need the overlay to define the emulator.

execute()

Exercise the emulator

run(machine)

Run the analysis.

This function should not modify the provided Machine. Instead, the Machine should be copied before modification.

Parameters:

machine – A machine state object on which this analysis should run.

class smallworld.analyses.ForcedFieldDetectionAnalysis(platform: Platform, trace: List[Dict[str, int]])

property emulator: Emulator

The emulator to run. Underlays need the overlay to define the emulator.

execute()

Exercise the emulator

run(machine)

Run the analysis.

This function should not modify the provided Machine. Instead, the Machine should be copied before modification.

Parameters:

machine – A machine state object on which this analysis should run.

class smallworld.analyses.ForcedExecution(platform: Platform, trace: List[Dict[str, int]])

Forced execution using angr

This allows you to emulate arbitrary program slices by forcing the emulator to visit specific instructions, ignoring the normal program control flow.

NOTE: This is not compatible with all architectures. The architecture needs to support single-stepping; delay slot architectures such as MIPS can’t be single-stepped by angr.

Parameters:
  • platform – The platform you want to emulate

  • trace – The list of program counter addresses you want to visit
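To illustrate what "ignoring the normal program control flow" means, here is a toy sketch that does not use smallworld or angr. The three-opcode instruction set, PROGRAM, and both functions are hypothetical, and the trace is simplified to a plain list of program counters rather than the List[Dict[str, int]] type in the signature.

```python
# Toy program: address -> (opcode, argument). All of this is hypothetical.
PROGRAM = {
    0x1000: ("inc", 1),
    0x1004: ("jmp", 0x100C),  # normal flow skips 0x1008
    0x1008: ("inc", 10),
    0x100C: ("halt", None),
}


def step(pc, acc):
    """Execute one instruction; return (next_pc, acc). A next_pc of None means halt."""
    op, arg = PROGRAM[pc]
    if op == "inc":
        return pc + 4, acc + arg
    if op == "jmp":
        return arg, acc
    return None, acc  # halt


def run_normal(entry):
    """Follow the program's own control flow from the entry point."""
    pc, acc, visited = entry, 0, []
    while pc is not None:
        visited.append(pc)
        pc, acc = step(pc, acc)
    return visited, acc


def run_forced(trace):
    """Visit exactly the addresses in trace, discarding each computed next pc."""
    acc = 0
    for pc in trace:
        _, acc = step(pc, acc)
    return acc


visited, acc = run_normal(0x1000)
print([hex(pc) for pc in visited], acc)
print(run_forced([0x1000, 0x1008]))
```

Normal execution never reaches 0x1008 because of the jump, while the forced run executes it anyway. That is how an arbitrary program slice can be exercised.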

property emulator: Emulator

The emulator to run. Underlays need the overlay to define the emulator.

run(machine: Machine)

Run the analysis.

This function should not modify the provided Machine. Instead, the Machine should be copied before modification.

Parameters:

machine – A machine state object on which this analysis should run.

execute()

Exercise the emulator

class smallworld.analyses.CrashTriage(*args, max_steps: int = -1, **kwargs)

An analysis to ID potential causes of a crash.

Concrete emulators like Unicorn and Panda are great at finding crashes, but they don’t preserve enough data to really explain why a crash happened. You got a null pointer dereference. Okay, why? Where did that null pointer come from? What was the program doing to cause it?

On the other hand, angr’s symbolic executor can provide extremely detailed machine-readable information about how the code and data reached a particular state, but getting it to identify crash conditions isn’t always easy.

The solution is a dual harness approach:

  • Run the harness once in Unicorn to get an immediate cause and execution trace.

  • Run the harness again in angr to replay the execution trace.

  • Examine the final state from angr to get a diagnosis for the crash.

Here are the kinds of hints produced by this analysis:

TriageNormalExit:

Execution exited at an exit point, indicating there was no crash.

TriageNormalTooLong:

Execution exceeded the specified maximum number of steps without encountering a crash.

TriageOOB:

Execution went out of bounds, or left mapped memory.

TriageIllegal:

Execution encountered an illegal instruction.

TriageTrap:

Execution halted because of an unhandled hardware interrupt/trap/exception/fault.

TriageMemory:

Execution halted because of an illegal memory access. This will include a field specifying the kind of access: read, write, or fetch.

Each hint contains the execution trace it followed, as well as a diagnosis. The diagnosis will either contain details about the crash, or indicate that angr stopped before reaching it.
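The run-then-replay idea can be sketched with a toy machine. Nothing here uses Unicorn, angr, or smallworld: the instruction set, PROGRAM, and MEMORY are hypothetical, and the replay step uses simple provenance tracking as a stand-in for symbolic execution.

```python
# Toy program: (opcode, destination register, source). All names are hypothetical.
PROGRAM = [
    ("mov_imm", "r1", 0),  # pc 0: the null pointer is created here
    ("mov", "r2", "r1"),   # pc 1: and copied here
    ("load", "r3", "r2"),  # pc 2: dereference of r2 crashes
]
MEMORY = {0x2000: 42}  # the only mapped address


def concrete_run():
    """First pass: execute concretely, recording the trace and the immediate cause."""
    regs, trace = {}, []
    for pc, (op, dst, src) in enumerate(PROGRAM):
        trace.append(pc)
        if op == "mov_imm":
            regs[dst] = src
        elif op == "mov":
            regs[dst] = regs[src]
        elif regs[src] not in MEMORY:  # load from an unmapped address
            return trace, f"read of unmapped address {regs[src]:#x}"
        else:
            regs[dst] = MEMORY[regs[src]]
    return trace, None


def replay(trace):
    """Second pass: replay the trace, tracking which pc produced each register value."""
    origin = {}
    for pc in trace[:-1]:  # everything before the crashing instruction
        op, dst, src = PROGRAM[pc]
        origin[dst] = origin[src] if op == "mov" else pc
    _, _, addr_reg = PROGRAM[trace[-1]]  # the crashing load's address register
    return f"address register {addr_reg} was set at pc {origin[addr_reg]}"


trace, cause = concrete_run()
print(cause)
print(replay(trace))
```

The first pass only says what went wrong; the second pass explains where the bad address came from.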

run(machine: Machine) None

Run the analysis.

This function should not modify the provided Machine. Instead, the Machine should be copied before modification.

Parameters:

machine – A machine state object on which this analysis should run.

class smallworld.analyses.CrashTriagePrinter(hinter: Hinter, *args, **kwargs)

run(machine: Machine)

Run the analysis.

This function should not modify the provided Machine. Instead, the Machine should be copied before modification.

Parameters:

machine – A machine state object on which this analysis should run.

class smallworld.analyses.CrashTriageVerification(*args, max_steps: int = -1, hint_type: Type[TriageHint] | None = None, hint_attrs: Dict[str, Any] | None = None, diagnosis_type: Type[Diagnosis] | None = None, halt_type: Type[Halt] | None = None, halt_kind: str | None = None, halt_target: str | None = None, illegal_type: Type[IllegalInstr] | None = None, mem_access: MemoryAccess | None = None, **kwargs)

Verification hooking for CrashTriage.

This confirms that a use of the crash triage analyzer produces a specific equivalence class of cause and diagnosis.

It’s mostly useful for testing purposes; I didn’t want to rewrite this in each of the integration test scripts.
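One way to picture checking a hint against an expected "equivalence class" is a type check plus a comparison of selected attributes. The sketch below is hypothetical: ToyMemoryHint and matches are not smallworld names, and treating None as "do not check" is an assumption based only on the None defaults in the signature above.

```python
from dataclasses import dataclass

_MISSING = object()  # sentinel so a missing attribute never equals an expected value


@dataclass
class ToyMemoryHint:
    """Hypothetical stand-in for a triage hint."""

    access: str
    address: int


def matches(hint, hint_type=None, hint_attrs=None):
    """Return True if hint has the expected type and every expected attribute value."""
    if hint_type is not None and not isinstance(hint, hint_type):
        return False
    for name, expected in (hint_attrs or {}).items():
        if getattr(hint, name, _MISSING) != expected:
            return False
    return True


hint = ToyMemoryHint(access="read", address=0)
print(matches(hint, hint_type=ToyMemoryHint, hint_attrs={"access": "read"}))
print(matches(hint, hint_attrs={"access": "write"}))
```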

run(machine: Machine)

Run the analysis.

This function should not modify the provided Machine. Instead, the Machine should be copied before modification.

Parameters:

machine – A machine state object on which this analysis should run.

class smallworld.analyses.Colorizer(*args, exec_id: int, num_insns: int = 200, min_color: int = 128, **kwargs)

A simple kind of data flow analysis via tracking distinct values (colors) and employing instruction use/def analysis

We run a single micro-execution of the code, starting from the machine state passed to the run method. We single-step instructions and interpose before and after each one to check the values it dynamically reads and writes. We maintain a “colors” map from these dynamic values to when/where we first observed them. This map is initially empty.

Before emulating an instruction, we examine the values (registers and memory) it will read. If a value is not in the colors map, this is the initial sighting of that value, so we emit a hint to that effect and add a color to the map. If a value is already in the map, then there is a def-use flow from the time or place at which that value was first observed to this instruction.

Similarly, after emulating an instruction, we examine every value written to a register or memory. If a value is not in the colors map, it is a new, computed result, so we hint about its creation and add it to the map. If it is in the colors map, we do nothing since it is just a copy.

Here are the kinds of hints output by this analysis:

DynamicRegisterValueHint – about a value in a register at a particular instruction in the trace

DynamicMemoryValueHint – about a value in memory at a particular instruction in the trace

These can be “new” values if that is the first time we have seen that color (dynamic value). Or they can be not-new, meaning this is a use of a previously observed value, i.e. a data flow.

They can also be reads or writes.

Note: Why is there a min_color in the constructor of Colorizer? A color (here) is just a dynamic value that we think is unique enough that we can infer a data flow when we see it used in multiple places. Obviously, a color of 0 is not helpful: if you see 0 in two places, it is unlikely that a data flow connects them. This raises the question: what is a reasonable minimum acceptable color for inferring data flows? Values in the range 0-10 are unlikely to be good colors. Here, our default value for min_color is 0x80; this is fairly conservative and can be lowered. Note that generally, we use randomize_unitialized, above, to set 2-, 4-, and 8-byte registers and memory lvals to random numbers that will work well as colors. These are unlikely to be < 0x80.
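The colors map can be sketched in a few lines. This toy does no emulation: OBSERVATIONS stands in for the values an emulator would report while single-stepping, and the tuple-shaped hints are hypothetical simplifications of the real hint classes.

```python
MIN_COLOR = 0x80  # mirrors the documented default for min_color

# (pc, values read, values written), as if observed while single-stepping.
OBSERVATIONS = [
    (0x1000, [0x1234ABCD], [0x1234ABCD]),     # a copy: the write is not new
    (0x1004, [0x1234ABCD, 1], [0x1234ABCE]),  # add 1: the write is a new value
    (0x1008, [0x1234ABCE], []),               # use of the computed value
]


def colorize(observations, min_color=MIN_COLOR):
    """Return hints as (kind, pc, value, note) tuples."""
    colors = {}  # dynamic value -> pc of first sighting
    hints = []
    for pc, reads, writes in observations:
        for value in reads:
            if value < min_color:
                continue  # too small to be a trustworthy color
            if value not in colors:
                colors[value] = pc
                hints.append(("read", pc, value, "new"))
            else:
                hints.append(("read", pc, value, f"flow from {colors[value]:#x}"))
        for value in writes:
            if value >= min_color and value not in colors:
                colors[value] = pc
                hints.append(("write", pc, value, "new"))
            # a written value already in the map is just a copy: nothing to do
    return hints


for hint in colorize(OBSERVATIONS):
    print(hint)
```

The constant 1 read at 0x1004 is below min_color and produces no hint, which is the filtering the note above describes.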

Parameters:
  • exec_id – An integer used to identify this execution, if needed

  • num_insns – The number of instructions to micro-execute

  • min_color – The minimum dynamic value that can serve as a color (see the note above)

run(machine: Machine) None

Run the analysis.

This function should not modify the provided Machine. Instead, the Machine should be copied before modification.

Parameters:

machine – A machine state object on which this analysis should run.

class smallworld.analyses.ColorizerSummary(*args, **kwargs)

run(machine)

Run the analysis.

This function should not modify the provided Machine. Instead, the Machine should be copied before modification.

Parameters:

machine – A machine state object on which this analysis should run.

class smallworld.analyses.ColorizerReadWrite(*args, **kwargs)

Collect color hints from Colorizer and turn them into a kind of dynamic def->use (write->read) graph that can be used to figure out how a value at an instruction is derived from inputs.

How does this work?

Well, we first collect all the dynamic value hints output by the colorizer.

The “new” ones are hints in which a dynamic value is first observed along an execution trace. These are per-micro-execution colors. They are normalized across micro-executions to get a set of keys that correspond to colors. This normalization is done by compute_dv_key. The key that function produces for a dynamic value hint can be mapped further to a unique color number with dvkey2num[key].

Note: these “new” hints can actually be reads or writes. A read can be use of a register value or a read of a value from memory. Both of these could contain new values and thus new per-execution colors. But a write, which can be to a register or to memory, can also be a new value (if computation occurred in the instruction to get a new value).

Next, we find edges in the graph between where a new value was first seen and any place it was later used. The edges are between the keys described above and thus are normalized across micro-executions.
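A toy version of the graph construction, under stated assumptions: the Hint tuple and toy_key are hypothetical, and toy_key only guesses at the idea behind compute_dv_key (a key that does not depend on the execution), since the real key computation is not described here.

```python
from collections import namedtuple

# Hypothetical, simplified hint: color is a per-execution number.
Hint = namedtuple("Hint", "exec_id pc kind location color new")

HINTS = [
    Hint(0, 0x1000, "read", "rdi", 1, True),
    Hint(0, 0x1004, "write", "rax", 2, True),
    Hint(0, 0x1008, "read", "rax", 2, False),
    Hint(1, 0x1000, "read", "rdi", 1, True),
    Hint(1, 0x1008, "read", "rdi", 1, False),
]


def toy_key(hint):
    """Execution-independent key; a stand-in for the idea behind compute_dv_key."""
    return (hint.pc, hint.kind, hint.location)


def build_graph(hints):
    """Return def->use edges between normalized keys."""
    first_seen = {}  # (exec_id, per-execution color) -> key of the first sighting
    for hint in hints:
        if hint.new:
            first_seen[(hint.exec_id, hint.color)] = toy_key(hint)
    edges = set()
    for hint in hints:
        if not hint.new:
            edges.add((first_seen[(hint.exec_id, hint.color)], toy_key(hint)))
    return edges


for source, use in sorted(build_graph(HINTS)):
    print(source, "->", use)
```

Because edges connect keys rather than per-execution colors, the same flow observed in several micro-executions collapses into a single edge.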

run(machine)

Run the analysis.

This function should not modify the provided Machine. Instead, the Machine should be copied before modification.

Parameters:

machine – A machine state object on which this analysis should run.

class smallworld.analyses.TraceExecution(*args, num_insns: int, seed: int = 1234567, **kwargs)

run(machine: Machine) None

Run the analysis.

This function should not modify the provided Machine. Instead, the Machine should be copied before modification.

Parameters:

machine – A machine state object on which this analysis should run.

class smallworld.analyses.TraceElement(pc: int, ic: int, mnemonic: str, op_str: str, cmp: List[smallworld.instructions.instructions.RegisterOperand | smallworld.instructions.bsid.BSIDMemoryReferenceOperand | int], branch: bool, immediates: List[int])

class smallworld.analyses.TraceRes(*values)

class smallworld.analyses.LoopDetection(*args, min_iter: int = 2, **kwargs)

run(machine: Machine) None

Run the analysis.

This function should not modify the provided Machine. Instead, the Machine should be copied before modification.

Parameters:

machine – A machine state object on which this analysis should run.

class smallworld.analyses.CoverageFrontier(*args, **kwargs)

run(machine: Machine) None

Run the analysis.

This function should not modify the provided Machine. Instead, the Machine should be copied before modification.

Parameters:

machine – A machine state object on which this analysis should run.