Program Paths

A path through a program is, at its core, a sequence of basic blocks representing what was executed since the program started. These blocks in the paths can repeat (in the case of loops) and a program can have a near-infinite amount of paths (for example, a program with a single branch will have two paths, a program with two branches nested within each other will have 4, and so on).

angr represents a path by using the Path class. The Path class will be used in most interactions with angr and in most of angr's analyses. To create a blank path, do:

# load a binary

>>> import angr
>>> b = angr.Project('/bin/true')

# load the path
>>> p = b.factory.path()

# this is the address that the path is *about to* execute
>>> assert p.addr == b.entry

After this, p is a path representing the program at the entry point. We can see that the callstack and the path's history are blank:

# this is the number of basic blocks that have been analyzed by the path
>>> assert len(p) == 0

# we can also look at the current backtrace of program execution
# contains only the dummy frame for execution start
>>> assert len(p.callstack) == 1
>>> print p.callstack
Backtrace:
Func 0x401410, sp=0x7fffffffffeffd8, ret=0x0

Moving Forward

Of course, we can't be stuck at the entry point forever. call p.step() to run the single block of symbolic execution. We can look at the successors of a path to see where the program goes after this point. p.step() also returns the successors if you'd like to chain calls. Most of the time, a path will have one or two successors. When there are two successors, it usually means the program branched and there are two possible ways forward with execution. Other times, it will have more than two, such as in the case of a jump table.

>>> p.step()
>>> print "The path has", len(p.successors), "successors!"

# each successor is a path, keeping track of an execution history
>>> s = p.successors[0]
>>> assert s.addr_trace[-1] == p.addr

# and, of course, we can drill down further!
# alternate syntax: s.step() returns the same list as s.successors
>>> ss = s.step()[0].step()[0].step()[0]
>>> len(ss.addr_trace.hardcopy) == 4

To efficiently store information about path histories, angr employs a tree structure that resembles the actual symbolic execution tree. You should never have to worry about this, since through the magic of python we provide efficient accessors for information stored in the tree as it pertains to each stored historical property. The one thing you have to know is that this data structure doesn't allow efficient iteration through the historical lists in forward order - only in reverse order, from most recent to oldest. If you need to iterate or access items from these sequences starting from the beginning, you may access the .hardcopy property on them, which will extract the entirety of the property's history as a flat list for you to peruse at leisure.

For example: part of the history of a path is the types of jumps that occur. These are stored (as strings representing VEX exit type enums), in the jumpkinds attribute.

# recall: s is the path created when we stepped forward the initial path once
>>> print s.jumpkinds
<angr.path.JumpkindIter object at 0x7f8161e584d0>

>>> assert s.jumpkinds[-1] == 'Ijk_Call'
>>> print s.jumpkinds.hardcopy
['Ijk_Call']

# Don't do this! This will throw an exception
>>> # for jk in ss.jumpkinds: print jk

# Do this instead:
>>> for jk in reversed(ss.jumpkinds): print jk
Ijk_Call
Ijk_Call
Ijk_Boring
Ijk_Call

# Or, if you really need to iterate in forward order:
>>> for jk in ss.jumpkinds.hardcopy: print jk
Ijk_Call
Ijk_Boring
Ijk_Call
Ijk_Call

Here is a list of the properties in the path history:

Property Description
Path.addr_trace The addresses of basic blocks that have been executed so far, as integers
Path.trace The SimRun objects that have been executed so far, as strings
Path.targets The targets of the jumps/successors that have been taken so far
Path.guards The guard conditions that had to be satisfied in order to take the branch listed in Path.targets
Path.jumpkinds The type of the exit from each basic block we took, as VEX struct strings
Path.events A log of the events that have happened in symbolic execution
Path.actions A filtering of Path.events to only include the actions taken by the exeution engine. See below.

Here are the different types of jumpkinds:

Type Description
Ijk_Boring A normal jump to an address.
IjK_Call A call to an address.
Ijk_Ret A return.
Ijk_Sig* Various signals.
Ijk_Sys* System calls.
Ijk_NoHook An artificial jumpkind, generated by an angr hook.

Merging Paths

Like states, paths can be merged. Truly understanding this requires concepts that will be explained in future sections, but in a nutshell, we can combine two paths that reached the same program point in different ways. For example, let's say that we have a branch:

# step until branch
p = b.factory.path()
p.step()
while len(p.successors) == 1:
    print 'step'
    p = p.successors[0]
    p.step()

print p
#print p.backtrace
branched_left = p.successors[0]
branched_right = p.successors[1]
assert branched_left.addr != branched_right.addr

after_branched_left = branched_left.step()[0]
after_branched_right = branched_right.step()[0]
assert after_branched_left.addr == after_branched_right.addr

# this will merge both branches into a single path. Values in memory and registers
# will hold any possible values they could have held in either path.
merged = branched_left.merge(branched_right)

assert merged.addr == branched_left.addr and merged.addr == branched_right.addr

Paths can also be unmerged later.

merged_successor = merged.step()[0].step()[0]
unmerged_paths = merged_successor.unmerge()

assert len(unmerged_paths) == 2
assert unmerged_paths[0].addr == unmerged_paths[1].addr

Non-entry point start

Sometimes, you might want to start the analysis of a program partway through the program. For example, you might be interested in what a specific part of a function does, but don't know how to (or don't want to) guide a path to that point. To handle this, we allow the creation of a path at any point in the program:

>>> st = b.factory.blank_state(addr=0x800f000)
>>> p = b.factory.path(st)

>>> assert p.addr == 0x800f000

At this point, all memory, registers, and so forth of the path are blank. We'll explore what this means, and its implications, in future sections.

Semantic Actions

SimuVEX exposes the actions of a basic blocks through the concept of "actions". An action has an associated type (i.e., "mem" for memory, "reg" for registers, "tmp" for temps), and action ("read", "write").

Here is an example interaction with the actions:

>>> p = b.factory.path().step()[0]

>>> for a in p.actions.hardcopy:
...     if a.type == 'mem':
...         print "Memory write to", a.addr.ast
...         print "... address depends on registers", a.addr.reg_deps, "and temps", a.addr.tmp_deps
...         print "... data is:", a.data.ast
...         print "... data depends on registers", a.data.reg_deps, "and temps", a.data.tmp_deps
...         if a.condition is not None:
...             print "... condition is:", a.condition.ast
...         if a.fallback is not None:
...             print "... alternate write in case of condition fail:", a.fallback.ast
...     elif a.type == 'reg':
...         print 'Register write to registerfile offset', a.offset
...     elif a.type == 'tmp':
...         print 'Tmp write to tmp', a.tmp