lighthouse_coverage - an execution tracer for PANDA

Program execution is commonly logged, for coverage or full tracing, with a Pin tracer [1,2,3], QEMU in user mode [1], or drcov [2,3,4]. These tools are fragile, however, when the target is obfuscated or self-modifying code. In those cases it is desirable to extract the execution trace directly from a whole-system emulator such as PANDA.

PANDA already includes at least one execution tracer as a plugin; it produces CSV files that can be imported into IDA Pro with a Python script. This is often sufficient. However, Binary Ninja has some advanced capabilities for dealing with overlapping instructions, so it would be nice to be able to import the trace information there. Lighthouse, in turn, has very powerful functionality for analyzing coverage data, and since it runs on both IDA Pro and Binary Ninja, it is a natural target for PANDA coverage output.

PANDA documents the structure of plugins here [5].

Let's make a new plugin for PANDA that traces execution and outputs coverage information.

A bare-bones, do-nothing PANDA plugin needs the following:

  • a subdirectory in the plugins directory named name_of_plugin
  • a source file name_of_plugin.c or name_of_plugin.cpp in that subdirectory
  • an init_plugin() function defined in the source file
  • an uninit_plugin() function defined in the source file
  • a Makefile with the one-line content $(PLUGIN_TARGET_DIR)/panda_$(PLUGIN_NAME).so: $(PLUGIN_OBJ_DIR)/$(PLUGIN_NAME).o
  • an entry for the plugin in the file config.panda in the plugins directory (just the name of the plugin)
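
The checklist above can be scripted. The following is a minimal sketch in Python, assuming a PANDA source tree whose plugins directory holds config.panda with one plugin name per line. The helper name scaffold_plugin and its plugins_dir argument are hypothetical and not part of PANDA.

```python
from pathlib import Path

# One-line Makefile body, quoted from the checklist above.
MAKEFILE_LINE = "$(PLUGIN_TARGET_DIR)/panda_$(PLUGIN_NAME).so: $(PLUGIN_OBJ_DIR)/$(PLUGIN_NAME).o\n"

# Do-nothing plugin source: the two functions every plugin must define.
SKELETON_C = (
    '#include "panda/plugin.h"\n'
    "bool init_plugin(void *self) { return true; }\n"
    "void uninit_plugin(void *self) { }\n"
)


def scaffold_plugin(plugins_dir, name):
    """Create a bare-bones plugin directory (hypothetical helper, not a PANDA tool)."""
    plugins_dir = Path(plugins_dir)
    plugin_dir = plugins_dir / name
    plugin_dir.mkdir(parents=True, exist_ok=True)

    (plugin_dir / (name + ".c")).write_text(SKELETON_C)
    (plugin_dir / "Makefile").write_text(MAKEFILE_LINE)

    # Register the plugin in config.panda unless it is already listed.
    config = plugins_dir / "config.panda"
    existing = config.read_text().splitlines() if config.exists() else []
    if name not in existing:
        existing.append(name)
        config.write_text("\n".join(existing) + "\n")
    return plugin_dir
```

Calling scaffold_plugin("path/to/plugins", "lighthouse_coverage") would produce the layout described in the list; the path is a placeholder for wherever the plugins directory lives in your checkout.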

Thus, I created a subdirectory, lighthouse_coverage, in the plugins directory and added the one-line Makefile and the source file lighthouse_coverage.c:

#include "panda/plugin.h"
bool init_plugin(void *self) { return true;}
void uninit_plugin(void *self) {}

To be sure that it works, I added a printf() statement:

#include "panda/plugin.h"
#include <stdio.h>

bool init_plugin(void *self) {
    printf("loaded lighthouse plugin\n");
    return true;
}

void uninit_plugin(void *self) { }

And we have a working plugin, which can be invoked in the same way as any other PANDA plugin:

./panda-system-x86_64 -m 4096 -replay theRecording  -panda lighthouse_coverage
PANDA[core]:initializing lighthouse_coverage
loaded lighthouse plugin
loading snapshot
[ ... ]

At this point the plugin does no useful work. PANDA plugins work by hooking callback functions into events that occur during execution or replay. Let's register a callback for one such event when our plugin loads.

#include "panda/plugin.h"

void before_block_exec(CPUState *cpuState, TranslationBlock *translationBlock)
{
    // this function gets called right before every basic block is executed
    // print the program counter of the basic block; the cast matches the PRIx64 format
    printf("%#018" PRIx64 "\n", (uint64_t)translationBlock->pc);
}

bool init_plugin(void *self) {
 panda_cb pcb = { .before_block_exec = before_block_exec };
 panda_register_callback(self, PANDA_CB_BEFORE_BLOCK_EXEC, pcb); // register the callback function above
 return true;
}

void uninit_plugin(void *self) { }

TranslationBlock has a member, pc, that is the program counter of the basic block about to be executed. This yields a list of all the basic block addresses executed:

loading snapshot
... done.
opening nondet log for read :  theRecording-rr-nondet.log
0xffffffff81c01c40
0xffffffff81c00920
0xffffffff81c0096f
0xffffffff81c009c2
0xffffffff81c009d1
0xffffffff81c01c4a
[ ... ]
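
Because printf() writes to the same stream as PANDA's own messages, the block addresses arrive interleaved with log lines. The following is a minimal sketch in Python for separating them, assuming the output was captured as text and that every address line has exactly the shape produced by the format string above (0x followed by 16 hexadecimal digits). The helper name extract_block_addresses is hypothetical.

```python
import re

# Matches the output of the "%#018" PRIx64 format: 0x plus 16 hex digits.
ADDRESS_LINE = re.compile(r"^0x[0-9a-f]{16}$")


def extract_block_addresses(lines):
    """Return the basic block addresses, in execution order, as integers."""
    addresses = []
    for line in lines:
        text = line.strip()
        if ADDRESS_LINE.match(text):
            addresses.append(int(text, 16))
    return addresses


# Sample input copied from the output shown above.
captured = """loading snapshot
... done.
opening nondet log for read :  theRecording-rr-nondet.log
0xffffffff81c01c40
0xffffffff81c00920
0xffffffff81c0096f
"""
trace = extract_block_addresses(captured.splitlines())
print(len(trace), [hex(address) for address in trace])
```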

This is, of course, not sufficient, because the blocks of all processes are intermingled here. Operating System Introspection (OSI) comes to the rescue: it adds the capability to obtain the process name and thread ID for each basic block (and much more). So, let's add process names to the block addresses and print everything to an output file.


void before_block_exec(CPUState *cpuState, TranslationBlock *translationBlock)
{
    // this function gets called right before every basic block is executed
    if (panda_in_kernel(first_cpu) == 0) // I'm not interested in kernel modules
    {
        // get a reference to the process this TranslationBlock belongs to
        OsiProc *process = get_current_process(cpuState);
        if (process) {
            fprintf(outputFile, "\n%s@%#018" PRIx64, process->name, (uint64_t)translationBlock->pc);
            free_osiproc(process); // always free unused resources
        }
    }
}

And that is basically it. We just add some function prototypes and such to reduce compiler warnings and we have a finished plugin:

#include "panda/plugin.h"

// OSI
#include "osi/osi_types.h"
#include "osi/osi_ext.h"

// function prototypes
void before_block_exec(CPUState *cpuState, TranslationBlock *translationBlock);
void uninit_plugin(void *self);
bool init_plugin(void *self);

FILE *outputFile = NULL; // pointer to output file

void before_block_exec(CPUState *cpuState, TranslationBlock *translationBlock)
{
    // this function gets called right before every basic block is executed
    if (panda_in_kernel(first_cpu) == 0) // I'm not interested in kernel modules
    {
        // get a reference to the process this TranslationBlock belongs to
        OsiProc *process = get_current_process(cpuState);
        if (process) {
            fprintf(outputFile, "\n%s@%#018" PRIx64, process->name, (uint64_t)translationBlock->pc);
            free_osiproc(process); // always free unused resources
        }
    }
}

bool init_plugin(void *self)
{
    panda_require("osi"); // ensure that OSI is loaded
    // initialize the OSI API outside of assert() so the call is not compiled out with NDEBUG
    if (!init_osi_api()) return false;
    outputFile = fopen("lighthouse.out", "w"); // open output file
    if (!outputFile) return false; // refuse to load without an output file
    panda_cb pcb = { .before_block_exec = before_block_exec };
    panda_register_callback(self, PANDA_CB_BEFORE_BLOCK_EXEC, pcb); // register the callback function above
    return true;
}

void uninit_plugin(void *self)
{
    if (outputFile) fclose(outputFile); // close output file
}

And we can then call the plugin like any other:

./panda-system-x86_64 -m 4096 -replay '/media/jan/80669BBB669BB080/ch34_1char'  -os linux-64-ubuntu -panda osi -panda osi_linux:kconf_group=ubuntu:5.3.0-28-generic:64 -panda lighthouse_coverage

and we get the following type of output:

$ more lighthouse.out 

gmain@0x00007f299b429bf9
gmain@0x00007f299b429c01
gmain@0x00007f299b445740
gmain@0x00007f299b445748
gmain@0x00007f299b445764
gmain@0x00007f299b44576f
gmain@0x00007f299b429c0d
gmain@0x00007f299cd195c9

[ ... ]
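
Since lighthouse.out mixes every user-space process in the recording, it helps to see which process names occur and how many distinct blocks each one executed before loading anything into a disassembler. The following is a rough sketch in Python, assuming the name@address line format shown above; summarize_trace is a hypothetical helper and not part of PANDA or lighthouse.

```python
from collections import defaultdict


def summarize_trace(lines):
    """Map each process name to the set of distinct block addresses it executed."""
    blocks = defaultdict(set)
    for line in lines:
        text = line.strip()
        if not text:
            continue  # the plugin writes a leading newline, so blank lines occur
        name, _, address = text.rpartition("@")
        if not name:
            continue  # not a name@address line
        blocks[name].add(int(address, 16))
    return blocks


# Two lines from the output above; the first is repeated on purpose
# to show that re-executed blocks are counted once.
sample = [
    "",
    "gmain@0x00007f299b429bf9",
    "gmain@0x00007f299b429c01",
    "gmain@0x00007f299b429bf9",
]
for name, addresses in summarize_trace(sample).items():
    print(name, len(addresses))
```

On a real trace, the same function would be called with the open file object, for example summarize_trace(open("lighthouse.out")).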

Next, we need to adapt the lighthouse parser for the 'mod+off' format so that it accepts our new mod@address format. I call the result modat.py:


import os
import collections

from ..coverage_file import CoverageFile
from lighthouse.util.disassembler import disassembler

class ModAtData(CoverageFile):
    """
    A module@address log parser.
    """

    def __init__(self, filepath):
        super(ModAtData, self).__init__(filepath)

    #--------------------------------------------------------------------------
    # Public
    #--------------------------------------------------------------------------

    def get_offsets(self, module_name):
        return self.modules.get(module_name, {}).keys()

    #--------------------------------------------------------------------------
    # Parsing Routines - Top Level
    #--------------------------------------------------------------------------

    def _parse(self):
        """
        Parse modat coverage from the given log file.
        """
        imagebase = disassembler._bv.start
        modules = collections.defaultdict(lambda: collections.defaultdict(int))
        with open(self.filepath) as f:
            for line in f:
                trimmed = line.strip()

                # skip empty lines
                if not len(trimmed): continue

                # comments can start with ';' or '#'
                if trimmed[0] in [';', '#']: continue

                module_name, bb_offset = line.rsplit("@", 1)
                modules[module_name][int(bb_offset, 16)-imagebase] += 1
        self.modules = modules
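
The key step in this parser is rebasing: PANDA logs absolute virtual addresses, and the parser turns each one into an offset by subtracting the image base before counting it. The following standalone sketch shows that step outside lighthouse so it can be tried without a disassembler. The function parse_modat and its explicit imagebase argument are hypothetical (the parser above reads the base from the disassembler), and the base value in the usage lines is made up for illustration.

```python
import collections


def parse_modat(lines, imagebase):
    """Count hits per module and offset from name@address lines."""
    modules = collections.defaultdict(lambda: collections.defaultdict(int))
    for line in lines:
        trimmed = line.strip()
        if not trimmed or trimmed[0] in (";", "#"):
            continue  # skip blank lines and comments
        module_name, address = trimmed.rsplit("@", 1)
        modules[module_name][int(address, 16) - imagebase] += 1
    return modules


# The image base below is a made-up value for illustration only.
hits = parse_modat(
    ["gmain@0x00007f299b429bf9", "gmain@0x00007f299b429bf9"],
    0x00007F299B400000,
)
for name, offsets in hits.items():
    for offset, count in offsets.items():
        print(name, hex(offset), count)
```

Passing the base explicitly keeps the sketch independent of any disassembler; inside lighthouse the value still has to come from the loaded binary.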

Installation

PANDA:

  • Create a folder, lighthouse_coverage, in the PANDA plugins directory
  • Drop this project's files into that folder
  • Modify the config.panda file in the plugins directory to include lighthouse_coverage

Binary Ninja: the Binary Ninja plugin directory should contain a file called lighthouse_plugin.py and a folder called lighthouse. Place modat.py under that folder, at the relative path lighthouse/reader/parsers.

And now we get our payoff: Coverage data collected from the binary within a full system emulation:

References: