OOanalyzer - A big help for reversing object-oriented binaries

The Pharos project publishes a number of utilities, one of which is OOanalyzer. This tool uses Prolog to build inferences that help reconstruct objects. At the moment it only works on 32-bit Windows applications. Still, let's try it out:

The easiest way to get going with this is to use their provided Docker image. Going roughly by the documentation,

$ sudo apt install docker.io
$ sudo docker pull seipharos/pharos
$ mkdir /choose/your/path/hostdir
$ sudo docker run --rm -it -v /choose/your/path/hostdir:/dir seipharos/pharos

will launch an interactive session in which the host directory /choose/your/path/hostdir is mapped to /dir inside the container.

OOanalyzer is then found in /usr/local/bin, and more tools and files can be found in /root/pharos.

Many sample executables for testing the functionality are shipped under /root/pharos/tests.

For example:

~# /usr/local/bin/ooanalyzer /root/pharos/tests/ooex_vs2010/Release/oo.exe --json oo.exe.json
OPTI[INFO ]: Analyzing executable: /root/pharos/tests/ooex_vs2010/Release/oo.exe
OPTI[INFO ]: OOAnalyzer version 1.0.
OPTI[INFO ]: ROSE stock partitioning took 1.94562 seconds.
OPTI[INFO ]: Partitioned 3051 bytes, 980 instructions, 321 basic blocks, 0 data blocks and 90 functions.
OPTI[INFO ]: Pharos function partitioning took 2.18172 seconds.
OPTI[INFO ]: Partitioned 4096 bytes, 1104 instructions, 363 basic blocks, 14 data blocks and 108 functions.
OPTI[INFO ]: Function analysis complete, analyzed 56 functions in 3.39674 seconds.
OPTI[INFO ]: OOAnalyzer analysis complete, found: 3 classes, 8 methods, 0 virtual calls, and 0 usage instructions.
OPTI[INFO ]: Successfully exported to JSON file 'oo.exe.json'.
OPTI[INFO ]: OOAnalyzer analysis complete.

This created an oo.exe.json file that can be imported into IDA or Ghidra - more on that later.

I found, however, that on some real-world executables the default settings fail to work. Let's have a look at V9.exe, an MFC executable about 19 MB in size.

~# /usr/local/bin/ooanalyzer /dir/V9.exe --json V9.exe.json
OPTI[INFO ]: Analyzing executable: /dir/V9.exe
OPTI[INFO ]: OOAnalyzer version 1.0.
OPTI[INFO ]: ROSE stock partitioning took 1899.79 seconds.
OPTI[INFO ]: Partitioned 4425314 bytes, 1156333 instructions, 261239 basic blocks, 536 data blocks and 18759 functions.
OOAN[FATAL]: Partitioner absolute memory exceeded: 7391.06 secs CPU, 8000.04 MB memory, 7399.29 secs elapsed
OOAN[FATAL]: Exiting prematurely, increase --partitioner-timeout and try again.

The partitioner exceeded its memory limit here, so to process our file we have to raise the defaults. There is also another parameter, --log, which helps diagnose any problems - more on that later.

/usr/local/bin/ooanalyzer /dir/V9.exe --json V9.json --timeout 1000000 --maximum-memory 1500000 --partitioner-timeout 1000000 --log='APID(all)'

and after a very, very long time, it finishes - even though a number of error messages show up in the log.

FSEM[ERROR]: Analysis of function 0x0079D90E failed: relative CPU time exceeded
OPTI[INFO ]: Function analysis complete, analyzed 49805 functions in 36946.6 seconds.
PLOG[ERROR]: OOAnalyzer has been running for over an hour in Prolog mode.  We have found that for, complex executables, SWI Prolog often outperforms XSB Prolog.  You may wish to dump the .facts file for your executable using the -F option of ooanalyzer, and then run the oodebugrun-swipl script in share/pharos/oorules of your build directory.  You will need to install swipl and ensure it is on your PATH.
OPTI[INFO ]: OOAnalyzer analysis complete, found: 1330 classes, 7818 methods, 121 virtual calls, and 17326 usage instructions.
OPTI[INFO ]: Successfully exported to JSON file 'V9alt.json'.
OPTI[INFO ]: OOAnalyzer analysis complete.

root@8f638bf7cf9a:/dir# 

Since we get a warning about Prolog performance, we would like to try the other engine (SWI Prolog), but the software authors informed me that they currently don't support creating the JSON file for the plugin that way, so we have to stay with XSB Prolog for now.

The software outputs a number of warnings and errors. For example, ooanalyzer does not know about setupapi.dll:

APID[TRACE]: API Lookup: SETUPAPI:SetupDiGetClassDevsA
APID[WHERE]: JSON API database /usr/local/share/pharos/apidb/pharos-api-additions.json has no data for DLL: SETUPAPI
APID[WHERE]: SQLite API database /usr/local/share/pharos/apidb/pharos-apidb.sqlite has no data for DLL: SETUPAPI
APID[WHERE]: Decorated name parser has no data for DLL: SETUPAPI
APID[WARN ]: API database has no data for DLL: SETUPAPI
APID[TRACE]: API Lookup: SETUPAPI:SetupDiEnumDeviceInterfaces

but ooanalyzer does provide a mechanism to teach it: the JSON API database /usr/local/share/pharos/apidb/pharos-api-additions.json.

The default file contains an example:

{ "config": { "exports": [
{
"dll": "OBSCURE32.DLL",
"export_name": "SomeFunction",
"display_name": "SomeFunction",
"convention": "stdcall",
"parameters": [
{"name": "dwFirstParam", "type": "DWORD", "inout": "in"}
],
"type": "void",
"ordinal": 123
}
]}}

So we can use Wine to help ooanalyzer along a bit. Its libsetupapi.def lists the exports of setupapi.dll:

http://www.mit.edu/afs.new/sipb/project/wine/arch/i386_rhel4/lib/wine/libsetupapi.def

A little script converts it into the JSON format:

#!/bin/bash
# Convert Wine's libsetupapi.def into an API database file for ooanalyzer.
# The script expects export lines of the form "Name@delta @ordinal".
filename='libsetupapi.def'
out='setupapi.json'
echo '{ "config": { "exports": [' > "$out"
while read -r name ordinal _; do
  # Skip lines that are not exports (comments, LIBRARY/EXPORTS header).
  [[ $ordinal == @* ]] || continue
  echo "$name $ordinal"      # progress output
  delta="${name#*@}"         # stack delta, the part after the '@'
  name="${name%%@*}"         # function name without the '@delta' suffix
  ordinal="${ordinal#*@}"    # ordinal without the leading '@'
  {
    echo '{"dll": "setupapi.dll",'
    echo "\"export_name\": \"$name\","
    echo "\"display_name\": \"$name\","
    echo '"convention": "stdcall",'
    echo '"parameters": ['
    echo "   {\"delta\": \"$delta\"}"
    echo '],'
    echo '"type": "UnknownReturn",'
    echo "\"ordinal\": $ordinal"
    echo '},'
  } >> "$out"
done < "$filename"
truncate -s -2 "$out"   # remove the trailing comma and newline
echo "" >> "$out"
echo ']}}' >> "$out"
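To make the string handling in the script easier to follow, here is a minimal sketch of how one export line is split into name, stack delta and ordinal using bash parameter expansion. The helper name parse_def_line and the sample line are made up for illustration (in particular the ordinal is not taken from the real .def file).

```shell
#!/bin/bash
# Split one export line of the assumed form "Name@delta @ordinal".
# parse_def_line is a hypothetical helper, not part of Pharos or Wine.
parse_def_line() {
  local name_field ordinal_field
  read -r name_field ordinal_field _ <<< "$1"
  local name="${name_field%%@*}"        # text before the first '@'
  local delta="${name_field#*@}"        # text after the first '@'
  local ordinal="${ordinal_field#*@}"   # drop the leading '@'
  printf '%s %s %s\n' "$name" "$delta" "$ordinal"
}

# Illustrative input; prints the three fields separated by spaces.
parse_def_line 'SetupDiGetClassDevsA@16 @1'
```

Note that a line whose first field contains no '@' would yield the function name as its delta, so such entries are worth checking by hand.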

We just need to replace pharos-api-additions.json with setupapi.json,

and then do the same thing with the other DLLs that are missing, combining all of them together into a single pharos-api-additions.json.
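Finding the missing DLLs and combining the per-DLL files can be sketched with two small helpers. Both function names are hypothetical. missing_dlls relies on the APID[WARN ] message format shown in the log excerpt above and assumes the log was captured to a file. merge_apidb assumes every input file has the layout produced by the scripts in this post: the header on the first line, the closing "]}}" on the last line, and no trailing comma after the final entry.

```shell
#!/bin/bash
# List each DLL the API database had no data for, once, sorted.
missing_dlls() {
  sed -n 's/^APID\[WARN ]: API database has no data for DLL: //p' "$1" | sort -u
}

# Merge several per-DLL files into one database on stdout.
# Assumption: first line of each input is the header, last line is "]}}".
merge_apidb() {
  local first=1 f
  echo '{ "config": { "exports": ['
  for f in "$@"; do
    [ "$first" -eq 1 ] || echo ','   # separator between the files' entries
    first=0
    sed '1d;$d' "$f"                 # keep only the export entries
  done
  echo ']}}'
}
```

For example, merge_apidb setupapi.json winmm.json > pharos-api-additions.json would write the combined file. It is a plain text merge, so it is worth running the result through a JSON validator afterwards.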

The JSON for MFC42.dll needs to be generated a bit differently because of the mangled names and the object-oriented nature of the calls:

#!/bin/bash
# Build an API database file for MFC42.DLL from MFC42.DEF.
# The script expects the mangled name in field 1 and the ordinal in field 3;
# lines starting with ';' are comments.
filename='MFC42.DEF'
out='mfc42.json'
echo '{ "config": { "exports": [' > "$out"
while read -r line; do
  if [[ $line != \;* ]]; then
    echo "$line"
    mangled=$(echo "$line" | awk '{ print $1 }')
    ordinal=$(echo "$line" | awk '{ print $3 }')
    # Let apilookup demangle the name and describe the export as JSON.
    demangled=$(apilookup --pretty=4 --json=- mfc42.dll:"$mangled")
    # Insert the ordinal in front of the export name and fix up the DLL name.
    entry=$(echo "$demangled" | sed 's/"export_name":/"ordinal": '"$ordinal"',\n"export_name":/')
    entry=$(echo "$entry" | sed 's/"dll": "mfc42"/"dll": "MFC42.DLL"/')
    entry="${entry:1}"     # strip first char
    entry="${entry::-1}"   # strip last char
    echo "$entry" >> "$out"
    echo "$entry"
    echo "," >> "$out"
  fi
done < "$filename"
truncate -s -2 "$out"   # remove the trailing comma and newline
echo ']}}' >> "$out"

Update: the winmm.json, winspool.json, mfc42.json and setupapi.json files I made have now been merged into the contrib folder of the project, so they can be used simply by passing --apidb contrib/winspool.json etc.

The --json option (using XSB Prolog) usually works OK, but in my test case here it eventually succumbs to some error and it's a wontfix: https://github.com/cmu-sei/pharos/issues/90

Update: the --json option should now work, as the project has moved to using SWI Prolog by default.

Without that change, we have to create a .facts file first and then use SWI Prolog to move on.

Also, ooanalyzer does not always find the addresses of the new and delete methods, so we have to pass those manually: https://github.com/cmu-sei/pharos/issues/85

First:

$ /usr/local/bin/ooanalyzer ./V9.exe --timeout 1000000 --per-function-timeout 1000000 --partitioner-timeout 1000000 --maximum-memory 1000000 --per-function-maximum-memory 1000000 --threads 2 --apidb contrib/winmm.json --apidb contrib/winspool.json --apidb contrib/mfc42.json --apidb contrib/setupapi.json --log='APID(all)' --prolog-facts V9.exe.ooanalyzer.facts --new-method 0x8431b2 --delete-method 0x8431ac > status.txt 2>&1

As a second step, we summarize the .facts file by counting how many facts of each type it contains, as described here: https://github.com/cmu-sei/pharos/tree/master/share/prolog/oorules

awk -F\( '{print $1}' V9.exe.ooanalyzer.facts | sort | uniq -c
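As a small convenience, the summary command above can be wrapped in a function that also lists the most frequent fact types first. This is only a sketch: the name facts_summary is made up, and the only assumption about the .facts format is the one the command itself makes, namely that each line starts with a predicate name followed by '('.

```shell
#!/bin/bash
# Count the facts of each type in a .facts file, most frequent first.
facts_summary() {
  # Field 1 (text before the first '(') is the predicate name.
  awk -F'(' '{print $1}' "$1" | sort | uniq -c | sort -rn
}
```

Calling facts_summary V9.exe.ooanalyzer.facts gives a quick sanity check: an empty or very short list suggests the first stage did not get far.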

Then we run the Prolog rules over the .facts file and save the output to a log:

/usr/local/share/pharos/prolog/oorules/oodebugrun V9.exe.ooanalyzer.facts > ooprog.log

from which we now extract the final determinations

grep ^final ooprog.log >ooprog-results_V9.pl

and finally

# /usr/local/share/pharos/prolog/oorules/oojson ooprog-results_V9.pl > V9.ooanalyzer.json

which produces a V9.ooanalyzer.json file that can be opened in the OOanalyzer plugin for Ghidra or IDA.
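Since these three post-processing steps always run in the same order, they can be chained in one function. The following is a sketch under assumptions: facts_to_json and the OORULES variable are names I made up, the intermediate file names are arbitrary, and the default path is simply the one used in the commands above.

```shell
#!/bin/bash
# Directory holding oodebugrun and oojson; override if yours differs.
OORULES="${OORULES:-/usr/local/share/pharos/prolog/oorules}"

# Usage: facts_to_json INPUT.facts OUTPUT.json
facts_to_json() {
  local facts="$1" json="$2"
  local log="${facts%.facts}.log"            # full Prolog output
  local results="${facts%.facts}.results.pl" # only the final determinations
  "$OORULES/oodebugrun" "$facts" > "$log" || return 1
  # grep exits non-zero when no line matches, i.e. nothing final was found.
  grep '^final' "$log" > "$results" || {
    echo "no final facts found in $log" >&2
    return 1
  }
  "$OORULES/oojson" "$results" > "$json"
}
```

With this in place, facts_to_json V9.exe.ooanalyzer.facts V9.ooanalyzer.json covers everything after the .facts file has been written, and it stops early if the Prolog run produced no final facts.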

So what does all this buy us?

Here is a function inside an executable as decompiled by stock Ghidra:

And here is the same function after using OOanalyzer:

OOanalyzer found that this function is a class member function, determined the hierarchy, changed the calling convention to thiscall and labelled the this pointer in the code. This can save a lot of manual work.

References: