OOanalyzer - A big help for reversing object-oriented binaries
Pharos publishes a number of utilities, one of which is OOanalyzer. This tool uses Prolog to build inferences that help reconstruct objects. At the moment it only works on 32-bit Windows applications. Still, let's try it out:
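Because of the 32-bit restriction, it can save time to check a binary before feeding it to the tool. The following is a minimal sketch, assuming a well-formed PE file; the function name pe_machine is made up for this illustration, and the demo builds a tiny fake header rather than using a real executable:

```shell
#!/bin/sh
# Print the COFF Machine field of a PE file as four hex digits.
# 014c = i386 (32-bit x86), 8664 = x86-64.
# Assumes a well-formed PE file; no signature checks are done.
pe_machine() {
    pe_file=$1
    # e_lfanew (offset 0x3c): little-endian offset of the "PE\0\0" signature.
    set -- $(od -An -tu1 -j60 -N4 "$pe_file")
    lfanew=$(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
    # The Machine field is the 2 bytes right after the 4-byte signature.
    set -- $(od -An -tu1 -j"$((lfanew + 4))" -N2 "$pe_file")
    printf '%02x%02x\n' "$2" "$1"
}

# Demo on a fake header: 60 zero bytes, e_lfanew = 0x40, "PE\0\0", Machine = 0x014c.
sample=$(mktemp)
dd if=/dev/zero of="$sample" bs=60 count=1 2>/dev/null
printf '\100\000\000\000PE\000\000\114\001' >> "$sample"
pe_machine "$sample"    # prints 014c
rm -f "$sample"
```

A result of 014c suggests the file is a 32-bit x86 binary; anything else is probably not worth trying at the moment.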
The easiest way to get going with this is to use their provided Docker image. Going roughly by the documentation,
$ sudo apt install docker.io
$ sudo docker pull seipharos/pharos
$ mkdir -p /choose/your/path/hostdir
$ sudo docker run --rm -it -v /choose/your/path/hostdir:/dir seipharos/pharos
will launch an interactive session in which the host directory /choose/your/path/hostdir is mapped to /dir inside the container.
OOanalyzer is then found in /usr/local/bin, and more tools and files can be found in /root/pharos. Many sample executables for testing the functionality are shipped under /root/pharos/tests. For example:
~# /usr/local/bin/ooanalyzer /root/pharos/tests/ooex_vs2010/Release/oo.exe --json oo.exe.json
OPTI[INFO ]: Analyzing executable: /root/pharos/tests/ooex_vs2010/Release/oo.exe
OPTI[INFO ]: OOAnalyzer version 1.0.
OPTI[INFO ]: ROSE stock partitioning took 1.94562 seconds.
OPTI[INFO ]: Partitioned 3051 bytes, 980 instructions, 321 basic blocks, 0 data blocks and 90 functions.
OPTI[INFO ]: Pharos function partitioning took 2.18172 seconds.
OPTI[INFO ]: Partitioned 4096 bytes, 1104 instructions, 363 basic blocks, 14 data blocks and 108 functions.
OPTI[INFO ]: Function analysis complete, analyzed 56 functions in 3.39674 seconds.
OPTI[INFO ]: OOAnalyzer analysis complete, found: 3 classes, 8 methods, 0 virtual calls, and 0 usage instructions.
OPTI[INFO ]: Successfully exported to JSON file 'oo.exe.json'.
OPTI[INFO ]: OOAnalyzer analysis complete.
This created an oo.exe.json file that can be imported into IDA or Ghidra - more on that later.
I found, however, that the default settings fail on some real-world executables. Let's have a look at V9.exe, an MFC executable of about 19 MB.
~# /usr/local/bin/ooanalyzer /dir/V9.exe --json V9.exe.json
OPTI[INFO ]: Analyzing executable: /dir/V9.exe
OPTI[INFO ]: OOAnalyzer version 1.0.
OPTI[INFO ]: ROSE stock partitioning took 1899.79 seconds.
OPTI[INFO ]: Partitioned 4425314 bytes, 1156333 instructions, 261239 basic blocks, 536 data blocks and 18759 functions.
OOAN[FATAL]: Partitioner absolute memory exceeded: 7391.06 secs CPU, 8000.04 MB memory, 7399.29 secs elapsed
OOAN[FATAL]: Exiting prematurely, increase --partitioner-timeout and try again.
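The partitioner hit its memory limit (about 8000 MB), even though the message only suggests raising --partitioner-timeout. There is also a --log parameter that helps diagnose problems - more on that later. To process our file, we raise both the time and the memory limits: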
/usr/local/bin/ooanalyzer /dir/V9.exe --json V9.json --timeout 1000000 --maximum-memory 1500000 --partitioner-timeout 1000000 --log='APID(all)'
and after a very, very long time, it finishes - even though a number of error messages show up in the log.
FSEM[ERROR]: Analysis of function 0x0079D90E failed: relative CPU time exceeded
OPTI[INFO ]: Function analysis complete, analyzed 49805 functions in 36946.6 seconds.
PLOG[ERROR]: OOAnalyzer has been running for over an hour in Prolog mode. We have found that for, complex executables, SWI Prolog often outperforms XSB Prolog. You may wish to dump the .facts file for your executable using the -F option of ooanalyzer, and then run the oodebugrun-swipl script in share/pharos/oorules of your build directory. You will need to install swipl and ensure it is on your PATH.
OPTI[INFO ]: OOAnalyzer analysis complete, found: 1330 classes, 7818 methods, 121 virtual calls, and 17326 usage instructions.
OPTI[INFO ]: Successfully exported to JSON file 'V9alt.json'.
OPTI[INFO ]: OOAnalyzer analysis complete.
root@8f638bf7cf9a:/dir#
Since we get a warning about Prolog performance, we would like to try the other engine (SWI Prolog), but the software authors informed me that they currently don't support creating the JSON file for the plugin from SWI Prolog, so we have to stay with XSB for now.
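With runs this long, the log gets large, and a quick tally of messages per facility and level helps to see where the trouble is. A small sketch, assuming the output was saved to a file; the function name tally_log is hypothetical, and the demo uses a few lines copied from the output above:

```shell
#!/bin/sh
# Count log lines per facility and level, printing e.g. "FSEM[ERROR] 1".
tally_log() {
    sed -n 's/^\([A-Z][A-Z]*\)\[\([A-Z]*\) *]:.*/\1[\2]/p' "$1" |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Demo on a few lines copied from the output above.
log=$(mktemp)
cat > "$log" <<'EOF'
OPTI[INFO ]: Analyzing executable: /dir/V9.exe
OOAN[FATAL]: Exiting prematurely, increase --partitioner-timeout and try again.
FSEM[ERROR]: Analysis of function 0x0079D90E failed: relative CPU time exceeded
OPTI[INFO ]: OOAnalyzer analysis complete.
EOF
tally_log "$log"
rm -f "$log"
```

Lines that do not start with a FACILITY[LEVEL]: prefix, such as the shell prompt, are ignored.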
The software outputs a number of warnings and errors. For example, ooanalyzer does not know about setupapi.dll:
APID[TRACE]: API Lookup: SETUPAPI:SetupDiGetClassDevsA
APID[WHERE]: JSON API database /usr/local/share/pharos/apidb/pharos-api-additions.json has no data for DLL: SETUPAPI
APID[WHERE]: SQLite API database /usr/local/share/pharos/apidb/pharos-apidb.sqlite has no data for DLL: SETUPAPI
APID[WHERE]: Decorated name parser has no data for DLL: SETUPAPI
APID[WARN ]: API database has no data for DLL: SETUPAPI
APID[TRACE]: API Lookup: SETUPAPI:SetupDiEnumDeviceInterfaces
but ooanalyzer does provide a mechanism to teach it: the JSON API database /usr/local/share/pharos/apidb/pharos-api-additions.json. The default file contains an example:
{ "config": { "exports": [
  {
    "dll": "OBSCURE32.DLL",
    "export_name": "SomeFunction",
    "display_name": "SomeFunction",
    "convention": "stdcall",
    "parameters": [
      {"name": "dwFirstParam", "type": "DWORD", "inout": "in"}
    ],
    "type": "void",
    "ordinal": 123
  }
]}}
so we can use Wine to help ooanalyzer along a bit. Wine's libsetupapi.def lists the exports of setupapi.dll:
http://www.mit.edu/afs.new/sipb/project/wine/arch/i386_rhel4/lib/wine/libsetupapi.def
A little script converts it into the JSON format shown above:
#!/bin/bash
# Convert Wine's libsetupapi.def into a Pharos API database JSON file.
filename='libsetupapi.def'
echo '{ "config": { "exports": [' > setupapi.json
while read -r line; do
    echo "$line"
    entry=$(echo "$line" | awk '{ print $1 }')     # first field: name@delta
    name="${entry%%@*}"                            # function name without the @delta suffix
    delta="${entry#*@}"                            # function delta (the number after the @)
    ordinal=$(echo "$line" | awk '{ print $2 }')   # second field: @ordinal
    echo "$ordinal"
    ordinal="${ordinal#*@}"                        # strip the leading @
    {
        echo '{"dll": "setupapi.dll",'
        echo '"export_name": "'"$name"'",'
        echo '"display_name": "'"$name"'",'
        echo '"convention": "stdcall",'
        echo '"parameters": ['
        echo ' {"delta": "'"$delta"'"}'
        echo '],'
        echo '"type": "UnknownReturn",'
        echo '"ordinal": '"$ordinal"
        echo '},'
    } >> setupapi.json
done < "$filename"
truncate -s-1 setupapi.json   # remove the trailing newline
truncate -s-1 setupapi.json   # remove the last comma
echo "" >> setupapi.json
echo "]}}" >> setupapi.json
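The parameter expansions are the heart of that script, so here is a small sketch of what they do to a single entry. The function name parse_def_line and the sample line are hypothetical; real values come from the .def file:

```shell
#!/bin/sh
# Split one .def entry into "name delta ordinal", mirroring the script above.
parse_def_line() {
    entry=$(echo "$1" | awk '{ print $1 }')     # first field: name@delta
    ordinal=$(echo "$1" | awk '{ print $2 }')   # second field: @ordinal
    # %%@* drops everything from the first @ on; #*@ drops everything up to it.
    echo "${entry%%@*} ${entry#*@} ${ordinal#*@}"
}

# The sample line is hypothetical; real values come from the .def file.
parse_def_line 'SetupDiGetClassDevsA@16 @123'   # prints: SetupDiGetClassDevsA 16 123
```

A line without an @ passes through unchanged as both name and delta, so if the .def file contains header or comment lines it may be worth filtering them out first, for example with grep '@'.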
Now we just need to replace pharos-api-additions.json with setupapi.json and then do the same for the other DLLs that are missing, combining all of them into a single pharos-api-additions.json.
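To find out which DLLs are missing, the APID warnings in the log can be collected into a unique list. A sketch, assuming the log was saved to a file; the function name missing_dlls is made up here, and the WINMM line in the sample is only illustrative:

```shell
#!/bin/sh
# List the DLLs for which the API database reported no data.
missing_dlls() {
    sed -n 's/^APID\[WARN ]: API database has no data for DLL: *//p' "$1" | sort -u
}

# Sample modeled on the log excerpt; the WINMM line is illustrative.
log=$(mktemp)
cat > "$log" <<'EOF'
APID[TRACE]: API Lookup: SETUPAPI:SetupDiGetClassDevsA
APID[WARN ]: API database has no data for DLL: SETUPAPI
APID[TRACE]: API Lookup: SETUPAPI:SetupDiEnumDeviceInterfaces
APID[WARN ]: API database has no data for DLL: SETUPAPI
APID[WARN ]: API database has no data for DLL: WINMM
EOF
missing_dlls "$log"    # prints SETUPAPI and WINMM, one per line
rm -f "$log"
```

Only the WARN lines are matched, so each DLL is reported once no matter how many of its functions were looked up.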
The JSON for MFC42.dll needs to be generated a bit differently because of the mangled names and the object-oriented nature of the calls:
#!/bin/bash
# Convert MFC42.DEF into a Pharos API database JSON file, using apilookup to demangle.
filename='MFC42.DEF'
echo "{ \"config\": { \"exports\": [" > mfc42.json
while read line; do
    if [[ $line != \;* ]]   # skip comment lines
    then
        echo $line
        mangled=$(echo $line|awk '{ print $1}')   # first field: mangled name
        ordinal=$(echo $line|awk '{ print $3}')   # third field: ordinal
        demangled=$(apilookup --pretty=4 --json=- mfc42.dll:"$mangled")
        # insert the ordinal in front of export_name and normalize the DLL name
        ordinalInserted=$(echo "$demangled" | sed 's/\"export_name\":/"ordinal\": '$ordinal' ,\n\"export_name\":/')
        ordinalInserted=$(echo "$ordinalInserted" | sed 's/\"dll\": \"mfc42\"/\"dll\": \"MFC42.DLL\"/')
        ordinalInserted="${ordinalInserted:1}"    # strip first char
        ordinalInserted="${ordinalInserted::-1}"  # strip last char
        echo "$ordinalInserted" >> mfc42.json
        echo "$ordinalInserted"
        echo "," >> mfc42.json
    fi
done < $filename
truncate -s-1 mfc42.json   # remove the trailing newline
truncate -s-1 mfc42.json   # remove the last comma
echo "]}}" >> mfc42.json
Update: the winmm.json, winspool.json, mfc42.json and setupapi.json files I made have now been merged into the contrib folder of the project, so they can be used simply by passing --apidb contrib/winspool.json etc.
The --json option (using XSB Prolog) usually works OK, but in my test case here it eventually succumbs to an error, and the issue is a wontfix: https://github.com/cmu-sei/pharos/issues/90
Update: the --json option should now work as the project moved to using SWI prolog by default.
So we have to create a .facts file first and then use SWI Prolog to move on.
Also, ooanalyzer does not always find the addresses of the new and delete methods, so we have to pass those manually: https://github.com/cmu-sei/pharos/issues/85
First:
$ /usr/local/bin/ooanalyzer ./V9.exe --timeout 1000000 --per-function-timeout 1000000 --partitioner-timeout 1000000 --maximum-memory 1000000 --per-function-maximum-memory 1000000 --threads 2 --apidb contrib/winmm.json --apidb contrib/winspool.json --apidb contrib/mfc42.json --apidb contrib/setupapi.json --log='APID(all)' --prolog-facts V9.exe.ooanalyzer.facts --new-method 0x8431b2 --delete-method 0x8431ac > status.txt 2>&1
and then, as a second step, we tally the facts in the .facts file by predicate name, as specified here: https://github.com/cmu-sei/pharos/tree/master/share/prolog/oorules
awk -F\( '{print $1}' V9.exe.ooanalyzer.facts | sort | uniq -c
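To make clear what that pipeline reports, here is the same command wrapped in a function and run on a tiny sample. The function name count_facts and the predicate names are hypothetical; a real .facts file is the one written by --prolog-facts:

```shell
#!/bin/sh
# Count facts per predicate name (the text before the first parenthesis).
count_facts() {
    awk -F'(' '{ print $1 }' "$1" | sort | uniq -c
}

# Hypothetical sample; a real .facts file comes from --prolog-facts.
facts=$(mktemp)
cat > "$facts" <<'EOF'
predicateA(0x401000).
predicateA(0x401050).
predicateB(0x401010, 0x401000).
EOF
count_facts "$facts"    # one line per predicate, prefixed by its count
rm -f "$facts"
```

A predicate that is missing entirely, or has a suspiciously low count, is a hint that the fact export did not go as expected.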
Then we run the Prolog rules on the facts file, saving the output to a log:
/usr/local/share/pharos/prolog/oorules/oodebugrun V9.exe.ooanalyzer.facts > ooprog.log
from which we now extract the final determinations
grep ^final ooprog.log >ooprog-results_V9.pl
and finally
# /usr/local/share/pharos/prolog/oorules/oojson ooprog-results_V9.pl > V9.ooanalyzer.json
which produces a V9.ooanalyzer.json file that can be opened in the ooanalyzer plugin for Ghidra or IDA.
So what does all this buy us?
Here is a function inside an executable as decompiled by stock Ghidra:
And here is the same function after using ooanalyzer:
OOanalyzer found that this function is a class member function, determined the hierarchy, changed the calling convention to thiscall and labelled the this pointer in the code. This can save a lot of manual work.
References:
- Automated static analysis tools for binary programs - https://github.com/cmu-sei/pharos
- Ghidra, A software reverse engineering (SRE) suite of tools developed by NSA's Research Directorate - https://ghidra-sre.org/