CleanTricks to deal with dirty tricks - binary ninja deobfuscation plugin

There are many many obfuscation techniques to make binary code analysis laborious. Here I am showing some common ones, how to write a binary ninja plugin to deal with them and leave a template to add more. Find it here: https://github.com/janbbeck/CleanTricks

Overlapping code

Traditionally, a very effective way to obfuscate code is overlapping code. This is not a new technique, but traditional disassemblers have struggled with this. Lets look at an example in ghidra.

The XOR will always set the zero flag, and thereby trigger the JZ jump. Next, see how the JZ instructions jumps to 0x4011be+2. This is in the middle of the MOV R11, 0x5ebcbff49c3ff49 instruction! We can un-/re-define this code and see what the overlapping instructions are:

Interesting. So it is clear that the affected bytes represent two different instruction sequences, depending on where the program counter is. Neither ghidra, nor IDA pro nor any other disassembler I am aware of can gracefully deal with this - except for binary ninja. Lets look at that code section in binary ninja:

Both code variations are recognized and represented in the disassembly. Of course, while the code here uses r11, the technique will use with any register. The overlapping code is hidden in the literal value, 0x5ebcbff49c3ff49. It simply increases r11, then decreases it again, then jumps to the address right behind the JZ check. I have come across a binary that uses this same trick on different registers thousands of times. So I made a binary ninja plugin that deals with this in an automated way and can be easily adjusted/augmented to deal with other obfuscations as I come across them. The basic structure of the plugin consists of two parts. First, a registration for the plugin menu, second a function to be called when the menu is selected. I started with a clean template, and left it in the plugin, to be able to easily and quickly add a new deobfuscation. The template - it only iterates over all the instructions in the program - looks like this:

from binaryninja import *
from operator import itemgetter
import time

def clean_tricks_template(bv):
         # empty template to play with
         start = time.time()
         # sort instructions by address. BN does not provide this in order. Also, there are duplicates...
         instructionList = sorted(bv.instructions, key=itemgetter(1)) 
         # or instructionList = bv.instructions   , because sometimes listing by block is more useful
         last_instruction = None    # keep track of the instructions before the one under current consideration
         last_instruction2 = None   # keep track of the instructions before the one under current consideration
         for instruction in instructionList:
            
            last_instruction2 = last_instruction    # keep track of the instructions before the one under current consideration
            last_instruction = instruction          # keep track of the instructions before the one under current consideration
         end = time.time()
         binaryninja.log_info(repr(end-start))

PluginCommand.register("Clean Tricks\\0 - Empty template", "Empty template to experiment with", clean_tricks_template)

The PluginCommand.register creates the menu entry and the function clean_tricks_template gets called when the menu entry is selected.

To deal with the overlapping code above, we only need to NOP out the relevant bytes.

Note how the inc/dec instruction is really the immediate value placed into r11 with the mov instruction. If we NOP those out, the program already becomes simpler:

Next, we recognize that the execution proceeds from 0x4011cb to 0x4011c0, then does nothing but to go to a jump instruction to 0x4011cd:

Both code paths do the same thing, so we might as well NOP out je 0x4011c0.

Then the original mov instruction is completely pointless and can be NOP'd.

A very clever obfuscator might be able to identify an XOR instruction to construct this obfuscation around it, but more often, it is pointless as well. In that case, the code can NOP it as well, and the resulting deobfuscated section simplifies to the following:

And this is the code that goes through all of the program and cleans up all these obfuscations:

def clean_tricks_mov_xor_je(bv): # mov REG, CODE -> xor REG,REG -> je CODE # careful! xor REG,REG may be legitimate... start = time.time() instructionList = bv.instructions last_instruction = None last_instruction2 = None for instruction in instructionList: if last_instruction2 is not None: if "'mov'" in repr(last_instruction2): if "xor" in repr(last_instruction): if repr(last_instruction[0][2]) == repr(last_instruction[0][4]): # xor has to happen on single register if repr(last_instruction2[0][2]) == repr(last_instruction[0][2]): # mov and xor has to happen on identical register if "je" in repr(instruction): mov_address = last_instruction2[1] # the address of the mov instruction itself target_address = int(repr(instruction[0][2]).strip("'"),16) # the target of the je instruction if target_address-mov_address == 2 : # if we jump right back into the mov instruction bv.write(mov_address, b"\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90") # NOP away obfuscating content binaryninja.log_info(repr(last_instruction2)) binaryninja.log_info(repr(last_instruction)) binaryninja.log_info(repr(instruction)) last_instruction2 = last_instruction last_instruction = instruction end = time.time() binaryninja.log_info(repr(end-start)) binaryninja.log_info("Done. Finished in {} seconds.".format(end-start)) binaryninja.log_info("{} patches performed.".format(patchCount))

Useless computation and fake branch.

Another obfuscation technique is to perform some calculations that will never be used other than to decide a branch instruction that always goes in one direction. Consider this:

r11 is pushed on the stack, then zeroed with an xor, then the je will always succeed, because the xor will set the zero flag. Then r11 is restored from the stack. This sequence of instructions can safely be NOPd, except for the je, which needs a jmp:


def clean_tricks_push_xor_je_pop(bv): #push REG, xor REG,REG, je, pop REG start = time.time() patchCount = 0 instructionList = sorted(bv.instructions, key=itemgetter(1)) # sort instructions by address. BN does not provide this in order last_instruction = None last_instruction2 = None last_instruction3 = None last_instruction4 = None for instruction in instructionList: if last_instruction2 is not None: if repr(instruction) == repr(last_instruction): # ignore repeated/duplicate code blocks continue if repr(instruction) == repr(last_instruction2): # ignore repeated/duplicate code blocks continue if "'push'" in repr(last_instruction2): if "'xor'" in repr(last_instruction): if "je" in repr(instruction): if repr(last_instruction[0][2]) == repr(last_instruction[0][4]): # xor has to happen on single register if repr(last_instruction[0][2]) == repr(last_instruction2[0][2]): # push xor registers have to match target_address = int(repr(instruction[0][2]).strip("'"),16) # the target of the je instruction target_instruction = bv.get_disassembly(target_address) if "pop" in target_instruction: # has to be a pop instruction if repr(last_instruction2[0][2]).strip("'") in target_instruction: # push and pop registers have to match binaryninja.log_info(repr(last_instruction2)) binaryninja.log_info(repr(last_instruction)) binaryninja.log_info(repr(instruction)) binaryninja.log_info(repr(bv.get_disassembly(target_address))) patchCount = patchCount + 1 bv.convert_to_nop(last_instruction2[1]) bv.convert_to_nop(last_instruction[1]) bv.always_branch(instruction[1]) # jump always bv.convert_to_nop(target_address) last_instruction4 = last_instruction3 last_instruction3 = last_instruction2 last_instruction2 = last_instruction last_instruction = instruction end = time.time() binaryninja.log_info(repr(end-start)) binaryninja.log_info("Done. Finished in {} seconds.".format(end-start)) binaryninja.log_info("{} patches performed.".format(patchCount))

And here is what this looks like after deobfuscation:

Useless computation and fake branch 2.

This is another xor based jump obfuscation. However, in this case there is no 'preamble'

This is the Clean Trick for that case:


If both 'useless computation and fake branch' techniques are used, the first one should be cleaned up first. Otherwise the push and pop instructions will remain as a remnant and the second Clean Trick will find no targets.

jmp into an inc/dec pair

In this case there is an inc/dec instruction pair doing nothing. It's overlapping with a preceding jmp instruction:

All 3 instructions have to be NOPd together:

def clean_tricks_jmp_inc_dec(bv): # jmp (overlaps inc instruction) -> inc REG -> dec REG start = time.time() patchCount = 0 instructionList = sorted(bv.instructions, key=itemgetter(1)) # sort instructions by address. BN does not provide this in order last_instruction = None last_instruction2 = None for instruction in instructionList: if last_instruction2 is not None: if repr(instruction[0]) == repr(last_instruction[0]): # ignore repeated/duplicate code blocks continue if repr(instruction[0]) == repr(last_instruction2[0]): # ignore repeated/duplicate code blocks continue if "'jmp'" in repr(last_instruction2): if "inc" in repr(last_instruction): if "dec" in repr(instruction): jmp_address = last_instruction2[1] # the address of the jmp instruction itself target_address = int(repr(last_instruction2[0][2]).strip("'"),16) # the target of the jmp instruction # if we jump right back into the mov instruction if target_address-jmp_address == 1 : bv.write(jmp_address, b"\x90\x90\x90\x90\x90") patchCount = patchCount + 1 binaryninja.log_info(repr(last_instruction2)) binaryninja.log_info(repr(last_instruction)) binaryninja.log_info(repr(instruction)) binaryninja.log_info(repr(target_address-jmp_address)) last_instruction2 = last_instruction last_instruction = instruction end = time.time() binaryninja.log_info("Done. Finished in {} seconds.".format(end-start)) binaryninja.log_info("{} patches performed.".format(patchCount))

plain inc/dec pair

In this case there simply is an inc/dec instruction pair doing nothing.

And this is the Clean Trick:

def clean_tricks_inc_dec(bv): # jmp (overlaps inc instruction) -> inc REG -> dec REG start = time.time() patchCount = 0 instructionList = sorted(bv.instructions, key=itemgetter(1)) # sort instructions by address. BN does not provide this in order last_instruction = None for instruction in instructionList: if last_instruction is not None: if repr(instruction[0]) == repr(last_instruction[0]): # ignore repeated/duplicate code blocks continue if "inc" in repr(last_instruction): if "dec" in repr(instruction): if repr(last_instruction[0][2]) == repr(instruction[0][2]): # inc/dec has to happen on identical register bv.write(last_instruction[1], b"\x90\x90\x90\x90\x90\x90") patchCount = patchCount + 1 binaryninja.log_info(repr(last_instruction)) binaryninja.log_info(repr(instruction)) last_instruction = instruction end = time.time() binaryninja.log_info("Done. Finished in {} seconds.".format(end-start)) binaryninja.log_info("{} patches performed.".format(patchCount))

Split jmp

Another technique is to combine je/jne instructions to the same target address, either with or without a preceding computation.

And this is the Clean Trick:

def clean_tricks_je_jne(bv): # je ADDR -> jne ADDR start = time.time() patchCount = 0 instructionList = sorted(bv.instructions, key=itemgetter(1)) # sort instructions by address. BN does not provide this in order last_instruction = None last_instruction2 = None for instruction in instructionList: if last_instruction2 is not None: if repr(instruction) == repr(last_instruction): # ignore repeated/duplicate code blocks continue if repr(instruction) == repr(last_instruction2): # ignore repeated/duplicate code blocks continue if "je" in repr(last_instruction): if "jne" in repr(instruction): address1 = int(repr(last_instruction[0][2]).strip("'"),16) # the target of the je instruction address2 = int(repr(instruction[0][2]).strip("'"),16) # the target of the jne instruction if address2 > (instruction[1] + 1): # make sure target is at least two bytes further than the jne instruction itself if address1 == address2: patchCount = patchCount + 1 bv.always_branch(last_instruction[1]) start_addr = bv.get_previous_function_start_before(last_instruction[1]) func = bv.get_function_at(start_addr) if func is not None: func.set_user_instr_highlight(last_instruction[1], HighlightStandardColor.BlueHighlightColor) start_addr = bv.get_previous_function_start_before(instruction[1]) func = bv.get_function_at(start_addr) if func is not None: func.set_user_instr_highlight(instruction[1], HighlightStandardColor.BlueHighlightColor) binaryninja.log_info(repr(last_instruction2)) binaryninja.log_info(repr(last_instruction)) binaryninja.log_info(repr(instruction)) last_instruction2 = last_instruction last_instruction = instruction end = time.time() binaryninja.log_info(repr(end-start)) binaryninja.log_info("Done. Finished in {} seconds.".format(end-start)) binaryninja.log_info("{} patches performed.".format(patchCount))

Many more.

This is just a small sample of techniques used to obfuscate code and make static analysis more difficult. I'll add more to CleanTricks over time, and it's easy to add more yourself. If you come across an interesting obfuscation and write a Clean Trick for it, please put in a pull request.

https://github.com/janbbeck/CleanTricks


References: