Angr hooking - de/recompiling - chainbreaker

The executable that I am going to analyze here is interesting in several ways. First, no attempt at code obfuscation is made and the decompilers from both Ghidra and IDA produce code that can be recompiled with little effort. Also, the program is very short, making it a good example from an academic standpoint. The input is passed via a command line argument and so angr or PinCTF are natural first approaches to find the flag. However, there is just enough complication to prevent this from working and that is what makes this such a nice example to show progressive improvements to the angr input file using hooks. Let's see what the program does:

$ ./chainbreaker
usage: ./chainbreaker SEED

$ ./chainbreaker a
Starting chainbreaker!
ERROR
Seed must be an integer!

$ ./chainbreaker 0
Starting chainbreaker!
Seed 0 requires 96 links

LINK 1          0       ->      ERROR
Invalid chain produced!

$ ./chainbreaker 2
Starting chainbreaker!
Seed 2 requires 96 links

LINK 1          2       ->      5        Sleeping for 5ms
LINK 2          5       ->      20       Sleeping for 20ms
LINK 3          20      ->      119      Sleeping for 119ms
LINK 4          119     ->      426      Sleeping for 426ms
LINK 5          426     ->      1529     Sleeping for 1529ms
LINK 6          1529    ->      8160     Sleeping for 8160ms
[ ...]
LINK 94         2049    ->      16472    Sleeping for 16472ms
LINK 95         16472   ->      -445     Sleeping for 0ms
LINK 96         -445    ->      -1570    Sleeping for 0ms
ERROR
Starting seed doesnt match final output

Ok. The program wants an integer seed, from which it produces a chain link number and then iterates through a series of computations to check if the seed is correct. The usage of the sleep timer precludes PinCTF. It is easy to patch the program to always sleep for 0 time, but different seeds use different link sizes and that alone causes any run time differential due to the final check to drown in the signal. Another natural thing to try is angr. Lets look at the decompiled code:


int __cdecl main(int argc, const char **argv, const char **envp){ unsigned int v4; // [rsp+1Ch] [rbp-64h] char v5; // [rsp+20h] [rbp-60h] char v6; // [rsp+4Fh] [rbp-31h] char v7; // [rsp+50h] [rbp-30h] int inputasint; // [rsp+5Ch] [rbp-24h] int v9; // [rsp+60h] [rbp-20h] int i; // [rsp+64h] [rbp-1Ch] int v11; // [rsp+68h] [rbp-18h] int InputasInt2; // [rsp+6Ch] [rbp-14h]
v9 = 100; if ( argc == 2 ) { puts("Starting chainbreaker!"); std::allocator<char>::allocator(&v6); std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>>::basic_string(&v5, argv[1], &v6); inputasint = std::__cxx11::stoi((__int64)&v5, 0LL, 0xAu); std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>>::~basic_string(&v5); std::allocator<char>::~allocator(&v6); InputasInt2 = inputasint; v11 = 21309 * ((inputasint ^ 0x7B) + (inputasint ^ 0x141)) % 100; if ( v11 >= 0 ) { if ( !v11 ) v11 = 10; // if V11 is zero, make it 10 } else { v11 = -v11; // else if v11 is negative, make it positive } printf("Seed %s requires %d links\n\n", argv[1], (unsigned int)v11); for ( i = 0; i <= 99 && i < v11; ++i ) { printf("LINK %d\t\t%d\t->\t", (unsigned int)(i + 1), (unsigned int)InputasInt2); InputasInt2 = parse(inputasint, InputasInt2, i); v4 = InputasInt2; if ( InputasInt2 < 0 ) v4 = 0; printf("%d\t Sleeping for %dms\n", (unsigned int)InputasInt2, v4); std::chrono::duration<long,std::ratio<1l,1000l>>::duration<int,void>(&v7, &v4); std::this_thread::sleep_for<long,std::ratio<1l,1000l>>(&v7); } if ( i == 99 ) { puts("ERROR\nMaximum allowed iterations reached"); exit(1); } if ( inputasint == InputasInt2 && i == v11 ) { puts("You have broken the chain!"); exit(0); } puts("ERROR\nStarting seed doesnt match final output"); } else { puts("usage: ./chainbreaker SEED"); } return 0;}

The 'parse' function looks like this:


__int64 __fastcall parse(int a1, int a2, int a3){ int v4; // [rsp+8h] [rbp-18h] int i; // [rsp+18h] [rbp-8h] int v6; // [rsp+1Ch] [rbp-4h]
v4 = a2; if ( !a2 ) { puts("ERROR\nInvalid chain produced!"); exit(0); } if ( a2 < -4096 || a2 > 4096 ) v4 = -(a2 % 4096); v6 = 1; for ( i = 0; i <= 2; ++i ) v6 ^= v4 << i; return v6 + (v4 ^ (unsigned int)(a1 + a3 - 1)) + a1 - 15;}

It's fairly simple code. Many of the 'std::' calls are compiler generated from higher level string functions the original author used, as described here.

Note that the input gets parsed into a 32 bit integer value. If we can get the computation to run fast enough, this challenge could be solved brute force. Since it's simple to do, first I want to try to recompile the code with some changes to optimize run time. I used Visual Studio for this case:


#include "stdafx.h"
int invalid_chain = 0;int ii =0;
__int64 __fastcall parse(int a1, int a2, int a3){ int v4; // [rsp+8h] [rbp-18h] int i; // [rsp+18h] [rbp-8h] int v6; // [rsp+1Ch] [rbp-4h]
v4 = a2; if ( !a2 ) { invalid_chain = 1; // Invalid chain produced return 0; } if ( a2 < -4096 || a2 > 4096 ) v4 = -(a2 % 4096); v6 = 1; for ( i = 0; i <= 2; ++i ) v6 ^= v4 << i; return v6 + (v4 ^ (unsigned int)(a1 + a3 - 1)) + a1 - 15;}
int _tmain(int argc, _TCHAR* argv[])//int __cdecl main(int argc, const char **argv, const char **envp){ unsigned int v4; // [rsp+1Ch] [rbp-64h] char v5; // [rsp+20h] [rbp-60h] char v6; // [rsp+4Fh] [rbp-31h] char v7; // [rsp+50h] [rbp-30h] int v8; // [rsp+5Ch] [rbp-24h] int v9; // [rsp+60h] [rbp-20h] int i; // [rsp+64h] [rbp-1Ch] int v11; // [rsp+68h] [rbp-18h] int v12; // [rsp+6Ch] [rbp-14h]
v9 = 100;
for (int counter=0x00000000;counter<=0xffffffff;counter++) { v8 = counter; v12 = v8; v11 = 0x533D * ((v8 ^ 0x7B) + (v8 ^ 0x141)) % 100; if ( v11 >= 0 ) { if ( !v11 ) v11 = 10; } else { v11 = -v11; } if (0 == (counter % 65536)) // limit the amount of information going to console printf("Seed %8.8x requires %d links\n", counter, (unsigned int)v11); for ( i = 0; i <= 99 && i < v11; ++i ) { v12 = parse(v8, v12, i); if (invalid_chain== 1) continue; // if the parse() function determines an invalid chain, try next loop value v4 = v12; if ( v12 < 0 ) v4 = 0; } if ( i == 99 ) { continue; // Maximum allowed iterations reached - go try next loop value } if ((i <99)&& v8 == v12 && i == v11 &&(0==invalid_chain)) { puts("You have broken the chain!"); printf(" %d \n", counter); return 1; } invalid_chain = 0; } return 0;}

On a LG Gram 17 laptop, this program can try the whole 32-bit keyspace in about 15 minutes. Modern computers are awesome!

Seed ffffd32a requires 72 links
You have broken the chain!
 -11478

But this article is supposed to be about angr hooks, so lets continue there. A first crack at this should avoid code sections that 'puts' error messages and look for code paths that lead to puts("You have broken the chain!").

import time
import claripy
import angr

# specify good and bad endpoints
bad1 = (0x40149D) # ERROR\nStarting seed doesnt match final
bad2 = (0x4014AB) # usage: ./chainbreaker SEED
bad3 = (0x401461) # ERROR\nMaximum allowed iterations reached
bad4 = (0x40122C) # ERROR\nInvalid chain produced!
good = (0x401487) # You have broken the chain!

project = angr.Project("./chainbreaker", auto_load_libs=True)
#create an initial state with a symbolic bit vector as argv1
sym_arg_size = 1 #Length in Bytes because we will multiply with 8 later
sym_arg = claripy.BVS('sym_arg', 8*sym_arg_size)
argv = [project.filename]
argv.append(sym_arg)

initial_state = project.factory.full_init_state(args=argv)
for byte in sym_arg.chop(8):
    initial_state.add_constraints(byte >= '\x30') # '0'
    initial_state.add_constraints(byte <= '\x39') # '9'
sm = project.factory.simulation_manager(initial_state)

t0 = time.time()
sm.explore(find=good,avoid=[bad1,bad2,bad3,bad4])
t1 = time.time()
print (t1-t0)
found = sm.found[0]
result = found.solver.eval(sym_arg, cast_to=bytes)
print(result)

This code limits the input seed to a single digit, and yet angr has not finished hours later. And we would only get an error message at that point any way because the correct input has 6 digits and starts with a '-'. Since we already have the key, we can investigate what goes wrong. With the input modified like this:

data =    '-1147'                            # this is the start of the correct password
stringAsList = list(data);                   # turn string into list
stringAsList.append(sym_arg)                 # declare how long the 
bytestring = claripy.Concat(*stringAsList)   # turn list into bytestring
argv.append(bytestring)
print(argv)

If we ignore the fact that the answer is a negative number, we can see that the original angr input file does, in fact, find the right solution eventually - it's just very slow.

(angr) $ python angrfile 

[...]

<BV48 0x2d31313437 .. sym_arg_47_8>
['./chainbreaker', <BV48 0x2d31313437 .. sym_arg_47_8>]

[...]

813.7370481491089
b'8'

About 800 seconds for a single symbol; about 1600 for two, but after many hours 3 had not yet finished.

Next, lets patch out the delay so that angr does not symbolize it. From Ghidra, we get the addresses to hook (after rebasing the position-independent executable to 0x400000)

00401441 e8  e8  02       CALL       std::chrono::duration<long,std--ratio<1l,1000l   undefined duration<int,void>(dur
         00  00
00401446 48  8d  45       LEA        RAX => local_38 ,[RBP  + -0x30 ]
         d0
0040144a 48  89  c7       MOV        param_1 ,RAX
0040144d e8  cb  04       CALL       std::this_thread::sleep_for<long,std--ratio<1l   void sleep_for<long,std--ratio<1
         00  00
00401452 83  45  e4       ADD        dword ptr [RBP  + local_24 ],0x1
         01

And it's pretty easy to hook these calls:

def durationhook(state):
        print("0x401441 hooked")
project.hook(0x401441, durationhook, length=5) # hook this address, execute durationhook() and then skip 5 bytes, skipping the 'call' at this address
def sleep_forhook(state):
        print("0x40144d hooked")
project.hook(0x40144d, sleep_forhook, length=5) # hook this address, execute sleep_forhook() and then skip 5 bytes, skipping the 'call' at this address

An interesting fact in this case is that while the angr documentation states:

"The CFG analysis does not distinguish between code from different binary objects. This means that by default, it will try to analyze control flow through loaded shared libraries. This is almost never intended behavior, since this will extend the analysis time to several days, probably. To load a binary without shared libraries, add the following keyword argument to the Project constructor: load_options={'auto_load_libs': False} "

https://docs.angr.io/built-in-analyses/cfg#shared-libraries

the reality is that auto_load_libs has the opposite effect here.

The two hooks yield a computation time (for 1-symbolic-digit integers) of about 30 minutes. Can anything else be done for the current case? There are some messages during the angr execution that point to some potential targets:

WARNING | 2020-01-12 16:13:03,410 | angr.state_plugins.symbolic_memory | The program is accessing memory or registers with an unspecified value. This could indicate unwanted behavior.WARNING | 2020-01-12 16:13:03,410 | angr.state_plugins.symbolic_memory | angr will cope with this by generating an unconstrained symbolic variable and continuing. You can resolve this by:WARNING | 2020-01-12 16:13:03,410 | angr.state_plugins.symbolic_memory | 1) setting a value to the initial stateWARNING | 2020-01-12 16:13:03,410 | angr.state_plugins.symbolic_memory | 2) adding the state option ZERO_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to make unknown regions hold nullWARNING | 2020-01-12 16:13:03,410 | angr.state_plugins.symbolic_memory | 3) adding the state option SYMBOL_FILL_UNCONSTRAINED_{MEMORY_REGISTERS}, to suppress these messages.WARNING | 2020-01-12 16:13:03,411 | angr.state_plugins.symbolic_memory | Filling memory at 0x7ffffffffff0000 with 207 unconstrained bytes referenced from 0x409d2a0 (strlen+0x0 in libc.so.6 (0x9d2a0))WARNING | 2020-01-12 16:13:03,566 | angr.state_plugins.symbolic_memory | Filling memory at 0x7fffffffffeff00 with 6 unconstrained bytes referenced from 0x40b9ff0 (memcpy+0x0 in libc.so.6 (0xb9ff0))WARNING | 2020-01-12 16:13:03,690 | angr.state_plugins.symbolic_memory | Filling memory at 0x7fffffffffefe68 with 8 unconstrained bytes referenced from 0x4049340 (strtoq+0x0 in libc.so.6 (0x49340))

so angr does not know that the sym_arg_size = 1 constrains these values. I have seen these warnings many times for programs that use string functions, so it's valuable to see how much of a problem this really is.

printf and puts seem obvious candidates to hook, as we don't need their output in this analysis:

def puts_hook(state): print("puts hooked")project.hook_symbol('puts', puts_hook)
def printfhook(state): pass #print("printf hooked")project.hook(0x4013b1, printfhook, length=5) # hook this address, execute printfhook() and then skip 5 bytes, skipping the 'call' at this addressproject.hook(0x4013ec, printfhook, length=5) # hook this address, execute printfhook() and then skip 5 bytes, skipping the 'call' at this addressproject.hook(0x40142e, printfhook, length=5) # hook this address, execute printfhook() and then skip 5 bytes, skipping the 'call' at this address

Which results in a 25% run time reduction, but the messages don't all go away.

WARNING | 2020-01-12 20:03:19,308 | angr.project | Address is already hooked, during hook(0x4083cc0, <function puts_hook at 0x7f535df95620>). Re-hooking.puts hookedWARNING | 2020-01-12 20:03:21,660 | angr.state_plugins.symbolic_memory | The program is accessing memory or registers with an unspecified value. This could indicate unwanted behavior.WARNING | 2020-01-12 20:03:21,660 | angr.state_plugins.symbolic_memory | angr will cope with this by generating an unconstrained symbolic variable and continuing. You can resolve this by:WARNING | 2020-01-12 20:03:21,660 | angr.state_plugins.symbolic_memory | 1) setting a value to the initial stateWARNING | 2020-01-12 20:03:21,660 | angr.state_plugins.symbolic_memory | 2) adding the state option ZERO_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to make unknown regions hold nullWARNING | 2020-01-12 20:03:21,660 | angr.state_plugins.symbolic_memory | 3) adding the state option SYMBOL_FILL_UNCONSTRAINED_{MEMORY_REGISTERS}, to suppress these messages.WARNING | 2020-01-12 20:03:21,660 | angr.state_plugins.symbolic_memory | Filling memory at 0x7ffffffffff0000 with 207 unconstrained bytes referenced from 0x409d2a0 (strlen+0x0 in libc.so.6 (0x9d2a0))WARNING | 2020-01-12 20:03:21,816 | angr.state_plugins.symbolic_memory | Filling memory at 0x7fffffffffeff00 with 6 unconstrained bytes referenced from 0x40b9ff0 (memcpy+0x0 in libc.so.6 (0xb9ff0))

And it looks like the puts function was already hooked by the library simulation code. Re-hooking removes the call to stroq. The other calls are due to stoi and its' support calls:

std::allocator<char>::allocator(&v6);
std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>>::basic_string(&v5, argv[1], &v6);
inputasint = std::__cxx11::stoi((__int64)&v5, 0LL, 0xAu);
std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>>::~basic_string(&v5);
std::allocator<char>::~allocator(&v6);

The hooking of stoi requires quite a bit of changes, so here the complete script:

import timeimport claripyimport angr
# specify good and bad endpointsbad1 = (0x40149D) # ERROR\nStarting seed doesnt match finalbad2 = (0x4014AB) # usage: ./chainbreaker SEEDbad3 = (0x401461) # ERROR\nMaximum allowed iterations reachedbad4 = (0x40122C) # ERROR\nInvalid chain produced!good = (0x401487) # You have broken the chain!
project = angr.Project("./chainbreaker", auto_load_libs=True)
argv = [project.filename]data = '-1147' # dummy data, since we are hooking stoiargv.append(data)print(argv)initial_state = project.factory.full_init_state(args=argv)sm = project.factory.simulation_manager(initial_state)
aux_arg = claripy.BVS('sym_arg', 8*4) # declase a 32 bit symbolic variableinitial_state.add_constraints(aux_arg <= -11000) # constrain variable to range we know the answer to be ininitial_state.add_constraints(aux_arg >= -12000) # constrain variable to range we know the answer to be in
def durationhook(state): print("0x401441 hooked")project.hook(0x401441, durationhook, length=5) # hook this address, execute durationhook() and then skip 5 bytes, skipping the 'call' at this addressdef sleep_forhook(state): print("0x40144d hooked")project.hook(0x40144d, sleep_forhook, length=5) # hook this address, execute sleep_forhook() and then skip 5 bytes, skipping the 'call' at this addressdef allocatorhook(state): print("allocator hooked")project.hook(0x4012e5, allocatorhook, length=5) # hook this address, execute allocatorhook() and then skip 5 bytes, skipping the 'call' at this addressdef allocatordestructorhook(state): print("allocatordestructor hooked")project.hook(0x401334, allocatordestructorhook, length=5) # hook this address, execute allocatordestructorhook() and then skip 5 bytes, skipping the 'call' at this addressdef basicstringhook(state): print("basicstring hooked")project.hook(0x401303, basicstringhook, length=5) # hook this address, execute basicstringhook() and then skip 5 bytes, skipping the 'call' at this addressdef basicstringdestructorhook(state): print("basicstringdestructor hooked")project.hook(0x401328, basicstringdestructorhook, length=5) # hook this address, execute basicstringdestructorhook() and then skip 5 bytes, skipping the 'call' at this addressdef stoihook(state): state.regs.rax = aux_arg print("stoi hooked") print(state.regs.rax)project.hook(0x401319, stoihook, length=5) # hook this address, execute stoihook() and then skip 5 bytes, skipping the 'call' at this addressdef puts_hook(state): print("puts hooked")project.hook_symbol('puts', puts_hook)def printfhook(state): pass #print("printf hooked")project.hook(0x4013b1, printfhook, length=5) # hook this address, execute printfhook() and then skip 5 bytes, skipping the 'call' at this addressproject.hook(0x4013ec, printfhook, length=5) # hook this address, execute printfhook() and then skip 5 bytes, skipping the 'call' at this addressproject.hook(0x40142e, printfhook, length=5) # hook this address, execute printfhook() and then skip 5 bytes, skipping the 'call' at this address
t0 = time.time()sm.explore(find=good,avoid=[bad1,bad2,bad3,bad4])t1 = time.time()print (t1-t0)found = sm.found[0]print(aux_arg)result = found.solver.eval(aux_arg)print(result)result = result - 4294967296print(result)

And the warnings about the unconstrained bytes disappeared:

WARNING | 2020-01-13 01:26:35,067 | cle.loader | The main binary is a position-independent executable. It is being loaded with a base address of 0x400000.['./chainbreaker', '-1147']WARNING | 2020-01-13 01:26:36,018 | angr.project | Address is already hooked, during hook(0x4083cc0, <function puts_hook at 0x7f2cf0c88ae8>). Re-hooking.puts hookedallocator hookedbasicstring hookedstoi hooked<SAO <BV64 0x0 .. sym_arg_47_32>>basicstringdestructor hookedallocatordestructor hooked

Now an interesting thing happens, for both

initial_state.add_constraints(aux_arg <= -11470) # constrain variable to range we know the answer to be in
initial_state.add_constraints(aux_arg >= -11480) # constrain variable to range we know the answer to be in

and

initial_state.add_constraints(aux_arg <= -11400) # constrain variable to range we know the answer to be in
initial_state.add_constraints(aux_arg >= -11500) # constrain variable to range we know the answer to be in

the execution time is about 900 seconds, but for

initial_state.add_constraints(aux_arg <= -11000) # constrain variable to range we know the answer to be in
initial_state.add_constraints(aux_arg >= -12000) # constrain variable to range we know the answer to be in

After setting all these hooks, all that is left of the program is symbolic execution of the user space, but in this case angr is incapable of solving it. The hooking however does show its value. Instead of not being able to solve for even the last digit, angr can now solve 2. We also learned that the common warnings about the printf symbolization do not necessarily have to be too much of a concern.