10000

A Flare-on 12 Level 9 Challenge

Descript

Ok hotshot, this is it, the only thing standing between you and slightly less internet anonymity.


7-zip password: flare

Analysis

Finding the main function

We are given a Windows executable that is over 1GB. Running the executable, we see an error message "invalid license file". We can open the executable in IDA and search for this string.

Error message

Looking at the string references, we can see that sub_140001E87 uses it. We can see that another function sub_140001180 calls this function, but it just seems to be running some initialization stuff so we don't have to bother reversing it. We can rename sub_140001E87 to mainFunction.

Main function analysis

Since the executable is stripped, most of the function names are unknown. We can guess some of the through analysis, like how sub_1400C7680 is like printf because of the arguments passed into it ("checking license file..." as 2nd argument).

You can load FLIRT signatures to resolve most of the functions. I managed to resolve some but not all, so this writeup will demonstrate the resolution of functions without FLIRT signatures.

We also see "license.bin" being passed as the 2nd argument into function sub_14009E480, from which we can assume its fopen. From this method, we can resolve functions like fread and fseek. fread reads the file contents into a variable we will rename to filecontent.

Looking at the function sub_14001F990, we see that it eventually calls the sub_14001E750, which uses a string "picosha2.h" as argument. Searching for this online, we can see that it is used in a library that generates a SHA256 hash (https://github.com/okdshin/PicoSHA2). Looking at the examples in the GitHub page, we can infer that sub_14001F990 is essentially picosha2::hash256 as the arguments also makes sense. If we debug the executable, we can also see that the function does indeed write the SHA256 hash of the contents of "license.bin" into the 3rd argument of the hash256 function.

If we rename the functions and variables, this is what we have now:

IDA File operations

Finding the goal

A bunch of code is then executed, which we will check out later. We first see how this generate hash will be used. In the mainFunction, we see that sub_14001D6A0 is called with the hash as the 3rd argument. The 1st argument passed is a global variable that points to a bunch of random bytes, which is a size of 80 bytes. This can be guessed to be our ciphertext that needs to be decrypted.

In the function we also see that it eventually calls sub_14001D8F0, which calls another function with the "Invalid key size" as an argument. These hints of it requiring a fixed key size and the random bytes is 80 bytes long which can be placed in blocks of 16 bytes highly points to this function being a AES decryption.

Looking at the arguments to this decryption function, the 2nd argument seems to signify the size of ciphertext, 4th argument is the size of the hash (AES key), and the 5th argument to a global variable containing 16 bytes, which highly suggest that it is the IV key. This gives us an idea that it is most likely performing AES CBC decryption.

GPT with IDA MCP can be used to confirm that the function is indeed performing AES CBC decryption.

From this, we can derive that we must recover the expected "license.bin" in order to decrypt the ciphertext for the flag.

Reversing license checks

There are several checks in play. The first is for the size of "license.bin" to be 340000 bytes.

The variable we renamed to filecontent is an array of 2 bytes. This is then assigned to another variable, then these 2 bytes are checked to make sure its not more than 9999. These are performed in the for loop, which seems to be the most important part of the license checking.

We can see a clear pattern here. A for loop running for 10000 iterations where each recordId (2 bytes) is checked to be less than 9999 signifies an indexing of some sort. file_content_ptr is then incremented (by 2 bytes) and at the end of the loop, it is incremented by 16 (32 bytes) which is off by 34 bytes effectively.

Since this iterates over 10000 times, it means there should be a total of 10000βˆ—(32+2)=34000010000*(32+2) = 340000 bytes, which matches the previous check of 340000 bytes for "license.bin".

This gives us the following struct for the license:

Main Logic

The first few function calls in the for loop isn't note worthy. The interesting part is in the function sub_140001482, where it takes in the id (recordId variable) as an argument and calls a bunch of WinAPI functions.

We see that the recordId is passed into FindResourceA as the 2nd argument, which is used to specify the ID of the resource to load. If we view the resources of the executable in CFF Explorer, we can see why the file is over 1GB of size; it contains 10000 resources with each resource being over 0x10000 bytes big!

Resources of executable

This means the specified recordId would specify the resource accordingly, 0 would match the first resource, and 9999 would match the last.

The resource is then loaded and a buffer is assigned its pointer. This is then passed into another important function sub_1400035E8, which essentially decodes the resource bytes. We can determine this by debugging, where we see the 1st argument is the input encoded bytes and the 2nd argument is the output containing the decoded bytes.

We can also tell that the resource is essentially encoded DLLs based on the first few bytes seen in the decoded output.

Extracting DLLs

With this, I asked GPT to code me a python script that uses the frida library to hook the running executable at a given address and dump out a certain number of bytes in the specified register.

The code had a lot of errors initially (mostly with running the fridaJS), but I managed to fix it by reading the documentation on the correct frida version that I'm using. This particularly fixes the retrieval of base address of the executable, the code to set the interceptor at an address, and also the reading of bytes from the address stored in the RAX register.

Running this took about 2 hours to extract all the DLLs. It would be faster if the decryption function was reimplemented or if I had threaded the extraction process, but I had time so I just let it run in the background.

Looking back at the function that loads the resource, we can see that after the decoding, there are some offset calculations before copying the decoded bytes into the VirtualAlloc-ed region.

For people who are familiar with reversing Windows Executables, the offsets should look familiar. I recommend using 010 Editor to view the DLL for easier visualization as it highlights the different fields and their sizes in a hex view.

Offset
Significance

decoded + 0x3C

AddressOfNewExeHeader

*(int *)(decoded + 0x3C) + decoded

NtHeader

NtHeader + 0x50

SizeOfImage

NtHeader + 0x14

SizeOfOptionalHeader

NtHeader + 0x18

OptionalHeader

NtHeader + 0x18 + *(unsigned __int16 *)(NtHeader + 0x14)

SectionHeaders

NtHeader + 6

NumberOfSections

sectionHeaders[3]

VirtualAddress

sectionHeaders[4]

SizeOfRawData

sectionHeaders[5]

PointerToRawData

decoded + sectionHeaders[5]

.text section (Start of code)

This chunk of code essentially copies the .text section of the DLL into the (VirtualAlloc-ed region + VirtualAddress offset).

Following this code is another for loop:

Although this look complex, what it does is pretty simple:

1

Go through all the imports of the DLL

2

Check the first 4 bytes of the module name

3

If first 4 bytes of module name isn't all digits, then its a normal system provided library, so load and populate IAT.

4

Else the library is custom and should be loaded as a DLL, so recursively call the current function to load other DLLs, then populate IAT.

Basically, this would go through all imports of the DLL and then recursively load it.

The rest of the code in the function isn't that important, it mainly loads things into their correct locations so that the DLL is able to run normally with the exported functions. The only important function that was called at the end is sub_1400AE7E0, which we will talk about later on.

Going back to the mainFunction, after the loading of the libraries, the exported function _Z5checkPh of the DLL is then loaded and ran (at 0x140001180) with the file_content_ptr as an argument. This can easily be deduced by debugging the executable. We can also guess that each DLL would have their own checks on each 32 bytes of body, which we would need to reverse the logic of. Repeat this 10000 times to get all the correct bytes of body for each DLL.

Getting the right order

Before diving into the DLLs, we have to determine the order of which to load the DLLs in, that is, what is the order of recordIds to input.

In the mainFunction, there is a memcmp after the for loop ends. It compares 40000 bytes of a global variable that is initially all nulls, against a populated global variable, which is the goal we have to aim towards. We can start by looking at the references to this global variable of nulls.

Out of all the references, the only one that seems to change the values is found in the function sub_140000D9F, which is called by the mainFunction.

We can see that sub_1400220F0 returns the index to which a certain value should be added to. If you enter the sub_140000D9F function in IDA and press F5 to refresh the generated decompiled code, you will see that our global variable of nulls changed from unk... to dword....

This is because IDA identifies that 4 bytes are being added to it for each index. So when the memcmp compares 40000 bytes, it's actually comparing 10000 entries of 4 bytes. This matches the number of DLL ids we can input. We can convert both global variables to an array with the correct sizes.

We can also easily see that the value that it adds is the current iteration count of the for loop in the mainFunction. However to determine how the index (v4) is generated, we have to look through some confusing references...

Most of the functions in the sub_140001D9F function are pretty straightforward, but they suggest that the variables being worked on is of some kind of struct, especially so for unk_140112C80.

Renaming the variables and functions, we have this:

It's hard to understand completely what's going on here because the struct makes it messy, but it's actually pretty simple.

Looking at the index variable and its type definition "*(unsigned __int16 **)", we can easily deduce how the struct would look like:

startPtr and endPtr would be pointers to pointers to int16 values. compareDeref would compare the 2 addresses to see if the queue ended, if the addresses are not the same, then it continues the loop. At the end of the loop, the goNext function would add 8 to the pointer, which makes it point to the next pointer in the queue. 8 is then added until the startPtr is the same address as the endPtr.

The index is the value dereferenced from the pointer that startPtr is pointing to.

Loop example

Now we know it retrieves the index from the queue, we need to find out how this queue is populated.

Looking at references to the queue, we see the function that loads the libraries accesses it many times. Now it's quite confusing to reverse this part on how the queue is populated. The easiest way is to debug and check the queue after the load library function is ran.

When we debug and check against the DLLs that ran, we see a certain pattern. When we load a random DLL, we see that the queue will be populated with a lot of entries. However, if we try to load 9998.dll by putting the id as 9998 in little endian, we see that the load library function runs very quickly because the DLL does not have any imports. We can also see that the queue contains only 1 entry, which is a pointer to the current DLL index 9998! From this, we can guess that the load library function populates the queue based on each DDL ID, including the ID of the other DLLs it imports since load library would run recursively.

Now we know that depending on the DLL ID provided, it will populate the queue with indices that are the IDs of every imported DLL recursively, and then each ID is used as an offset to the targetInp from which to add the order to.

In pseudo-python code, the main code of the executable is doing:

Given that we know that at each loop, it adds all indices by the loop counter, how do we get the right order so that the target matches the goal?

We first need to extract all imports for every single DLL. This gives us an idea of what indices of targetInp are incremented for each DLL ID we give. I used GPT to generate a python script for extracting the imports recursively for each every DLL ID. This script outputs a sets.json containing a json file entries {ID: list_of_imports}.

After the import lists are extracted, we can ask GPT to create another script for solving the order given the import lists and the list of goals.

The way that it solves for the correct order is through finding each import references and eliminating them one by one. Say that we have:

{0: [0,1], 1: [1,2], 2: [2,3], 3: [3]}

This means that ID 0 is only imported by 0, ID 1 is imported by both 0 and 1, ID 2 is imported by both 1 and 2, and ID 3 is imported by 2 and 3.

Since ID 0 is only imported by 0, it can only be at goal[0]. This eliminates 0 from the imports of others, meaning the now ID 1 is only import by 1, which means it can only be at goal[1], and so on until eventually we have goal = [0,1,2,3]

Now when we run solve_order.py, we get a order.txt which contains the order of IDs to input separated by newlines.

10000 DLLs War

Some Initial Setup

Now that we solved the order of IDs, we can solve for the 32 bytes of body for the respective IDs.

Every DLL is unique but very similar. If we open two different DLLs in IDA and focus on the check function ("_Z5checkPh" calls the "check" function), we can see the differences and similarities.

Before we start, I have to point out that my IDA originally had issues identifying the calling convention of the DLLs. It assumed that the calling convention was RDI, RSI, RDX... (Linux System V AMD64) instead of the intended RCX, RDX, R8 (Windows x64 __fastcall).

Wrong calling convention!

I had to go to my IDA's Options > Compiler > Compiler and change it from "GNU C++" to "Visual C++" in order for IDA to interpret the correct calling convention.

Change compiler

With that out of the way, we can start analyzing the functions.

Stage 1: Flood of similar function calls

I will start by analyzing 7476.dll, the first ID of the order. The check function starts off with numerous function calls to locally-defined and imported functions. Every one of these functions takes in the 32 bytes body input for modification. It should also be noted that every DLL also exports some of these functions.

Since we want to retrieve the correct input, we have to create an inverse/reverse of every "encrypting" logic in this check function. At the end it should have a check against a few hard coded bytes to check if the encrypted input matches it. This hard coded bytes would be the goal and also the input to our inverse/reverse functions, in order to get the original input.

If we enter a random function, we can see that there are a lot of hardcoded values, and they are used to modify the input payload.

This can easily be reversed given the key 32*8 bytes key. We also see an XOR performed on line 7. If we look at other f... functions, we see this XOR is done at every start of the function. Now, this caused me problems in the implementation of the inverse of this function, but this value that it tries to XOR is actually pointing to the totalInp global variable from the main executable, at its own index's offset. So for 7476.dll, it will get the value from totalInp[7674] and XOR with the input, for every one of these functions in 7476.dll. I only figured this out after debugging and tracing the addresses.

Now, this is just one function to inverse. Looking at most of the functions, we can see that most of it perform the similar algorithm, just with a different key. There are only 3 unique functions that we have to create inverses of, then we also need to code out a way to extract the keys from every one of these functions.

Here are examples of each unique function:

Now we can just create a function to extract the keys from these functions, and also the XOR key from the totalInp global variable (4 bytes), and create inverses of each function and we are done with this stage.

Stage 2: key XORs

After the wave of function calls, the code does some XOR with some hardcoded bytes.

Although it defines the buffer with weird offsets of v97, v8 effectively just points to the keys starting from v99, which is a 16 * 8 byte key. We can debug and see in order to determine what is being XORed.

rax = modifiedInp, rcx = v99[0] key

This is also simple to create an inverse function for since it's just a simple XOR, we just need to extract the keys.

Stage 3: Useless checks

Following that, there is this huge chunk of code that calls this sub_3422314B0 function numerous times.

This whole chunk can be ignored in our solver. This is because if you look at the end of this code chunk, all it does is just checks for if v4 == 0 then fail the check (return 0). If you look at this code chunk, nothing also modifies any buffer that would be used in the following code. This seems to be just a check to make sure that the encrypted input matches some criteria, so there is no point creating a reverse for this.

Stage 4: Some obscure algorithm

Following the previous code, there is some kind of moving 1s and setting of nulls to an array at the start of this check.

If we visualize this in an array where each number is 8 bytes, it would look like this:

However, if we debug this program within the next for loop, we can see that this small loop just copies the encrypted bytes into another address. We can also see that only the bytes at even offsets are being used. The reason for this is because each value is treated as a 16 byte value instead of the usual maximum 8 bytes. This can be determined later on when we discuss the reversing of the sub_3422314B0 function.

Moved encrypted bytes

We also see that its a length of 16 entries. So if we make the 0s and 1s array into 16 bytes each, we get an Identity Matrix!

The buffer that points to this matrix is then used in the following for loops.

To get a rough idea of what it is doing, we first have to look back at the 2 buffers that were defined in Stage 3 but not used then.

v102 is only used at if ( (v102 >> k) & 1 ), which is difficult to determine the purpose of currently.

v103 has been used a lot in Stage 4, and its always with the use of the sub_3422314B0 function. This means we have to figure out what this function does and the easiest way to do this is through debugging, seeing the input values, and then analyze it against the output value. Through this method, we can determine that this function effectively performs a 16 bytes (128-bit) modulus operation, with the given modulus as a fixed value from the buffer v102 that was hardcoded in Stage 3.

The rest of the logic was hard for me to reverse, so I had to use GPT-5 to help me reverse the this stage. Turns out, the whole stage basically performs a Modular Matrix Exponentiation. This can be reversed with some math and GPT-5 easily generated an Inverse Modular Matrix Exponentiation function for me.

From this, we can see that the 18 * 8 hard coded bytes are the Modulus, followed by the Exponent, then 16 * 8 bytes of the xor key for Stage 2.

Stage 5: Final check

The last check is just a memcmp against some hard coded 16*8 bytes.

Weirdly, IDA's decompiler couldn't show the buffers being checked properly. I think this had to do with how the buffer is confused with some buffers being 16 bytes, and some being 8 bytes. I had to convert the buffer into an array of 32 QWORDs in order for the decompiler to show the right code.

In the end, I created several arrays of 32 QWORDs, all of which I will reference as an offset from the Goal buffer (because it was used in the modular matrix exponentiation algorithm):

Note that I treated Goal as a pointer to 16 bytes although there isn't a thing. This makes the matrix algorithm more understandable because of how IDA interprets it weirdly.

The information stated below is right before the matrix algorithm is ran, after the first small for loop that copies bytes.

  • Goal[0 to 15] contains the actual hardcoded goal bytes.

  • Goal[16 to 16+15] contains an empty array of 32 * 16 bytes

  • Goal[32 to 32+15] contains an empty buffer to which Goal[64] is copied to

  • Goal[48 to 48+15] is essentially the Size, Goal[48] points to the start of Size (originally where the Identity Matrix is set)

  • Goal[64 to 64+15] is the original encrypted input, right before the matrix algorithm is performed

Goal[32 to 79]

With the renaming of variables and the creation of the arrays, the matrix algorithm is easier to read.

It might be possible to reverse this and identify the algorithm without the use of GPT as well.

Retrieving Flag

With all the inverse functions in place (and generated with GPT), we can solve for all the 32 bytes for each ID. One last problem with have that we need to prepare for is the resolving of imported functions for every DLL. We need to get the imported functions with their algorithm and keys ready before running through the check function of each DLL.

I asked GPT to create a python script for me that extract the EAT for every DLL, where the output is a tables.json file that contains entries like

Now we have everything to get the correct 32 bytes for each DLL. I used capstone to extract the keys and determine the algorithms for each function. The main load_dll function is manually coded but the other inverse algorithms are created by GPT.

The solve follows the idea:

1

Set the keys of targetInp to be populated as a Global Variable, initialized with [0] * 10000. Read in the import calls of each DLL from sets.json and the EAT from tables.json.

2

Iterate through the correct order of DDLs using orders.txt, feeding the specified DLL together with the DLL's xorkey.

3

Read the IAT of the current DLL, then iterate through all the instructions in the check function of the DLL.

If a call is detected, it means Stage 1 is active.

If the call is on the RAX register, means that it is calling an imported function. Resolve this address from the IAT to get the dll and function name, then use these to resolve the function operation and its key from the EAT. Append the operation function, the key, and the xorkey to a call_list.

4

If not, then the call is to a locally-defined function. So jump to that function and iterate through the assembly until it returns. Extract keys when movabs opcode is used. When the function returns, check the function size. Different function operations (out of the 3) have different size ranges it can appear in. Append the operation function, the key, and the xorkey to the same call_list.

5

Back to the check function. If the opcode is a movabs, this means that Stage 1 is done and Stage 2 starts. This is the 18 * 8 bytes hard coded values that needs to be copied over. The first 2 * 8 bytes will always be the Modulus followed by the Exponent. The following 16 * 8 bytes will be the xor key for Stage 2. All these will be appended to a key_list list.

6

If the key_list is populated with 18 entries, then keep reading the disassembly until it hits another mobavs. This will be the goal bytes that we need for input. After we receive 16 * 8 of these hard coded Goal bytes, we can break out of the loop.

7

Using the Modulus, Exponent, and the Goals, pass it into reverse_matrix_exponentiation.

8

Use the results from reverse_matrix_exponentiation and xor with the 16 byte xor keys in Stage 2.

9

Iterate through every of the call_list in reverse, and call it with the inverse operation with the key, then xor with the xorkey. This should give us the original 32 bytes.

Write the "<ID>: <original_input_hex>" to a file inputs for saving.

Once that is done (takes around >2hrs for me), we can reconstruct the license.bin using the inputs from inputs, and then SHA256 the file and use that as the AES CBC Key to decrypt the ciphertext!

These are all the scripts that I have used, in execution order:

This challenge was really time consuming, and I can assume it is meant to be based on the 10000 DLLs that were given. Either way, I learned new things like how to spot the matrix algorithm and how to spot queues and its generation in IDA.

It was a painful yet satisfactory journey πŸ”₯πŸ”₯πŸ”₯

Last updated

Was this helpful?