Decrypting Poison Ivy’s Communication Using Code Injection and DLL Proxies

Not so long ago a friend told me he suspected that his web server was acting as a command and control (C&C) server for some hacker. I took a look, and indeed found the Poison Ivy (PI) C&C app (or as its author calls it, “remote administration tool”) installed on the machine. Once he learned that the network administrator had some pcap files of the network traffic, my friend asked me to find out what the hacker had done with the app on his machine. Unfortunately, PI’s traffic is encrypted, so I asked my friend for the C&C app to see if I could figure out how to decrypt the data.

According to PI’s author, all communication is encrypted using Camellia with a 256-bit key. Camellia is a symmetric-key block cipher that operates on 128-bit blocks. I, for one, don’t blindly trust the word of a guy who distributes such a tool for free (no offense), so I first wanted to check that this is indeed the case. A hex search for the S-boxes’ values finds them in the executable (which is packed, BTW), but let’s dig a little deeper. When PI’s C&C starts, it first writes a DLL named PILib.dll to its current working directory, and then loads it. The functions in the DLL can also be found in the code of the clients/Trojan horses (called “servers” in PI, for some reason). This DLL may be our guy. It exports three functions: C_SK, C_E, and C_D. Could they be Camellia_ScheduleKeys, Camellia_Encrypt, and Camellia_Decrypt? Let’s find out.
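The hex search is easy to reproduce. Here is a minimal sketch, assuming you already have the bytes to scan in a buffer (for example PILib.dll read from disk, or a memory dump of the unpacked C&C). It looks for the first 16 entries of SBOX1 as listed in RFC 3713. The function names are made up for this illustration, and an implementation that stores its S-boxes as precomputed 32-bit tables would not match this byte pattern.

```c
#include <stddef.h>
#include <string.h>

/* First 16 entries of Camellia's SBOX1, as listed in RFC 3713. */
static const unsigned char CAMELLIA_SBOX1_PREFIX[16] = {
    0x70, 0x82, 0x2c, 0xec, 0xb3, 0x27, 0xc0, 0xe5,
    0xe4, 0x85, 0x57, 0x35, 0xea, 0x0c, 0xae, 0x41
};

/* Returns the offset of the first occurrence of pat in buf, or -1. */
long find_bytes(const unsigned char *buf, size_t buf_len,
                const unsigned char *pat, size_t pat_len)
{
    size_t i;
    if (pat_len == 0 || buf_len < pat_len)
        return -1;
    for (i = 0; i + pat_len <= buf_len; i++) {
        if (memcmp(buf + i, pat, pat_len) == 0)
            return (long)i;
    }
    return -1;
}

/* Hypothetical helper: offset of the SBOX1 prefix in buf, or -1. */
long find_camellia_sbox1(const unsigned char *buf, size_t len)
{
    return find_bytes(buf, len, CAMELLIA_SBOX1_PREFIX,
                      sizeof CAMELLIA_SBOX1_PREFIX);
}
```

A hit only says that a table starting like SBOX1 is present. It doesn't prove that the cipher is used correctly, or at all, which is why the rest of this section keeps digging.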

Here’s a part of C_SK. If you compare it with the key scheduling part in RFC 3713 (Camellia’s specification), you can see that this is indeed Camellia’s key scheduling algorithm, using a 256-bit key:
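A quick way to make that comparison concrete is to look for the six 64-bit Sigma constants that RFC 3713 uses in the key schedule. The sketch below assumes that x86 code embeds each constant as two 32-bit little-endian immediates, which is common but not guaranteed. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Key-schedule constants Sigma1..Sigma6 from RFC 3713. */
static const uint64_t CAMELLIA_SIGMA[6] = {
    0xA09E667F3BCC908BULL, 0xB67AE8584CAA73B2ULL,
    0xC6EF372FE94F82BEULL, 0x54FF53A5F1D36F1CULL,
    0x10E527FADE682D1DULL, 0xB05688C2B3E6C1FDULL
};

/* Returns 1 if v occurs anywhere in buf as a little-endian dword. */
static int has_dword_le(const unsigned char *buf, size_t len, uint32_t v)
{
    size_t i;
    for (i = 0; i + 4 <= len; i++) {
        uint32_t w = (uint32_t)buf[i]
                   | ((uint32_t)buf[i + 1] << 8)
                   | ((uint32_t)buf[i + 2] << 16)
                   | ((uint32_t)buf[i + 3] << 24);
        if (w == v)
            return 1;
    }
    return 0;
}

/* Returns a 6-bit mask: bit i is set when both halves of Sigma(i+1)
 * were found somewhere in buf. */
unsigned sigma_mask(const unsigned char *buf, size_t len)
{
    unsigned mask = 0;
    int i;
    for (i = 0; i < 6; i++) {
        uint32_t hi = (uint32_t)(CAMELLIA_SIGMA[i] >> 32);
        uint32_t lo = (uint32_t)CAMELLIA_SIGMA[i];
        if (has_dword_le(buf, len, hi) && has_dword_le(buf, len, lo))
            mask |= 1u << i;
    }
    return mask;
}
```

In the RFC, Sigma5 and Sigma6 are only used to derive KB, which is needed for 192-bit and 256-bit keys. A mask of 0x3F is therefore consistent with the 256-bit claim, although it doesn't prove it.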

So we know that the C&C server uses PILib.dll to handle all its encryption/decryption tasks. The client has the same functions embedded in its code (i.e., used internally, and not imported). We now have the following options:

  • Unpack the C&C server, find out how it uses the DLL (e.g., how the shared key is passed to the key scheduling algorithm), and use the DLL to decrypt known PI traffic, and then deduce what the old captured data means. This entails dealing with anti-debugging and anti-reversing techniques (if manually unpacking), and then digging some more. Afterwards, we’ll have to work some more, using Wireshark, and passing the encrypted data through the DLL.
  • Perform a similar procedure, but for a PI client (Trojan horse). This might prove to be a little more rewarding (but just a little bit).
  • Replace PILib.dll with our own DLL, which will reside between the C&C server and the original DLL. Calls to C_SK, C_E, and C_D will go to our DLL, and get both logged and redirected to the original DLL. Our log will immediately contain the secret key, the encrypted data, and the decrypted data.

Of these options, I chose the third one, as it’s more elegant than the other two (it’s a matter of opinion, I guess). Little did I know what I would have to do to get there. At first I thought that if the C&C server saw that PILib.dll already existed, it wouldn’t overwrite it, so I could just name my proxy DLL PILib.dll and put it there. That, of course, was too good to be true. My DLL got overwritten, and the real PILib.dll was used. So I decided to patch LoadLibrary, so that when a request to load PILib.dll arrives, my DLL is loaded instead. The problem is that it’s a real pain to statically patch kernel32.dll: it’s one of Windows’ known DLLs (so it’s not affected by the DLL search order), and it’s also protected by the OS (meaning that if you patch it in its directory, your patched version is going to be reverted to the original within seconds).

So we’re left with dynamically patching kernel32.dll, after the PI C&C server is loaded into memory. Thing is, everything happens so quickly that we need to be in control of the loading process. We can write a program that calls CreateProcess and asks for the main thread to start in a suspended state. This is not enough, though, as at that point only ntdll.dll has been loaded into the created process. We need to let the PI C&C run, but we don’t really want it to do anything, so we patch the code at its entry point and make it loop endlessly, doing nothing. Once we’re sure the loader has loaded kernel32.dll into PI’s process, we can suspend it again, patch kernel32.dll, restore the original code at the entry point, and resume the process.
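As a rough illustration of that sequence, here is a minimal sketch for 32-bit Windows. It rests on a few assumptions that you should check on your own setup: that the Eax register of the freshly created, suspended main thread holds the entry point address, and that polling Eip every few milliseconds is good enough to tell when the thread has reached the spin loop. The names launch_patched, poke, and patch_fn are hypothetical, and the callback stands in for whatever kernel32.dll patch you apply.

```c
#include <stddef.h>

/* x86 "jmp $": a two-byte short jump that lands on itself. */
static const unsigned char SPIN_LOOP[2] = { 0xEB, 0xFE };

#if defined(_WIN32) && !defined(_WIN64)   /* 32-bit Windows only */
#include <windows.h>

/* Overwrite len bytes at addr in the target, handling page protection. */
static BOOL poke(HANDLE proc, LPVOID addr, const void *src, SIZE_T len)
{
    DWORD old;
    SIZE_T done = 0;
    BOOL ok;
    if (!VirtualProtectEx(proc, addr, len, PAGE_EXECUTE_READWRITE, &old))
        return FALSE;
    ok = WriteProcessMemory(proc, addr, src, len, &done) && done == len;
    VirtualProtectEx(proc, addr, len, old, &old);
    FlushInstructionCache(proc, addr, len);
    return ok;
}

/* Hypothetical callback: patches kernel32.dll inside the target. */
typedef BOOL (*patch_fn)(HANDLE proc);

BOOL launch_patched(const char *exe_path, patch_fn patch_kernel32)
{
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;
    CONTEXT ctx;
    LPVOID entry;
    unsigned char saved[sizeof SPIN_LOOP];
    SIZE_T got = 0;
    BOOL reached = FALSE, ok = FALSE;
    int tries;

    ZeroMemory(&si, sizeof si);
    si.cb = sizeof si;
    if (!CreateProcessA(exe_path, NULL, NULL, NULL, FALSE,
                        CREATE_SUSPENDED, NULL, NULL, &si, &pi))
        return FALSE;

    /* Assumption: Eax of the suspended initial thread is the entry point. */
    ctx.ContextFlags = CONTEXT_INTEGER;
    if (!GetThreadContext(pi.hThread, &ctx))
        goto done;
    entry = (LPVOID)(ULONG_PTR)ctx.Eax;

    /* Save the original entry bytes and plant the endless loop. */
    if (!ReadProcessMemory(pi.hProcess, entry, saved, sizeof saved, &got)
        || got != sizeof saved)
        goto done;
    if (!poke(pi.hProcess, entry, SPIN_LOOP, sizeof SPIN_LOOP))
        goto done;

    /* Let the loader run, then poll until the thread spins at the entry. */
    ResumeThread(pi.hThread);
    for (tries = 0; tries < 500 && !reached; tries++) {
        Sleep(10);
        SuspendThread(pi.hThread);
        ctx.ContextFlags = CONTEXT_CONTROL;
        reached = GetThreadContext(pi.hThread, &ctx)
               && (LPVOID)(ULONG_PTR)ctx.Eip == entry;
        if (!reached)
            ResumeThread(pi.hThread);
    }
    if (!reached)
        goto done;

    /* kernel32.dll is mapped by now: patch it, then undo the spin loop. */
    ok = patch_kernel32(pi.hProcess)
      && poke(pi.hProcess, entry, saved, sizeof saved);

done:
    if (ok)
        ResumeThread(pi.hThread);
    else
        TerminateProcess(pi.hProcess, 1);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return ok;
}
#endif
```

The bounded polling loop is a design choice for the sketch: if the target dies or never reaches its entry point, the launcher gives up and kills it instead of hanging.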

Here’s the piece of the code that does just that:

Now for the patches. The patch for the endless loop is simply “jmp -2” (bytes EB FE, a two-byte short jump that lands back on itself). Our patch for LoadLibrary basically looks like this:

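LoadLibraryA:
	jmp start_of_code	; overwrites the first 5 bytes of LoadLibraryA
	...

start_of_code:
	push [esp+4]	; lpFileName, the caller's argument
	push address "PILib.dll" (data+1)
	call _strcmpi	; cdecl: both arguments stay on the stack
	push edi
	mov edi,address "aPILib.dll" (data)
	test eax,eax
	jnz back	; some other DLL: leave the argument alone
	mov [esp+10h],edi	; saved edi + 2 args + return address = 10h

back:
	pop edi
	pop eax	; discard _strcmpi's two arguments
	pop eax
	push ebp	; redo the prologue that the jmp overwrote
	mov ebp,esp
	jmp LoadLibraryA+5

data:
	"aPILib.dll"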

The code is inefficient, for reasons you will soon understand. Some things to pay attention to:

  • _strcmpi does not clean the stack, so we need to do it ourselves. This is why we use “esp+10h”, and we also have two “pop eax” commands there.
  • We look for the string “PILib.dll” (case-insensitive). If we find that this is the requested DLL, we trick LoadLibrary into thinking it should load “aPILib.dll”. That would be the name of our DLL proxy.

Now, the easy way to go about applying the patch is to put most of the code in a new memory region (using VirtualAllocEx and WriteProcessMemory), and jump to it from LoadLibrary. However, we’re not here for the easy way. :)
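For reference, the easy way could look roughly like the sketch below (32-bit only). The portable part encodes the 5-byte “jmp rel32” that replaces the start of the function. The Windows part, install_detour, is a hypothetical helper that copies an already assembled stub into the target and writes that jump. It assumes the stub itself takes care of redoing the overwritten instructions and jumping back to the function plus 5, and that func is the function’s address inside the target process.

```c
#include <stdint.h>

/* Encode "jmp rel32" (E9 xx xx xx xx) located at address from and
 * landing at address to. rel32 is relative to the next instruction. */
void encode_jmp_rel32(uint32_t from, uint32_t to, unsigned char out[5])
{
    uint32_t rel = to - (from + 5);
    out[0] = 0xE9;
    out[1] = (unsigned char)(rel & 0xFF);
    out[2] = (unsigned char)((rel >> 8) & 0xFF);
    out[3] = (unsigned char)((rel >> 16) & 0xFF);
    out[4] = (unsigned char)((rel >> 24) & 0xFF);
}

#if defined(_WIN32) && !defined(_WIN64)   /* 32-bit Windows only */
#include <windows.h>

/* Hypothetical helper: copy stub into new executable memory in proc
 * and make func jump to it. */
BOOL install_detour(HANDLE proc, LPVOID func,
                    const void *stub, SIZE_T stub_len)
{
    unsigned char jmp[5];
    SIZE_T done = 0;
    DWORD old;
    LPVOID cave = VirtualAllocEx(proc, NULL, stub_len,
                                 MEM_COMMIT | MEM_RESERVE,
                                 PAGE_EXECUTE_READWRITE);
    if (!cave)
        return FALSE;
    if (!WriteProcessMemory(proc, cave, stub, stub_len, &done)
        || done != stub_len)
        return FALSE;

    encode_jmp_rel32((uint32_t)(ULONG_PTR)func,
                     (uint32_t)(ULONG_PTR)cave, jmp);
    if (!VirtualProtectEx(proc, func, sizeof jmp,
                          PAGE_EXECUTE_READWRITE, &old))
        return FALSE;
    if (!WriteProcessMemory(proc, func, jmp, sizeof jmp, &done)
        || done != sizeof jmp)
        return FALSE;
    VirtualProtectEx(proc, func, sizeof jmp, old, &old);
    FlushInstructionCache(proc, func, sizeof jmp);
    return TRUE;
}
#endif
```

Note that a stub placed this way has to be assembled for the address it ends up at, since its call to _strcmpi and its jump back are relative too.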

I decided to patch kernel32.dll directly, using whatever small NOP space I could find. The result was a series of patches that form a chain, where each link in the chain (short) jumps to the next link, and the last link jumps back to LoadLibrary. The result can be found below (it prefixes the C code given above):
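To give an idea of the bookkeeping behind such a chain, here is a small sketch of its two building blocks: finding runs of padding bytes in a copy of the code, and encoding the two-byte short jump that links one run to the next. It assumes that runs of NOP (0x90) or INT3 (0xCC) bytes really are unused padding, which you have to verify by hand in a disassembler. The function names are hypothetical.

```c
#include <stddef.h>

/* Encode "jmp rel8" (EB xx) located at offset from and landing at
 * offset to. Returns 0 on success, -1 if the target is out of range. */
int encode_short_jmp(long from, long to, unsigned char out[2])
{
    long rel = to - (from + 2);     /* relative to the next instruction */
    if (rel < -128 || rel > 127)
        return -1;
    out[0] = 0xEB;
    out[1] = (unsigned char)(rel & 0xFF);
    return 0;
}

/* Find the next run of at least min_len padding bytes at or after
 * start. Returns the offset where the run begins, or -1 if none. */
long find_padding(const unsigned char *img, size_t len,
                  size_t start, size_t min_len)
{
    size_t run = 0, i;
    for (i = start; i < len; i++) {
        if (img[i] == 0x90 || img[i] == 0xCC) {
            if (++run >= min_len)
                return (long)(i + 1 - run);
        } else {
            run = 0;
        }
    }
    return -1;
}
```

Each link needs room for its own instructions plus the two bytes of the jump to the next link, and the next link has to start within short-jump range, which is what the range check in encode_short_jmp guards. Encoding a jump from an offset to that same offset yields EB FE, the endless loop used at the entry point.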

We’re left with the task of writing the DLL proxy. Two things are worth mentioning in that context:

  • The original C_SK, C_E, and C_D perform their own stack cleanup, so they should be called with the __stdcall calling convention.
  • As per the previous point, our replacement C_SK, C_E, and C_D should also be declared as __stdcall. However, this means that the Visual Studio compiler will generate decorated names for our exported functions, based on the amount of memory they pop off the stack upon returning (i.e., _C_SK@8, _C_E@12, and _C_D@12). In turn, this means that the PI server won’t find the functions that it looks for, and thus won’t be able to use them. We therefore need to create a DEF file (not supplied here), to make sure we don’t have decorated names. The whole ‘extern “C” __declspec(dllexport)’ is thus probably unnecessary, as it doesn’t lead to the intended result (in this case).

Here’s the code for a simple pass-through DLL proxy:

Of course, you need to add your logging mechanism to the mix. The first parameter to C_SK gives you the symmetric key being used. As for the rest of the functions, I leave them for you to explore.
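To show where the logging could go, here is a minimal sketch of a logging proxy for 32-bit Windows. Quite a lot of it is guesswork and labeled as such: only the argument sizes (8, 12, and 12 bytes) are known, so the parameter types, the int return type, the assumption that C_SK’s first parameter points at a raw 32-byte key, and the log file name pilog.txt are all mine. It loads the real DLL through LoadLibraryExA, assuming only LoadLibraryA was patched, so the request isn’t redirected back to the proxy itself.

```c
#include <stddef.h>

/* Render len bytes as lowercase hex into out (needs 2*len + 1 chars). */
void hex_encode(const unsigned char *in, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;
    for (i = 0; i < len; i++) {
        out[2 * i]     = digits[in[i] >> 4];
        out[2 * i + 1] = digits[in[i] & 0x0F];
    }
    out[2 * len] = '\0';
}

#if defined(_WIN32) && !defined(_WIN64)   /* 32-bit Windows only */
#include <windows.h>
#include <stdio.h>

/* Assumed prototypes: only the total argument sizes are known. */
typedef int (__stdcall *sk_fn)(const unsigned char *key, void *ctx);
typedef int (__stdcall *ed_fn)(void *a, void *b, void *c);

static sk_fn real_sk;
static ed_fn real_e, real_d;

/* Lazily load the real PILib.dll and look up its three exports. */
static int resolve(void)
{
    HMODULE real;
    if (real_sk && real_e && real_d)
        return 1;
    /* LoadLibraryExA, so the patched LoadLibraryA does not redirect us. */
    real = LoadLibraryExA("PILib.dll", NULL, 0);
    if (!real)
        return 0;
    real_sk = (sk_fn)GetProcAddress(real, "C_SK");
    real_e  = (ed_fn)GetProcAddress(real, "C_E");
    real_d  = (ed_fn)GetProcAddress(real, "C_D");
    return real_sk && real_e && real_d;
}

int __stdcall C_SK(const unsigned char *key, void *ctx)
{
    char hex[65];
    FILE *log;
    if (!resolve())
        return 0;
    hex_encode(key, 32, hex);       /* assumes a raw 32-byte key */
    log = fopen("pilog.txt", "a");  /* hypothetical log file */
    if (log) {
        fprintf(log, "key %s\n", hex);
        fclose(log);
    }
    return real_sk(key, ctx);
}

int __stdcall C_E(void *a, void *b, void *c)
{
    return resolve() ? real_e(a, b, c) : 0;
}

int __stdcall C_D(void *a, void *b, void *c)
{
    return resolve() ? real_d(a, b, c) : 0;
}

/* Matching DEF file, so the exports keep their undecorated names:
 *
 *   LIBRARY aPILib
 *   EXPORTS
 *       C_SK
 *       C_E
 *       C_D
 */
#endif
```

Declaring the return type as int is a defensive choice: if the real functions return something in eax, it gets passed through, and if they don’t, the caller simply ignores it.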

Update: An initial analysis of Poison Ivy is also available.

 
