Not so long ago, a friend of mine came to me saying he thought his web server was acting as a command and control (C&C) server for some hacker. I took a look, and indeed found the Poison Ivy (PI) C&C app (or, as its author calls it, a “remote administration tool”) installed on the machine. After realizing that the network administrator had some pcap files of the network traffic, my friend asked me to see whether I could find out what the hacker had done with the app on his machine. Unfortunately, PI’s traffic is encrypted, so I asked my friend for the C&C app to see if I could figure out how to decrypt the data.
According to PI’s author, all communication is encrypted using Camellia with a 256-bit key. Camellia is a symmetric-key block cipher that operates on 128-bit blocks. I, for one, don’t blindly trust the word of a guy who distributes such a tool for free (no offense), so I first wanted to check that this is indeed the case. A hex search for the S-boxes’ values finds them in the executable (which is packed, BTW), but let’s dig a little deeper. It turns out that when PI’s C&C runs, it first writes a DLL named PILib.dll to its current working directory, and then loads it. The functions in this DLL can also be found in the code of the clients, i.e., the Trojan horses (which PI, for some reason, calls “servers”). This DLL may be our guy. It exports three functions: C_SK, C_E, and C_D. Could they be Camellia_ScheduleKeys, Camellia_Encrypt, and Camellia_Decrypt? Let’s find out.
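The hex search mentioned above boils down to scanning the file’s bytes for a known run of S-box values. Here is a minimal sketch, assuming the implementation keeps RFC 3713’s SBOX1 as a plain 256-byte table (an implementation that precomputes 32-bit lookup tables would not contain this byte run); find_pattern is a name of our own:

```c
#include <stddef.h>
#include <string.h>

/* First 16 bytes of SBOX1 as listed in RFC 3713 (decimal, as in the RFC). */
static const unsigned char CAMELLIA_SBOX1_PREFIX[16] = {
    112, 130,  44, 236, 179,  39, 192, 229,
    228, 133,  87,  53, 234,  12, 174,  65
};

/* Returns the offset of the first occurrence of pat in buf, or -1 if absent. */
static long find_pattern(const unsigned char *buf, size_t buflen,
                         const unsigned char *pat, size_t patlen) {
    size_t i;
    if (patlen == 0 || patlen > buflen)
        return -1;
    for (i = 0; i + patlen <= buflen; i++) {
        if (memcmp(buf + i, pat, patlen) == 0)
            return (long) i;
    }
    return -1;
}
```

In practice you would read the executable into buf and then look at the returned offset in a disassembler.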
If you compare C_SK’s code with the key scheduling part in RFC 3713 (Camellia’s specification), you can see that this is indeed Camellia’s key scheduling algorithm, using a 256-bit key.
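RFC 3713’s key schedule is easy to recognize in disassembly by two things: six 64-bit constants, Sigma1 through Sigma6, and 128-bit left rotations (written <<< in the RFC) that carve the subkeys out of KL, KR, KA, and KB. The sketch below lists the constants and implements the rotation on a pair of 64-bit halves. It is not a complete key schedule, and the u128 type and rotl128 helper are our own names, not PI’s.

```c
#include <stdint.h>

/* Key-schedule constants Sigma1..Sigma6 from RFC 3713. */
static const uint64_t CAMELLIA_SIGMA[6] = {
    0xA09E667F3BCC908BULL, 0xB67AE8584CAA73B2ULL, 0xC6EF372FE94F82BEULL,
    0x54FF53A5F1D36F1CULL, 0x10E527FADE682D1DULL, 0xB05688C2B3E6C1FDULL
};

/* A 128-bit value split into its high and low 64-bit halves. */
typedef struct { uint64_t hi, lo; } u128;

/* 128-bit rotate left by n bits, the "<<<" operator used in RFC 3713. */
static u128 rotl128(u128 x, unsigned n) {
    u128 r;
    n &= 127;
    if (n >= 64) {              /* rotating by 64 just swaps the halves */
        uint64_t t = x.hi;
        x.hi = x.lo;
        x.lo = t;
        n -= 64;
    }
    if (n == 0)
        return x;
    r.hi = (x.hi << n) | (x.lo >> (64 - n));
    r.lo = (x.lo << n) | (x.hi >> (64 - n));
    return r;
}
```

Splitting the value into two 64-bit halves mirrors how a 32-bit implementation such as C_SK has to handle 128-bit quantities anyway, so the shift pairs in the disassembly should look similar.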
So we know that the C&C server uses PILib.dll to handle all its encryption/decryption tasks. The client has the same functions embedded in its code (i.e., used internally, and not imported). We now have the following options:
- Unpack the C&C server, find out how it uses the DLL (e.g., how the shared key is passed to the key scheduling algorithm), use the DLL to decrypt known PI traffic, and from that deduce what the old captured data means. This entails dealing with anti-debugging and anti-reversing techniques (if unpacking manually), and then more digging. Afterwards, there is still more work: extracting the encrypted data with Wireshark and passing it through the DLL.
- Perform a similar procedure, but for a PI client (Trojan horse). This might prove to be a little more rewarding (but just a little bit).
- Replace PILib.dll with our own DLL, which will sit between the C&C server and the original DLL. Calls to C_SK, C_E, and C_D will go to our DLL, where they get logged and forwarded to the original DLL. Our log will then immediately contain the secret key, the encrypted data, and the decrypted data.
Of these options, I chose the third one, as it’s more elegant than the other two (a matter of opinion, I guess). Little did I know what I would have to do to get there. At first I thought that if the C&C server saw that PILib.dll already existed, it wouldn’t overwrite it, so I could just name my proxy DLL PILib.dll and put it there. That, of course, was too good to be true: my DLL got overwritten, and the real PILib.dll was used. So I decided to patch LoadLibrary so that when a request to load PILib.dll arrives, it loads my DLL instead. The problem is that statically patching kernel32.dll is a real pain. It’s one of Windows’ known DLLs, so it’s not affected by the DLL search order (a patched copy placed next to the executable is simply ignored), and it’s also protected by the OS (if you patch it in its directory, Windows File Protection restores the original version within seconds).
So we’re left with dynamically patching kernel32.dll after the PI C&C server is loaded into memory. The catch is that everything happens very quickly, so we need to control the loading process. We can write a program that calls CreateProcess with the CREATE_SUSPENDED flag, so that the new process’s main thread starts in a suspended state. This is not enough, though, as at that point only ntdll.dll is loaded in the new process. We need to let the PI C&C run without it actually doing anything, so we patch the code at its entry point to loop endlessly. Once kernel32.dll has been loaded into PI’s process (the code below simply waits a few seconds), we suspend the thread again, patch kernel32.dll, restore the original code at PI’s entry point, and resume the process.
Here’s the piece of the code that does just that:
```cpp
// May throw an exception (access violation)
// Checks that every patch site still holds the bytes we expect to overwrite
bool verify_patches(HANDLE hProcess, patch *patches, int numPatches) {
    BYTE data[MAX_CODE_CHUNK_SIZE];
    int idx;
    for (idx = 0; idx < numPatches; idx++) {
        if (!ReadProcessMemory(hProcess, patches[idx].lpBaseAddress, data, patches[idx].nSize, NULL))
            return false;
        if (memcmp(data, patches[idx].bOrig, patches[idx].nSize))
            return false;
    }
    return true;
}

// May throw an exception (access violation)
// Writes the patch bytes (or the original bytes, if revert is true) into the target process
bool apply_patches(HANDLE hProcess, patch *patches, int numPatches, bool revert) {
    DWORD oldProtection, ignored;
    int idx;
    for (idx = 0; idx < numPatches; idx++) {
        LPVOID addr = (LPVOID) patches[idx].lpBaseAddress;
        BOOL changed = VirtualProtectEx(hProcess, addr, patches[idx].nSize, PAGE_EXECUTE_READWRITE, &oldProtection);
        if (!WriteProcessMemory(hProcess, addr, revert ? patches[idx].bOrig : patches[idx].bPatch, patches[idx].nSize, NULL))
            return false;
        // lpflOldProtect must point to a valid variable, or VirtualProtectEx fails
        if (changed)
            VirtualProtectEx(hProcess, addr, patches[idx].nSize, oldProtection, &ignored);
        // Make sure the target doesn't run stale copies of the patched code
        FlushInstructionCache(hProcess, addr, patches[idx].nSize);
    }
    return true;
}

// Reports the error, kills the half-initialized process, and returns an exit code
int bail_out(const char *msg, HANDLE hProcess, HANDLE hThread) {
    MessageBoxA(NULL, msg, "Error", MB_OK);
    TerminateProcess(hProcess, 1);
    CloseHandle(hProcess);
    CloseHandle(hThread);
    return -1;
}

int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow) {
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;
    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si);
    ZeroMemory(&pi, sizeof(pi));
    // Start PI with its main thread suspended (only ntdll.dll is loaded at this point)
    if (!CreateProcessA(PROG_NAME, NULL, NULL, NULL, FALSE, CREATE_SUSPENDED, NULL, PROG_DIR, &si, &pi)) {
        MessageBoxA(NULL, "Can't create process " PROG_NAME, "Error", MB_OK);
        return -1;
    }
    // Replace the first two bytes at PI's entry point with "jmp -2"
    if (!verify_patches(pi.hProcess, &pi_patch, 1))
        return bail_out("Unable to verify patches", pi.hProcess, pi.hThread);
    if (!apply_patches(pi.hProcess, &pi_patch, 1, false))
        return bail_out("Unable to apply patches", pi.hProcess, pi.hThread);
    // Let it run in an endless loop so kernel32.dll will get loaded
    ResumeThread(pi.hThread);
    Sleep(3000);
    SuspendThread(pi.hThread);
    // Patch LoadLibraryA in PI's copy of kernel32.dll
    if (!verify_patches(pi.hProcess, kernel32_patches, PATCH_CHUNKS))
        return bail_out("Unable to verify patches", pi.hProcess, pi.hThread);
    if (!apply_patches(pi.hProcess, kernel32_patches, PATCH_CHUNKS, false))
        return bail_out("Unable to apply patches", pi.hProcess, pi.hThread);
    // Restore the original entry point code and let PI start for real
    if (!apply_patches(pi.hProcess, &pi_patch, 1, true))
        return bail_out("Unable to apply patches", pi.hProcess, pi.hThread);
    ResumeThread(pi.hThread);
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    return 0;
}
```
Now for the patches. The patch for the endless loop is simply “jmp -2” (the two bytes EB FE, a short jump whose displacement of -2 lands back on the jump itself). Our patch for LoadLibrary basically looks like this:
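```asm
LoadLibraryA:
        jmp     start_of_code
        ...
start_of_code:
        push    [esp+4]                   ; copy of the lpFileName argument
        push    address "PILib.dll"       ; (data+1)
        call    _strcmpi
        push    edi
        mov     edi, address "aPILib.dll" ; (data)
        test    eax, eax
        jnz     back                      ; not PILib.dll: leave the argument alone
        mov     [esp+10h], edi            ; replace lpFileName with "aPILib.dll"
back:
        pop     edi
        pop     eax                       ; discard _strcmpi's two arguments
        pop     eax
        push    ebp                       ; re-create the prologue our jmp overwrote
        mov     ebp, esp                  ; (mov edi,edi is a no-op and can be skipped)
        jmp     LoadLibraryA+5
data:
        "aPILib.dll"
```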
The code is inefficient, for reasons you will soon understand. Some things to pay attention to:
- _strcmpi uses the cdecl convention and does not clean up the stack, so we need to do it ourselves. This is why we use “esp+10h” (the saved edi, _strcmpi’s two arguments, and the return address sit between esp and the original argument), and why there are two “pop eax” instructions.
- We compare the requested name with the string “PILib.dll” (case-insensitively). If this is the requested DLL, we trick LoadLibrary into loading “aPILib.dll” instead. That will be the name of our proxy DLL.
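Stripped of the stack juggling, the injected code makes a single decision. The following C rendering of that decision is only an illustration: redirect_dll_name and ascii_stricmp are hypothetical names, and ascii_stricmp is an ASCII-only stand-in for _strcmpi. Note that “PILib.dll” is simply the “aPILib.dll” string starting one byte in, which is why the patch needs only one copy of the string (data and data+1).

```c
#include <ctype.h>

/* ASCII-only, case-insensitive comparison (stand-in for _strcmpi). */
static int ascii_stricmp(const char *a, const char *b) {
    while (*a && tolower((unsigned char) *a) == tolower((unsigned char) *b)) {
        a++;
        b++;
    }
    return tolower((unsigned char) *a) - tolower((unsigned char) *b);
}

/* Returns the name LoadLibraryA should actually load. */
static const char *redirect_dll_name(const char *requested) {
    /* proxy_name + 1 is "PILib.dll": one string serves both purposes */
    static const char proxy_name[] = "aPILib.dll";
    if (ascii_stricmp(requested, proxy_name + 1) == 0)
        return proxy_name;      /* load our proxy DLL instead */
    return requested;           /* any other DLL is left untouched */
}
```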
Now, the easy way to apply the patch is to put most of the code in a new memory region (using VirtualAllocEx and WriteProcessMemory) and jump to it from LoadLibrary. However, we’re not here for the easy way.
I decided to patch kernel32.dll directly, using whatever small runs of NOP padding I could find. The result was a series of patches that form a chain, where each link in the chain jumps (with a short jump) to the next link, and the last link jumps back into LoadLibrary. The code can be found below (it goes before the C code given above):
```cpp
#include <windows.h>
#include <string.h>     // memcmp

#define PROG_NAME "C:\\Work\\Poison Ivy\\PI2.3.2\\PI.EXE"
#define PROG_DIR "C:\\Work\\Poison Ivy\\PI2.3.2"

// Hard-code this since we're lazy
#define ENTRY_POINT 0x00611060

// Calculate the correction we need to apply for the patch to work
// This is for Windows XP SP3 with static addresses
// If ASLR was involved, we would simply inject code that looks for kernel32.dll
#define KERNEL32_BASE 0x7C800000
#define KERNEL32_LOADLIBRARYA 0x7C801D7B
// The patch addresses below are relative to a kernel32.dll image based at
// PATCH_BASE (0x00870000); adding PATCH_OFFSET_CORRECTION maps them to the live image
#define PATCH_LOADLIBRARYA 0x00871D7B
#define PATCH_BASE (PATCH_LOADLIBRARYA - (KERNEL32_LOADLIBRARYA - KERNEL32_BASE))
#define PATCH_OFFSET_CORRECTION (KERNEL32_BASE - PATCH_BASE)
// Address holding the pointer to _strcmpi (the patch calls indirectly through it)
#define KERNEL32_STRCMPI 0x7C8013AC
// This is the virtual address of our "aPILib.dll" string
// We have to work it out here, since it's hard-coded in our injected code
#define PILIB_DLL_STR_VA (0x00872614 + PATCH_OFFSET_CORRECTION)

#define BYTE0(x) ((BYTE) ((x) & 0xFF))
#define BYTE1(x) ((BYTE) (((x) >> 8) & 0xFF))
#define BYTE2(x) ((BYTE) (((x) >> 16) & 0xFF))
#define BYTE3(x) ((BYTE) (((x) >> 24) & 0xFF))

#define MAX_CODE_CHUNK_SIZE 50
#define PATCH_CHUNKS 7

// One patch site: its address, its length, the bytes we expect there, and the bytes we write
typedef struct patch {
    LPCVOID lpBaseAddress;
    SIZE_T nSize;
    BYTE bOrig[MAX_CODE_CHUNK_SIZE];
    BYTE bPatch[MAX_CODE_CHUNK_SIZE];
} patch;

patch pi_patch = {
    (LPCVOID) ENTRY_POINT,
    2,
    {0x60, 0xE8},               // pushad, then the opcode of a call
    {0xEB, 0xFE}                // jmp -2 (endless loop)
};

patch kernel32_patches[PATCH_CHUNKS] = {
    {
        (LPCVOID) (PATCH_LOADLIBRARYA + PATCH_OFFSET_CORRECTION),
        5,
        {
            0x8B, 0xFF,                 // mov edi,edi
            0x55,                       // push ebp
            0x8B, 0xEC                  // mov ebp,esp
        },
        {
            0xE9, 0xD9, 0x06, 0x00, 0x00    // jmp start_of_code
        }
    },
    {
        (LPCVOID) (0x00872459 + PATCH_OFFSET_CORRECTION),
        6,
        {0x90, 0x90, 0x90, 0x90, 0x90, 0x90},
        {
            0xFF, 0x74, 0x24, 0x04,     // start_of_code: push [esp+4]
            0xEB, 0x0D                  // jmp cont1
        }
    },
    {
        (LPCVOID) (0x0087246C + PATCH_OFFSET_CORRECTION),
        19,
        {
            0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90,
            0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90
        },
        {
            0x68,                       // cont1: push address "PILib.dll"
            BYTE0(PILIB_DLL_STR_VA + 1),
            BYTE1(PILIB_DLL_STR_VA + 1),
            BYTE2(PILIB_DLL_STR_VA + 1),
            BYTE3(PILIB_DLL_STR_VA + 1),
            0xFF, 0x15,                 // call dword ptr [KERNEL32_STRCMPI] (_strcmpi)
            BYTE0(KERNEL32_STRCMPI),
            BYTE1(KERNEL32_STRCMPI),
            BYTE2(KERNEL32_STRCMPI),
            BYTE3(KERNEL32_STRCMPI),
            0x57,                       // push edi
            0xBF,                       // mov edi,address "aPILib.dll"
            BYTE0(PILIB_DLL_STR_VA),
            BYTE1(PILIB_DLL_STR_VA),
            BYTE2(PILIB_DLL_STR_VA),
            BYTE3(PILIB_DLL_STR_VA),
            0xEB, 0x33                  // jmp cont2
        }
    },
    {
        (LPCVOID) (0x008724B2 + PATCH_OFFSET_CORRECTION),
        4,
        {0x90, 0x90, 0x90, 0x90},
        {
            0x85, 0xC0,                 // cont2: test eax,eax
            0xEB, 0x6C                  // jmp cont3
        }
    },
    {
        (LPCVOID) (0x00872522 + PATCH_OFFSET_CORRECTION),
        14,
        {0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90},
        {
            0x75, 0x04,                 // cont3: jnz back
            0x89, 0x7C, 0x24, 0x10,     // mov [esp+10h],edi
            0x5F,                       // back: pop edi
            0x58,                       // pop eax
            0x58,                       // pop eax
            0x55,                       // push ebp
            0x8B, 0xEC,                 // mov ebp,esp
            0xEB, 0x16                  // jmp final
        }
    },
    {
        (LPCVOID) (0x00872546 + PATCH_OFFSET_CORRECTION),
        5,
        {0x90, 0x90, 0x90, 0x90, 0x90},
        {
            0xE9, 0x35, 0xF8, 0xFF, 0xFF    // final: jmp LoadLibraryA+5
        }
    },
    {
        (LPCVOID) PILIB_DLL_STR_VA,
        11,
        {0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90},
        "aPILib.dll"                    // data (11 bytes including the terminating NUL)
    }
};
```
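Every displacement and address in the patch table above is easy to get wrong by hand, so here is a sketch that reproduces the arithmetic (encode_jmp_short, encode_jmp_near, and to_live_address are helper names of our own, not part of the original code). A jump’s displacement is measured from the end of the jump instruction, which is why a two-byte jump to itself is EB FE (“jmp -2”). Table addresses are shifted by PATCH_OFFSET_CORRECTION, i.e., 0x7C800000 - 0x00870000 = 0x7BF90000. Because every link of the chain (and LoadLibraryA itself) moves by that same amount, the relative jumps need no correction.

```c
#include <stdint.h>

/* Encodes "jmp rel8" (EB xx) at address from, jumping to to.
   Returns 0 if the target is out of short-jump range. */
static int encode_jmp_short(uint32_t from, uint32_t to, uint8_t out[2]) {
    int32_t rel = (int32_t) (to - (from + 2));  /* relative to the next instruction */
    if (rel < -128 || rel > 127)
        return 0;
    out[0] = 0xEB;
    out[1] = (uint8_t) rel;
    return 1;
}

/* Encodes "jmp rel32" (E9 xx xx xx xx); the displacement is little-endian. */
static void encode_jmp_near(uint32_t from, uint32_t to, uint8_t out[5]) {
    uint32_t rel = to - (from + 5);             /* relative to the next instruction */
    out[0] = 0xE9;
    out[1] = (uint8_t) rel;
    out[2] = (uint8_t) (rel >> 8);
    out[3] = (uint8_t) (rel >> 16);
    out[4] = (uint8_t) (rel >> 24);
}

/* Same computation as PATCH_BASE / PATCH_OFFSET_CORRECTION in the listing. */
static uint32_t to_live_address(uint32_t table_addr) {
    const uint32_t live_base = 0x7C800000u;             /* KERNEL32_BASE */
    const uint32_t rva = 0x7C801D7Bu - live_base;       /* LoadLibraryA RVA: 0x1D7B */
    const uint32_t table_base = 0x00871D7Bu - rva;      /* PATCH_BASE: 0x00870000 */
    return table_addr + (live_base - table_base);      /* adds 0x7BF90000 */
}
```

For example, encode_jmp_near(0x00871D7B, 0x00872459, out) yields E9 D9 06 00 00, the first patch’s bytes; the short jumps between the chain links come out as EB 0D, EB 33, EB 6C, and EB 16; and to_live_address(0x00872614) gives 0x7C802614, the live address of the “aPILib.dll” string.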
We’re left with the task of writing the DLL proxy. Two things are worth mentioning in that context:
- The original C_SK, C_E, and C_D perform their own stack cleanup, so they should be called with the __stdcall calling convention.
- As per the previous point, our replacement C_SK, C_E, and C_D should also be declared as __stdcall. However, this means that the Visual Studio compiler will generate decorated names for our exported functions, based on the number of argument bytes they pop off the stack upon returning (i.e., _C_SK@8, _C_E@12, and _C_D@12). In turn, this means that the PI server won’t find the functions it looks for, and thus won’t be able to use them. We therefore need to create a DEF file (not supplied here) so that the exports keep their undecorated names. With a DEF file in place, the ‘__declspec(dllexport)’ part is unnecessary, as it doesn’t lead to the intended result in this case. The ‘extern “C”’ part is still needed when compiling as C++, though; otherwise the DEF file would have to list the C++-mangled names.
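The @N suffix is easy to predict: on 32-bit x86, each argument occupies at least one 4-byte stack slot, and N is the total number of argument bytes the callee pops. MSVC also prepends an underscore to __stdcall names (Microsoft’s documentation gives int func(int a, double b), decorated as _func@12). Below is a sketch of that rule with a helper name of our own. The fix itself is a DEF file whose EXPORTS section lists the plain names C_SK, C_E, and C_D.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Builds the MSVC __stdcall decorated name on 32-bit x86: "_name@N", where N
   is the sum of the argument sizes, each rounded up to a 4-byte stack slot.
   Returns snprintf's result (the length the full name needs). */
static int stdcall_decorated_name(char *out, size_t outsz, const char *name,
                                  const size_t *arg_sizes, size_t nargs) {
    size_t i, total = 0;
    for (i = 0; i < nargs; i++)
        total += (arg_sizes[i] + 3) & ~(size_t) 3;
    return snprintf(out, outsz, "_%s@%lu", name, (unsigned long) total);
}
```

The argument sizes are passed explicitly (4 for each pointer) rather than taken from sizeof, since the decoration describes the 32-bit build even if the helper runs elsewhere.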
Here’s the code for a simple pass-through DLL proxy:
```cpp
#include <windows.h>

#define LIBRARY_API extern "C" __declspec(dllexport)

// Signatures of the original exports; all three pop their own arguments (__stdcall)
typedef int (__stdcall *c_sk_t)(unsigned char *, unsigned int *);
typedef int (__stdcall *c_e_t)(unsigned int *, unsigned int *, unsigned int *);
typedef int (__stdcall *c_d_t)(unsigned int *, unsigned int *, unsigned int *);

HMODULE hMod = NULL;
c_sk_t c_sk = NULL;
c_e_t c_e = NULL;
c_d_t c_d = NULL;

// Pass-through wrappers (this is where logging goes). The return value is
// forwarded so that whatever the original leaves in EAX reaches the caller.
LIBRARY_API int __stdcall C_SK(unsigned char *key256bit, unsigned int *allkeys) {
    return c_sk(key256bit, allkeys);
}

LIBRARY_API int __stdcall C_E(unsigned int *param1, unsigned int *param2, unsigned int *param3) {
    return c_e(param1, param2, param3);
}

LIBRARY_API int __stdcall C_D(unsigned int *param1, unsigned int *param2, unsigned int *param3) {
    return c_d(param1, param2, param3);
}

BOOL WINAPI DllMain(
    HINSTANCE hinstDLL,  // handle to DLL module
    DWORD fdwReason,     // reason for calling function
    LPVOID lpReserved)   // reserved
{
    switch (fdwReason)
    {
    case DLL_PROCESS_ATTACH:
        // PILIB_REAL.DLL is expected to be a copy of the original PILib.dll.
        // Requesting "PILib.dll" here would be redirected back to this proxy by
        // the LoadLibraryA patch. (Microsoft's DllMain guidance discourages
        // calling LoadLibrary from DllMain; it is kept here for simplicity.)
        if ((hMod = LoadLibraryA("PILIB_REAL.DLL")) == NULL)
            return FALSE;
        if ((c_sk = (c_sk_t) GetProcAddress(hMod, "C_SK")) == NULL ||
            (c_e = (c_e_t) GetProcAddress(hMod, "C_E")) == NULL ||
            (c_d = (c_d_t) GetProcAddress(hMod, "C_D")) == NULL) {
            FreeLibrary(hMod);
            hMod = NULL;    // don't free it a second time on DLL_PROCESS_DETACH
            return FALSE;
        }
        break;
    case DLL_PROCESS_DETACH:
        // lpReserved is non-NULL when the process is terminating; skip unloading then
        if (hMod != NULL && lpReserved == NULL)
            FreeLibrary(hMod);
        hMod = NULL;
        break;
    }
    return TRUE;
}
```
Of course, you need to add your logging mechanism to the mix. The first parameter to C_SK gives you the symmetric key being used. As for the rest of the functions, I leave them for you to explore.
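As a starting point for that logging, here is a minimal hex-dump helper. It assumes, as the parameter name key256bit suggests, that C_SK’s first argument points to 32 key bytes. hex_encode, log_hex, and the g_log file handle in the comment are hypothetical names, not part of the proxy above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Formats len bytes as uppercase hex into out, which must hold 2 * len + 1 chars. */
static void hex_encode(const unsigned char *buf, size_t len, char *out) {
    static const char digits[] = "0123456789ABCDEF";
    size_t i;
    for (i = 0; i < len; i++) {
        out[2 * i]     = digits[buf[i] >> 4];
        out[2 * i + 1] = digits[buf[i] & 0x0F];
    }
    out[2 * len] = '\0';
}

/* Appends one "label: HEX" line and flushes it immediately, so the log
   survives even if PI crashes or is killed. Logs at most 64 bytes per call. */
static void log_hex(FILE *f, const char *label, const unsigned char *buf, size_t len) {
    char hex[2 * 64 + 1];
    if (len > 64)
        len = 64;
    hex_encode(buf, len, hex);
    fprintf(f, "%s: %s\n", label, hex);
    fflush(f);
}

/* Hypothetical use in the proxy, with g_log opened elsewhere:
 *
 *   LIBRARY_API int __stdcall C_SK(unsigned char *key256bit, unsigned int *allkeys) {
 *       log_hex(g_log, "C_SK key", key256bit, 32);
 *       return c_sk(key256bit, allkeys);
 *   }
 */
```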
Update: An initial analysis of Poison Ivy is also available.

