# x64 Call Stack Spoofing

## Preface

In my previous blog post, I discussed an implementation of x64 return address spoofing. While that technique spoofs the return address, it has a significant drawback: spoofing the return address breaks the call stack chain, which makes it easy to detect. In this post, we will build upon return address spoofing and look at spoofing the entire call stack of a thread.

This technique is not new, and extensive research has been done by [`namazso`](https://x.com/namazso), [`KlezVirus`](https://x.com/klezvirus), [`waldoirc`](https://x.com/waldoirc), [`trickster012`](https://x.com/trickster012), and others. The aim of this post is to break the technique down into simpler parts and discuss how to implement call stack spoofing while calling any WinAPI.

The code for this project can be found on my [GitHub](https://github.com/HulkOperator/CallStackSpoofer).

## Introduction

This post will delve into the implementation of creating synthetic stack frames to mask the origin of API calls. By doing so, we can trick security solutions that monitor the call stacks to detect tampering with return addresses. First, let's observe the broken call stack from "return address spoofing".

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2FGNGs2HhTHhPjvwh5wYAq%2Fret_addr_issue.png?alt=media&#x26;token=3339d1ee-d1ea-435c-ad00-f0dab1036576" alt=""><figcaption><p>Incomplete Stack Unwinding</p></figcaption></figure></div>

The above image of a thread's call stack is an example of incomplete stack unwinding. The value of "0x4" is a leaked memory value, suggesting that the stack unwinding was terminated. In contrast, a thread with proper unwinding should be as follows:

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2F0s3ziTi89Do7YbS2iUZA%2Fnormal_stack_unwinding.png?alt=media&#x26;token=4139a6c0-ccbc-4840-ab0b-a2d84845e569" alt=""><figcaption><p>Normal Call Stack</p></figcaption></figure></div>

By spoofing the call stack while calling an API, we will create synthetic stack frames with proper stack unwinding and then spoof the return address.

Before diving deep into the implementation, it's essential to understand how the x64 stack works.

## x64 Stack Frame

The stack is a memory region within a process where each function is given space to store what it needs while it runs. This includes space for local variables and for saved copies of non-volatile registers. If a function modifies a non-volatile register, it restores the register from the value saved on the stack before returning.

Each function has its own stack frame, and when a function's execution has been completed, this frame is deallocated. Below is a simple demonstration of how the stack frame for the "Func" function is allocated and deallocated.

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2FeGIpIYqmvgRpARU8iOLm%2Fstack.gif?alt=media&#x26;token=6782db36-d5ae-4666-9f69-faf0ffc1d893" alt=""><figcaption><p>Stack Allocation &#x26; Deallocation</p></figcaption></figure></div>

Following is the disassembly of a simple function, which can be divided into three parts. The first is the function's prologue, the second is the function's body, and the last is the function's epilogue. The function's prologue is responsible for saving non-volatile registers and making space on the stack. On the other hand, the function's epilogue will reverse these instructions to deallocate the stack space and restore the values of non-volatile registers.

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2FJDOVP1E6ZZGZZT9jQ6am%2Ffunction_stack.png?alt=media&#x26;token=90fa22d5-603a-4755-97bb-126d0114df02" alt=""><figcaption><p>Function's Assembly</p></figcaption></figure></div>

## Call Stack

A call stack represents all the functions that were called by the thread to reach its current execution state. In the image below, the execution is currently waiting at "NtUserWaitMessage+0x14"; this function was called by "DialogBoxIndirectParamAorW". Going further down, we can see that "MessageBoxA" was called by the "main" function, which is our code. Finally, the last two frames are known as the thread initialisation frames.

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2FbdtPpT3GKYqk7zTQiPSJ%2Fcall_stack.png?alt=media&#x26;token=31147199-e0f8-4b5c-ba94-1dd6f20e31f0" alt=""><figcaption><p>Call Stack of a thread executing MessageBoxA</p></figcaption></figure></div>

At the current execution state, this is what the thread's stack looks like:

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2F0ovuhL7npjo4GiXhL3rq%2Fstack_values.png?alt=media&#x26;token=2987cc8f-6ea7-4a27-87b5-e8d5fc693a97" alt=""><figcaption><p>Entire Stack of a Thread</p></figcaption></figure></div>

A quick and dirty approach to implementing stack spoofing would be to create a synthetic call stack by hard-coding these values. However, this would not be a robust implementation, because frame sizes differ across builds and versions of Windows. To avoid such issues, we need to identify the size of each stack frame dynamically, i.e., the size of the "RtlUserThreadStart" frame, the "BaseThreadInitThunk" frame, etc.

To dynamically calculate the stack size, we need to understand Exception Handling in Windows and the ".PDATA" section.

## Exception Handling & .PDATA

When an exception is raised, the "exception dispatcher" checks if any exception handler is defined in the current function. If a handler doesn't exist, then the function's stack is unwound to restore the stack of the caller's frame and an exception handler is checked in the caller's function. This process is repeated until an exception handler is found or the whole call stack is unwound. The information required to unwind the stack of a function is defined within the ".PDATA" section. This section contains an array of "RUNTIME\_FUNCTION" structures.

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2F5Apx5ISvGwfUdOxeSH8W%2Fruntime_function.png?alt=media&#x26;token=484e18ce-027d-43d7-a0a0-18c23f78e34e" alt=""><figcaption><p>RUNTIME_FUNCTION Structure</p></figcaption></figure></div>

Each structure stores the start and end addresses of a function as offsets from the module's base address. Additionally, it stores the offset to the function's "UNWIND\_INFO" structure.
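To make the role of these offsets concrete, here is a minimal, portable sketch of the lookup. The `RuntimeFunction` type is a stand-in that mirrors the three fields of the x64 "RUNTIME\_FUNCTION" entry, and both `FindRuntimeFunction` and the sample table are hypothetical names and values used only for illustration. Note that the end address is exclusive.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for the x64 RUNTIME_FUNCTION entry.
   All three fields are offsets from the module base (RVAs). */
typedef struct {
    uint32_t BeginAddress;       /* RVA of the first byte of the function */
    uint32_t EndAddress;         /* RVA of the first byte AFTER the function */
    uint32_t UnwindInfoAddress;  /* RVA of the UNWIND_INFO structure */
} RuntimeFunction;

/* Hypothetical helper: return the entry whose [Begin, End) range contains
   rva, or NULL when no entry covers it. */
static const RuntimeFunction *FindRuntimeFunction(const RuntimeFunction *table,
                                                  size_t count, uint32_t rva) {
    for (size_t i = 0; i < count; i++) {
        if (rva >= table[i].BeginAddress && rva < table[i].EndAddress)
            return &table[i];
    }
    return NULL;
}

/* Made-up table, used only to illustrate the lookup. */
static const RuntimeFunction kSampleTable[] = {
    { 0x1000, 0x1040, 0x5000 },
    { 0x1040, 0x10A0, 0x500C },
    { 0x2000, 0x2010, 0x5018 },
};
```

With this table, the offset `0x1040` resolves to the second entry rather than the first, because `0x1040` is the first byte after the first function.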

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2F3zBksLE2RW1tMLbsADtN%2Funwind_info.png?alt=media&#x26;token=7b525fc4-a3b1-4f4a-9df8-eec070340dce" alt=""><figcaption><p>UNWIND_INFO</p></figcaption></figure></div>

The "UNWIND\_INFO" structure contains an array of unwind codes and their count. Each unwind code describes one operation performed by the function's prologue, such as pushing a register or allocating stack space. By walking this array, we can calculate how much stack the function uses. We only need to add up the following four unwind codes, as they are the ones that change the size of a function's stack:

* UWOP\_ALLOC\_SMALL
* UWOP\_PUSH\_NONVOL
* UWOP\_ALLOC\_LARGE
* UWOP\_PUSH\_MACHFRAME
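As a worked example, the sketch below decodes a hand-written unwind code array without any Windows headers. It assumes the documented x64 layout, in which each 16-bit slot holds the prologue offset in the low byte, the operation in the next four bits, and the operation info in the top four bits. The `SLOT` macro, `FrameSizeFromCodes`, and the sample arrays are hypothetical names invented for this illustration, and the input is assumed to be well formed.

```c
#include <stddef.h>
#include <stdint.h>

/* Unwind operation numbers as documented for x64 */
enum {
    OP_PUSH_NONVOL = 0, OP_ALLOC_LARGE = 1, OP_ALLOC_SMALL = 2,
    OP_SAVE_NONVOL = 4, OP_SAVE_NONVOL_FAR = 5,
    OP_SAVE_XMM128 = 8, OP_SAVE_XMM128_FAR = 9, OP_PUSH_MACHFRAME = 10
};

/* Pack one slot: low byte = prologue offset, then 4 bits op, 4 bits info */
#define SLOT(offset, op, info) ((uint16_t)((offset) | ((op) << 8) | ((info) << 12)))

/* Sum the stack space described by a well-formed unwind code array */
static uint32_t FrameSizeFromCodes(const uint16_t *codes, size_t count) {
    uint32_t size = 0;
    for (size_t i = 0; i < count; i++) {
        unsigned op = (codes[i] >> 8) & 0xF;
        unsigned info = (codes[i] >> 12) & 0xF;
        switch (op) {
        case OP_PUSH_NONVOL:            /* push reg */
            size += 8;
            break;
        case OP_ALLOC_SMALL:            /* sub rsp, 8..128 */
            size += info * 8 + 8;
            break;
        case OP_ALLOC_LARGE:
            if (info == 0) {            /* next slot holds size / 8 */
                size += (uint32_t)codes[i + 1] * 8;
                i += 1;
            } else {                    /* next two slots hold the raw size */
                size += (uint32_t)codes[i + 1] | ((uint32_t)codes[i + 2] << 16);
                i += 2;
            }
            break;
        case OP_PUSH_MACHFRAME:         /* 40 bytes, or 48 with an error code */
            size += info ? 48 : 40;
            break;
        case OP_SAVE_NONVOL:
        case OP_SAVE_XMM128:
            i += 1;                     /* one extra slot, no size change */
            break;
        case OP_SAVE_NONVOL_FAR:
        case OP_SAVE_XMM128_FAR:
            i += 2;                     /* two extra slots, no size change */
            break;
        default:
            break;
        }
    }
    return size;
}

/* Prologue: push rbx; sub rsp, 0x20 (codes are stored in reverse order) */
static const uint16_t kPushAndAlloc[] = {
    SLOT(6, OP_ALLOC_SMALL, 3), SLOT(2, OP_PUSH_NONVOL, 3)
};

/* Prologue: push rbp; sub rsp, 0x400 (0x400 / 8 = 0x80 in the extra slot) */
static const uint16_t kLargeAlloc[] = {
    SLOT(8, OP_ALLOC_LARGE, 0), 0x80, SLOT(1, OP_PUSH_NONVOL, 5)
};
```

For the first array the result is 8 + 32 = 40 bytes, and for the second it is 1024 + 8 = 1032 bytes. The return address pushed by the `call` instruction is not part of these totals.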

### Calculating the Size of a Function's Stack

The first step is to obtain the address of the ".PDATA" section. This will be done using the below code:

```c
typedef struct _EXCEPTION_INFO {

	UINT64 hModule;
	UINT64 pExceptionDirectory;
	DWORD dwRuntimeFunctionCount;

}EXCEPTION_INFO, *PEXCEPTION_INFO;

// Fills in pExceptionDirectory and dwRuntimeFunctionCount for the module in hModule
VOID RetExceptionAddress(PEXCEPTION_INFO pExceptionInfo) {

	UINT64 pImgNtHdr, hModule;
	PIMAGE_OPTIONAL_HEADER64 pImgOptHdr;

	hModule = pExceptionInfo->hModule;

	pImgNtHdr = hModule + ((PIMAGE_DOS_HEADER)hModule)->e_lfanew;
	pImgOptHdr = &((PIMAGE_NT_HEADERS64)pImgNtHdr)->OptionalHeader;

	pExceptionInfo->pExceptionDirectory = hModule + pImgOptHdr->DataDirectory[IMAGE_DIRECTORY_ENTRY_EXCEPTION].VirtualAddress;
	pExceptionInfo->dwRuntimeFunctionCount = pImgOptHdr->DataDirectory[IMAGE_DIRECTORY_ENTRY_EXCEPTION].Size / sizeof(RUNTIME_FUNCTION);

}
```

Using the ".PDATA" section, we can calculate a function's stack size:

```c
DWORD RetStackSize(UINT64 hModule, UINT64 pFuncAddr) {

	EXCEPTION_INFO sExceptionInfo = { 0 };
	sExceptionInfo.hModule = hModule;

	RetExceptionAddress(&sExceptionInfo);

	PRUNTIME_FUNCTION pRuntimeFunction = (PRUNTIME_FUNCTION)sExceptionInfo.pExceptionDirectory;
	DWORD dwStackSize = 0, dwFuncOffset = pFuncAddr - hModule;
	PUNWIND_INFO pUnwindInfo;
	PUNWIND_CODE pUnwindCode;
	

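	// Loop through the RUNTIME_FUNCTION entries until we find the one covering our target address.
	// EndAddress is exclusive: it is the offset of the first byte after the function.
	DWORD dwIndex = 0;
	for (; dwIndex < sExceptionInfo.dwRuntimeFunctionCount; dwIndex++) {
		if (dwFuncOffset >= pRuntimeFunction->BeginAddress && dwFuncOffset < pRuntimeFunction->EndAddress) {
			break;
		}

		pRuntimeFunction++;
	}

	// No entry covers this address, so there is no unwind data to read
	if (dwIndex == sExceptionInfo.dwRuntimeFunctionCount)
		return 0;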

	// From the RunTimeFunction structure we need the offset to UnwindInfo structure

	pUnwindInfo = ((PUNWIND_INFO)(hModule + pRuntimeFunction->UnwindInfoAddress));

	pUnwindCode = pUnwindInfo->UnwindCode; // UnwindCode Array

    // Loop Through the UnwindCodesArray and calculate Stack Size
	for (int i = 0; i < pUnwindInfo->CountOfUnwindCodes; i++) {

		// The switch below compares against UWOP_* values, so read the operation, not its info field
		UBYTE bUnwindCode = pUnwindCode[i].UnwindOp;

		switch (bUnwindCode)
		{
		case UWOP_ALLOC_SMALL:
			dwStackSize += (pUnwindCode[i].OpInfo + 1) * 8;
			break;
		case UWOP_PUSH_NONVOL:
			if (pUnwindCode[i].OpInfo == 4)
				return 0;
			dwStackSize += 8;
			break;
		case UWOP_ALLOC_LARGE:
			if (pUnwindCode[i].OpInfo == 0) {
				dwStackSize += pUnwindCode[i + 1].FrameOffset * 8;
				i++;
			}
			else {
				dwStackSize += *(ULONG*)(&pUnwindCode[i + 1]);
				i += 2;
			}
			break;
		case UWOP_PUSH_MACHFRAME:
			if (pUnwindCode[i].OpInfo == 0)
				dwStackSize += 40;
			else
				dwStackSize += 48;
			break; // without this, control falls through and skips the next unwind code
		case UWOP_SAVE_NONVOL:
			i++;
			break;
		case UWOP_SAVE_NONVOL_FAR:
			i += 2;
			break;
		default:
			break;
		}
	}

	return dwStackSize;
}
```

Using the above code snippets, we can dynamically figure out the stack size of any function during runtime.

### Gadget

To hide the return address of our code, we will be using JOP gadgets as return addresses, which will, in turn, direct the execution flow back to us. An example of a JOP gadget is `jmp QWORD PTR [rbx]`. When it executes, the CPU reads the 8-byte value stored at the address held in `rbx` and jumps to it. Any other non-volatile register could be used in place of `rbx`, because its value survives the API call.
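The double indirection is easy to get wrong, so here is a rough C model of it. This is only an analogy: `JmpThroughRbx`, `RestoreStub`, and `DemoGadget` are hypothetical names, and a C function call returns to its caller whereas the real `jmp` does not.

```c
static int g_restored = 0;

/* Stand-in for the assembly routine that restores the original stack */
static void RestoreStub(void) { g_restored = 1; }

typedef void (*Routine)(void);

/* C model of `jmp QWORD PTR [rbx]`: load the 8-byte value stored at the
   address held in rbx, then transfer control to it. */
static void JmpThroughRbx(const Routine *rbx) {
    Routine target = *rbx;   /* memory read: [rbx] */
    target();                /* control transfer */
}

static int DemoGadget(void) {
    Routine slot = RestoreStub;  /* memory slot holding the restore address */
    g_restored = 0;
    JmpThroughRbx(&slot);        /* rbx = address of the slot, not of the routine */
    return g_restored;
}
```

The point to take away is that `rbx` must hold the address of a memory slot that contains the restore routine's address, not the routine's address itself.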

Using the following code snippet, we can obtain the address of our gadget within any module.

```c
PVOID RetGadget(UINT64 hModule) {

	PVOID pGadget = NULL;
	int r = rand() % 2, count = 0;
	
	DWORD dwSize = ((PIMAGE_NT_HEADERS64)(hModule + ((PIMAGE_DOS_HEADER)hModule)->e_lfanew))->OptionalHeader.SizeOfImage;

	// Scan the mapped image for the byte pair FF 23, which encodes `jmp QWORD PTR [rbx]`
	for (DWORD i = 0; i + 1 < dwSize; i++) {

		if (((PBYTE)hModule)[i] == 0xff && ((PBYTE)hModule)[i+1] == 0x23) {
			pGadget = (PVOID)(hModule + i);
			if (count >= r) {
				break;
			}

			count ++;
		}
	}
	return pGadget;
}
```

## Spoofing the Stack

In this section, we'll cover the steps for creating a synthetic stack frame.

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2Fzt4fV0aOaVDHSJkZ1llL%2Fspoof_stack.jpeg?alt=media&#x26;token=3bb02ac0-94c1-4817-960d-de701044ac97" alt=""><figcaption><p>Call Stack Spoofing Flow</p></figcaption></figure></div>

Creating synthetic stack frames will be done using our "Spoof" function, which will be written in assembly. This function does the following steps:

1. Push "0" on the stack, which will terminate the stack unwinding.
2. Make space on the stack for "RtlUserThreadStart" Frame.
3. Push the Return Address "RtlUserThreadStart+0x21" on the stack.
4. Make space on the stack for "BaseThreadInitThunk" Frame.
5. Push the Return Address "BaseThreadInitThunk+0x14" on the stack.
6. Make space on the stack for our Gadget's Frame.
7. Push the Return Address to our gadget on the stack.
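The steps above can be modelled in plain C to check that the resulting layout unwinds cleanly. This is only a simulation on an array, not the real assembly: `FakeFrame`, `BuildFakeStack`, `WalkFakeStack`, and all addresses and sizes below are made up for illustration, and frame sizes are assumed to be multiples of 8.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t ret_addr;    /* return address that identifies the frame */
    uint32_t frame_size;  /* bytes the function allocates, excluding the return address */
} FakeFrame;

/* Lay out the frames on a downward-growing stack of 8-byte slots.
   frames[0] is the outermost frame. Returns the index acting as rsp,
   or (size_t)-1 if the frames do not fit. */
static size_t BuildFakeStack(uint64_t *stack, size_t slots,
                             const FakeFrame *frames, size_t n) {
    size_t sp = slots;
    if (sp == 0) return (size_t)-1;
    stack[--sp] = 0;                          /* step 1: terminator */
    for (size_t i = 0; i < n; i++) {
        size_t space = frames[i].frame_size / 8;
        if (space + 1 > sp) return (size_t)-1;
        sp -= space;                          /* make space for the frame */
        stack[--sp] = frames[i].ret_addr;     /* push its return address */
    }
    return sp;
}

/* Walk the stack the way an unwinder would: read a return address, look up
   the frame size, skip the frame. Returns the number of frames seen before
   the 0 terminator, or -1 on an unknown address or overrun. */
static int WalkFakeStack(const uint64_t *stack, size_t slots, size_t sp,
                         const FakeFrame *frames, size_t n) {
    int depth = 0;
    while (sp < slots) {
        uint64_t ret = stack[sp++];
        if (ret == 0) return depth;
        size_t k = 0;
        while (k < n && frames[k].ret_addr != ret) k++;
        if (k == n) return -1;
        sp += frames[k].frame_size / 8;
        depth++;
    }
    return -1;
}

static int DemoSpoofedChain(void) {
    /* Made-up addresses and sizes, outermost frame first */
    const FakeFrame frames[] = {
        { 0x7FF800001021ULL, 0x48 },  /* stands in for RtlUserThreadStart+0x21 */
        { 0x7FF700002014ULL, 0x28 },  /* stands in for BaseThreadInitThunk+0x14 */
        { 0x7FF700003333ULL, 0x38 },  /* stands in for the gadget */
    };
    uint64_t stack[64] = { 0 };
    size_t sp = BuildFakeStack(stack, 64, frames, 3);
    if (sp == (size_t)-1) return -1;
    return WalkFakeStack(stack, 64, sp, frames, 3);
}
```

If any frame size were wrong, the walker would read a value that is not a known return address and stop early, which is the same failure mode as the broken call stack shown in the introduction.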

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2FgChBOF96oKsEPpgk0GSk%2Fspoof_stack2.jpeg?alt=media&#x26;token=2e8dfcff-7e94-4181-9f47-92fa237594d8" alt=""><figcaption><p>Spoofed Stack Frames</p></figcaption></figure></div>

The above image represents the spoofed part of the stack. Before executing our WinAPI, we need to configure the required arguments. Windows x64 uses a fastcall-style calling convention. The first four arguments are passed in the registers `rcx`, `rdx`, `r8`, and `r9`. Any additional arguments are placed on the stack from right to left. The caller also reserves 32 bytes on the stack (four 8-byte slots), known as the shadow space, where the callee may save the four register arguments. After this, the WinAPI is called.

Since we have already created our spoofed stack frames, we cannot push or pop any values, as that would break the chain. Instead, we need to configure additional arguments on the existing stack, as depicted in the above image.
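To see where each argument has to be written, the following sketch computes the slot of every argument relative to `rsp` at the moment the target function is entered, when `[rsp]` holds the return address (our gadget). `ArgSlotOffset` and `ArgRegister` are hypothetical helper names used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset from rsp, at function entry, of the slot that belongs to the
   zero-based argument `index`. Slots 0 to 3 are the shadow space (the value
   itself travels in a register); slots 4 and up hold the stack arguments. */
static uint32_t ArgSlotOffset(uint32_t index) {
    return 8 + index * 8;   /* skip the 8-byte return address at [rsp] */
}

/* Register carrying the argument, or NULL when it is passed on the stack
   (integer and pointer arguments only) */
static const char *ArgRegister(uint32_t index) {
    static const char *const regs[] = { "rcx", "rdx", "r8", "r9" };
    return index < 4 ? regs[index] : NULL;
}
```

The fifth argument therefore lives at `[rsp+0x28]` and the sixth at `[rsp+0x30]`. In the spoofed layout, these offsets land inside the space reserved for the gadget's frame.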

## Assembly

When writing code in high-level languages such as 'C', all the steps, including management of registers and modification of stack during a function call or when the function returns, are abstracted away. However, we need control over how the stack behaves; hence, we will be writing this section entirely in assembly.

By using a structure to pass arguments to our "Spoof" function, the process of accessing all the arguments becomes simpler.

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2Fy5Eq08ANA2BOjS8xrytZ%2Fstruct_def.png?alt=media&#x26;token=7d5eae6d-a59b-4007-9a47-f930d6305f7c" alt=""><figcaption><p>STACK_INFO Struct</p></figcaption></figure></div>

The following series of instructions creates our synthetic stack frames.

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2FDeIkXVPt6AhvlYYqfEuy%2Fassembly_1.png?alt=media&#x26;token=c4d5d0ff-c553-462b-b027-7d99e2687c06" alt=""><figcaption><p>Assembly Code - Part 1</p></figcaption></figure></div>

Now, we need to configure the arguments required for our target function.

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2FDQqOaE6BeCN3XaFQCI5i%2Fassembly_2.png?alt=media&#x26;token=694499c1-4a1d-4400-8cb7-0c8bed575663" alt=""><figcaption><p>Assembly Code - Part 2</p></figcaption></figure></div>

That is half of the work done. Technically, what we have done until now will spoof the stack and successfully execute our target API. However, when the API call returns, the program will crash. This is because we haven't configured our gadget yet.

To avoid the crash, we have to revert the stack to its original state. Hence, we will store a pointer to our stack-restoring code in `rbx`. When the target API returns into the gadget, the gadget jumps through `rbx` and control flow is handed back to us.

Now, it's time to execute our target API, which is done with a `jmp` instruction to the target API's address. Note that we are using `jmp` instead of `call`. A `call` would push the address of the instruction that follows it, which lies inside our own code, onto the stack as the return address. By using `jmp`, nothing is pushed, so our gadget's address acts as the return address, making it look as if the function containing the gadget made the call.

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2F0tT2Dk6rF2hT79hoXe25%2Fassembly_3.png?alt=media&#x26;token=27e587b4-2223-4976-9c64-63d27d92f2e8" alt=""><figcaption><p>Assembly Code - Part 3</p></figcaption></figure></div>

We have now executed the target API and obtained the control flow back. What is left is to restore the stack back to its original state.

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2FxUh3U0L1f9SqRMV0EP9i%2Fassembly_4.png?alt=media&#x26;token=af5e92c7-a4b3-4052-a388-25b0829450e4" alt=""><figcaption><p>Assembly Code - Part 4</p></figcaption></figure></div>

## Putting It All Together

We have all the bits and pieces ready for our trickery. Now, we'll use a function to orchestrate our circus.

```c
PVOID CallStackSpoof(UINT64 pTargetFunction, DWORD dwNumberOfArgs, ...) {

	srand((time(0)));
	va_list va_args;
	STACK_INFO sStackInfo = { 0 };
	UINT64 pGadget, pRtlUserThreadStart, pBaseThreadInitThunk;
	UINT64 pNtdll, pKernel32;

	pNtdll = GetModuleHandleA("ntdll");
	pKernel32 = GetModuleHandleA("kernel32");

	pGadget = RetGadget(pKernel32);
	pRtlUserThreadStart = GetProcAddress(pNtdll, "RtlUserThreadStart");
	pBaseThreadInitThunk = GetProcAddress(pKernel32, "BaseThreadInitThunk");

	sStackInfo.pGadgetAddress = pGadget;
	sStackInfo.dwGadgetSize = RetStackSize(pKernel32, pGadget);
	sStackInfo.pRtlUserThreadStart = pRtlUserThreadStart + 0x21;
	sStackInfo.dwRtlUserThreadStartSize = RetStackSize(pNtdll, pRtlUserThreadStart);
	sStackInfo.pBaseThreadInitThunk = pBaseThreadInitThunk + 0x14;
	sStackInfo.dwBaseThreadInitThunk = RetStackSize(pKernel32, pBaseThreadInitThunk);
	sStackInfo.pTargetFunction = pTargetFunction;

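	// Always reserve the four shadow-space slots, and round odd counts up to an
	// even number so that the argument area is a multiple of 16 bytes
	if (dwNumberOfArgs <= 4)
		sStackInfo.dwNumberOfArguments = 4;
	else if (dwNumberOfArgs % 2 != 0)
		sStackInfo.dwNumberOfArguments = dwNumberOfArgs + 1;
	else
		sStackInfo.dwNumberOfArguments = dwNumberOfArgs;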

	sStackInfo.pArgs = malloc(8 * sStackInfo.dwNumberOfArguments);

	va_start(va_args, dwNumberOfArgs);
	for (int i = 0; i < dwNumberOfArgs; i++) {

		// Write into the allocated buffer, not into the memory that follows the pArgs field
		((UINT64*)sStackInfo.pArgs)[i] = va_arg(va_args, UINT64);

	}
	va_end(va_args);
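	PVOID pRet = Spoof(&sStackInfo);
	free((PVOID)sStackInfo.pArgs); // release the argument buffer once the call has returned
	return pRet;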

}
```

## Results

```c
#include <Windows.h>
#include "spoofer.h"

int main() {

	HMODULE pUser32 = LoadLibraryA("User32");
	UINT64 pMessageBoxA = GetProcAddress(pUser32, "MessageBoxA");

	CallStackSpoof(pMessageBoxA, 4, NULL, "Text", "Caption", MB_YESNO);

}
```

Let's observe the call stack when "MessageBoxA" is called.

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2FxPC8P8BbsL3ETMvDuY5h%2Fspoofed_mbox.png?alt=media&#x26;token=79aa8b88-a94e-4e16-a071-efb4079c934e" alt=""><figcaption><p>Spoofed Call Stack</p></figcaption></figure></div>

## Indicators of Compromise

Like all techniques, call stack spoofing has certain indicators of compromise.

A return address only ends up on the call stack because a `call` instruction pushed it there, so in legitimate code the instruction immediately before every return address is a `call`. In the case of our gadget, that `call` instruction is missing.

### RtlUserThreadStart

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2FTvtOhycJVJnMKRxlsVEz%2Fdetection_1.jpeg?alt=media&#x26;token=95dd0c7e-c959-4604-83d0-ceb89fb14506" alt=""><figcaption></figcaption></figure></div>

### BaseThreadInitThunk

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2FtHevQv46nEuqsNAbi03N%2Fdetection_2.jpeg?alt=media&#x26;token=2903d725-ff79-41e9-82c8-40f8930db354" alt=""><figcaption></figcaption></figure></div>

### Gadget's Address

<div align="left"><figure><img src="https://1334214017-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGdM1jIOKw0EWSCHRMsql%2Fuploads%2FWxX2tctakWqB5YRYdxnE%2Fdetection_3.jpeg?alt=media&#x26;token=1aa1c3fb-a30d-474d-bd63-fa9732d329f2" alt=""><figcaption></figcaption></figure></div>

In the above image, the instruction preceding the gadget's address is not a "call", which shows that no call instruction could have pushed the gadget's address on the stack.
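A defender could automate this check along the following lines. This is a rough heuristic sketch under my own assumptions, not a full x64 decoder and not how any particular security product works: `PrecededByCall` and the sample byte arrays are hypothetical, only the most common `call` encodings are covered, and false positives are possible.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when the bytes just before code[ret_off] look like one of the
   common x64 call encodings, 0 otherwise. */
static int PrecededByCall(const uint8_t *code, size_t ret_off) {
    uint8_t modrm, mod, reg, rm;

    /* E8 rel32: direct call, 5 bytes */
    if (ret_off >= 5 && code[ret_off - 5] == 0xE8)
        return 1;

    /* FF /2 with a register or plain [reg] operand, 2 bytes */
    if (ret_off >= 2 && code[ret_off - 2] == 0xFF) {
        modrm = code[ret_off - 1];
        mod = modrm >> 6; reg = (modrm >> 3) & 7; rm = modrm & 7;
        if (reg == 2 && (mod == 3 || (mod == 0 && rm != 4 && rm != 5)))
            return 1;
    }

    /* FF /2 with [reg+disp8], 3 bytes */
    if (ret_off >= 3 && code[ret_off - 3] == 0xFF) {
        modrm = code[ret_off - 2];
        mod = modrm >> 6; reg = (modrm >> 3) & 7; rm = modrm & 7;
        if (reg == 2 && mod == 1 && rm != 4)
            return 1;
    }

    /* FF 15 disp32 (call [rip+disp32]) or FF /2 with [reg+disp32], 6 bytes */
    if (ret_off >= 6 && code[ret_off - 6] == 0xFF) {
        modrm = code[ret_off - 5];
        mod = modrm >> 6; reg = (modrm >> 3) & 7; rm = modrm & 7;
        if (modrm == 0x15 || (reg == 2 && mod == 2 && rm != 4))
            return 1;
    }
    return 0;
}

/* sub rsp,0x28; call rel32; add rsp,0x28; ret (return address at offset 9) */
static const uint8_t kLegit[] = {
    0x48, 0x83, 0xEC, 0x28, 0xE8, 0x10, 0x00, 0x00, 0x00,
    0x48, 0x83, 0xC4, 0x28, 0xC3
};

/* nop; nop; mov rax,rcx; xor edx,edx; jmp [rbx] (gadget at offset 7) */
static const uint8_t kGadget[] = {
    0x90, 0x90, 0x48, 0x8B, 0xC1, 0x33, 0xD2, 0xFF, 0x23
};

/* call [rip+0]; nop (return address at offset 6) */
static const uint8_t kIndirect[] = { 0xFF, 0x15, 0x00, 0x00, 0x00, 0x00, 0x90 };
```

Here the return addresses in `kLegit` and `kIndirect` pass the check, while the gadget in `kGadget` does not, because the bytes before it decode to ordinary `mov` and `xor` instructions.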

## References

* [SilentMoonWalk by KlezVirus, Waldo-IRC, and Trickster0](https://github.com/klezVirus/SilentMoonwalk)
* [Intro to Stack Spoofing by Nigerald](https://dtsec.us/2023-09-15-StackSpoofin/)
* [x64 Deep Dive by CodeMachine](https://codemachine.com/articles/x64_deep_dive.html)
* [ReactOS](https://github.com/reactos/reactos)

