In April 2021 we went through the anatomy of a Cobalt Strike stager and how some of its signature evasion techniques ended up being ineffective against detection technologies. In this blog post we will go one level deeper and focus on Metasploit, an often-used framework interoperable with Cobalt Strike.
Throughout this blog post we will cover the following topics:
- The shellcode’s import resolution – How Metasploit shellcode locates functions from other DLLs and how we can precompute these values to resolve any imports from other payload variants.
- The reverse-shell’s execution flow – How trivial a reverse shell actually is.
- Disruption of the Metasploit import resolution – A non-intrusive deception technique (no hooks involved) to have Metasploit notify the antivirus (AV) of its presence with high confidence.
For this analysis, we generated our own shellcode using Metasploit under version v6.0.30-dev
. The malicious sample generated using the command below had as resulting SHA256 hash of 3792f355d1266459ed7c5615dac62c3a5aa63cf9e2c3c0f4ba036e6728763903
and is available on VirusTotal for readers willing to have a try themselves.
msfvenom -p windows/shell_reverse_tcp -a x86 > shellcode.vir
Throughout the analysis we have renamed functions, variables and offsets to reflect their role and improve clarity.
Initial Analysis
In this section we will outline the initial logic followed to determine the next steps of the analysis (import resolution and execution flow analysis).
While a typical executable contains one or more entry-points (exported functions, TLS-callbacks, …), shellcode can be seen as the most primitive code format where initial execution occurs from the first byte.
Analyzing the generated shellcode from the initial bytes outlines two operations:
- The first instruction at ① can be ignored from an analytical perspective. The
cld
operation clears the direction flag, ensuring string data is read on-wards instead of back-wards (e.g.:cmd
vsdmc
). - The second
call
operation at ② transfers execution to a function we namedMain
, this function will contain the main logic of the shellcode.

Main
function.Within the Main
function, we observe additional calls such as the four ones highlighted in the trimmed figure below (③, ④, ⑤ and ⑥). These calls target a yet unidentified function whose address is stored in the ebp
register. To understand where this function is located, we will need to take a step back and understand how a call
instruction operates.

Main
function.A call
instruction transfers execution to the target destination by performing two operations:
- It pushes the return address (the memory address of the instruction located after the
call
instruction) on the stack. This address can later be used by theret
instruction to return execution from the called function (callee) back to the calling function (caller). - It transfers execution to the target destination (callee), as a
jmp
instruction would.
As such, the first pop
instruction from the Main
function at ③ stores the caller’s return address into the ebp
register. This return address is then called as a function later on, among others at offset 0x99
, 0xA9
and 0xB8
(④, ⑤ and ⑥). This pattern, alongside the presence of a similarly looking push
before each call
tends to suggest the return address stored within ebp
is the dynamic import resolution function.
Without diving into unnecessary depth, a “normal” executable (e.g.: Portable Executable on Windows) contains the necessary information so that, once loaded by the Operating System (OS) loader, the code can call imported routines such as those from the Windows API (e.g.: LoadLibraryA
). To achieve this default behavior, the executable is expected to have a certain structure which the OS can interpret. As shellcode is a bare-bone version of the code (it has none of the expected structures), the OS loader can’t assist it in resolving these imported functions; even more so, the OS loader will fail to “execute” a shellcode file. To cope with this problem, shellcode commonly performs a “dynamic import resolution”.
One of the most common techniques to perform “dynamic import resolution” is by hashing each available exported function and compare it with the required import’s hash. As shellcode authors can’t always predict whether a specific DLL (e.g.: ws3_32.dll
for Windows Sockets) and its exports are already loaded, it is not uncommon to observe shellcode loading DLLs by calling the LoadLibraryA
function first (or one of its alternatives). Relying on LoadLibraryA
(or alternatives) before calling other DLLs’ exports is a stable approach as these library-loading functions are part of kernel32.dll
, one of the few DLLs which can be expected to be loaded into each process.
To confirm our above theory, we can search for all call
instructions as can be seen in the following figure (e.g.: using IDA’s Text...
option under the Search
menu). Apart from the first call to the Main
function, all instances refer to the ebp
register. This observation, alongside well-known constants we will observe in the next section, supports our theory that the address stored in ebp
holds a pointer to the function performing the dynamic import resolution.

call
instructions in the shellcode.The abundance of calls towards the ebp
register suggests it indeed holds a pointer to the import resolution function, which we now know is located right after the first call to Main
.
Import Resolution Analysis
So far we noticed the instructions following the initial call to Main
play a crucial role as what we expect to be the import resolution routine. Before we analyze the shellcode’s logic, let us analyze this resolution routine as it will ease the understanding of the remaining calls.
From Import Hash to Function
The code located immediately after the initial call to Main
is where the import resolution starts. To resolve these imports, the routine first locates the list of modules loaded into memory as these contain their available exported functions.
To find these modules, an often leveraged shellcode technique is to interact with the Process Environment Block (shortened as PEB
).
In computing the Process Environment Block (abbreviated PEB) is a data structure in the Windows NT operating system family. It is an opaque data structure that is used by the operating system internally, most of whose fields are not intended for use by anything other than the operating system. […] The PEB contains data structures that apply across a whole process, including global context, startup parameters, data structures for the program image loader, the program image base address, and synchronization objects used to provide mutual exclusion for process-wide data structures.
wikipedia.org
As can be observed in figure 4, to access the PEB
, the shellcode accesses the Thread Environment Block (TEB
) which is immediately accessible through a register (⑦). The TEB
structure itself contains a pointer to the PEB
(⑦). From the PEB
, the shellcode can locate the PEB_LDR_DATA
structure (⑧) which in turn contains a reference to multiple double-linked module lists. As can be observed at (⑨), the Metasploit shellcode leverages one of these double-linked lists (InMemoryOrderModuleList
) to later iterate through the LDR_DATA_TABLE_ENTRY
structures containing the loaded module information.
Once the first module is identified, the shellcode retrieves the module’s name (BaseDllName
.Buffer
) at ⑩ and the buffer’s maximum length ( BaseDllName
.MaximumLength
) at ⑪ which is required as the buffer is not guaranteed to be NULL
-terminated.

One point worth highlighting is that, as opposed to usual pointers (TEB.ProcessEnvironmentBlock
, PEB.Ldr
, …), a double-linked list points to the next item’s list entry. This means that instead of pointing to the structures’ start, a pointer from the list will target a non-zero offset. As such, while in the following figure the LDR_DATA_TABLE_ENTRY
has the BaseDllName
property at offset 0x2C
, the offset from the list entry’s perspective will be 0x24
(0x2C-0x08
). This can be observed in the above figure 4 where an offset of 8
has to be subtracted to access both of the BaseDllName
properties at ⑩ and ⑪.

TEB
to BaseDllName
.With the DLL name’s buffer and maximum length recovered, the shellcode proceeds to generate a hash. To do so, the shellcode performs a set of operations for each ASCII character within the maximum name length:
- If the character is lowercase, it gets modified into an uppercase. This operation is performed according to the character’s ASCII representation meaning that if the value is
0x61
or higher (a
or higher),0x20
gets subtracted to fall within the uppercase range. - The generated hash (initially
0
) is rotated right (ROR) by 13 bits (0x0D
). - The upper-cased character is added to the existing hash.

KERNEL32.DLL
‘s first character (K
).With the repeated combination of rotations and additions on a fixed registry size (32 bits in edi
‘s case), characters will ultimately start overlapping. These repeated and overlapping combinations make the operations non-reversible and hence produces a 32-bit hash/checksum for a given name.
One interesting observation is that while the BaseDllName
in LDR_DATA_TABLE_ENTRY
is Unicode-encoded (2 bytes per character), the code treats it as ASCII encoding (1 byte per character) by using lodsb
(see ⑫).

The hash generation algorithm can be implemented in Python as shown in the snippet below. While we previously mentioned that the BaseDllName
‘s buffer was not required to be NULL
-terminated per Microsoft documentation, extensive testing has showed that NULL
-termination was always the case and could generally be assumed. This assumption is what makes the MaximumLength
property a valid boundary, similarly to the Length
property. The following snippet hence expects the data passed to get_hash
to be a Python bytes
object generated from a NULL
-terminated Unicode string.
# Helper function for rotate-right on 32-bit architectures
def ror(number, bits):
return ((number >> bits) | (number << (32 - bits))) & 0xffffffff
# Define hashing algorithm
def get_hash(data):
# Initialize hash to 0
result = 0
# Loop each character
for b in data:
# Make character uppercase if needed
if b < ord('a'):
b -= 0x20
# Rotate DllHash right by 0x0D bits
result = ror(result, 0x0D)
# Add character to DllHash
result = (result + b) & 0xffffffff
return result
The above functions could be used as follows to compute the hash of KERNEL32.DLL
.
# Define a NULL-terminated base DLL name
name = 'KERNEL32.DLL\0'
# Encode it as Unicode
encoded = name.encode('UTF-16-LE')
# Compute the hash
value = hex(get_hash(encoded))
# And print it ('0x92af16da')
print(value)
With the DLL name’s hash generated, the shellcode proceeds to identify all exported functions. To do so, the shellcode starts by retrieving the LDR_DATA_TABLE_ENTRY
‘s DllBase
property (⑬) which points to the DLL’s in-memory address. From there, the IMAGE_EXPORT_DIRECTORY
structure is identified by walking the Portable Executable’s structures (⑭ and ⑮) and adding the relative offsets to the DLL’s in-memory base address. This last structure contains the number of exported function names (⑰) as well as a table of pointers towards these (⑯).

The above operations can be schematized as follow, where dotted lines represent addresses computed from relative offsets increased by the DLL’s in-memory base address.

LDR_DATA_TABLE_ENTRY
to IMAGE_EXPORT_DIRECTORY
.Once the number of exported names and their pointers are identified, the shellcode enumerates the table in descending order. Specifically, the number of names is used as a decremented counter at ⑱. For each exported function’s name and while none matches, the shellcode performs a hashing routine (hash_export_name
at ⑲) similar to the one we observed previously, with as sole difference that character cases are preserved (hash_export_character
).
The final hash is obtained by adding the recently computed function hash (ExportHash
) to the previously obtained module hash (DllHash
) at ⑳. This addition is then compared at ㉑ to the sought hash and, unless they match, the operation starts again for the next function.

If none of the exported functions match, the routine retrieves the next module in the InMemoryOrderLinks
double-linked list and performs the above operations again until a match is found.

The above walked double-linked list can be schematized as the following figure.

InMemoryOrderModuleList
.If a match is found, the shellcode will proceed to call the exported function. To retrieve its address from the previously identified IMAGE_EXPORT_DIRECTORY
, the code will first need to map the function’s name to its ordinal (㉒), a sequential export number. Once the ordinal is recovered from the AddressOfNameOrdinals
table, the address can be obtained by using the ordinal as an index in the AddressOfFunctions
table (㉓).

Finally, once the export’s address is recovered, the shellcode simulates the call
behavior by ensuring the return address is first on the stack (removing the hash it was searching for, at ㉔) , followed by all parameters as required by the default Win32 API __stdcall
calling convention (㉕). The code then performs a jmp
operation at ㉖ to transfer execution to the dynamically resolved import which, upon return, will resume from where the initial call ebp
operation occurred.
Overall, the dynamic import resolution can be schematized as a nested loop. The main loop walks modules following the in-memory order (blue in the figure below) while, for each module, a second loop walks exported functions looking for a matching hash between desired import and available exports (red in the figure below).

Building a Rainbow Table
Identifying which imports the shellcode relies on will provide us with further insight into the rest of its logic. Instead of dynamically analyzing the shellcode, and given that we have figured out the hashing algorithm above, we can build ourselves a rainbow table.
A rainbow table is a precomputed table for caching the output of cryptographic hash functions, usually for cracking password hashes.
wikipedia.org
The following Python snippet computes the “Metasploit” hashes for DLL exports located in the most common system locations.
import glob
import os
import pefile
import sys
size = 32
mask = ((2**size) - 1)
# Resolve 32- and 64-bit System32 paths
root = os.environ.get('SystemRoot')
if not root:
raise Exception('Missing "SystemRoot" environment variable')
globs = [f"{root}\\System32\\*.dll", f"{root}\\SysWOW64\\*.dll"]
# Helper function for rotate-right
def ror(number, bits):
return ((number >> (bits % size)) | (number << (size - (bits % size)))) & mask
# Define hashing algorithm
def get_hash(data):
result = 0
for b in data:
result = ror(result, 0x0D)
result = (result + b) & mask
return result
# Helper function to uppercase data
def upper(data):
return [(b if b < ord('a') else b - 0x20) for b in data]
# Print CSV header
print("File,Function,IDA,Yara")
# Loop through all DLLs
for g in globs:
for file in glob.glob(g):
# Compute the DllHash
name = upper(os.path.basename(file).encode('UTF-16-LE') + b'\x00\x00')
file_hash = get_hash(name)
try:
# Parse the DLL for exports
pe = pefile.PE(file, fast_load=True)
pe.parse_data_directories(directories = [pefile.DIRECTORY_ENTRY["IMAGE_DIRECTORY_ENTRY_EXPORT"]])
if hasattr(pe, "DIRECTORY_ENTRY_EXPORT"):
# Loop through exports
for exp in pe.DIRECTORY_ENTRY_EXPORT.symbols:
if exp.name:
# Compute ExportHash
name = exp.name.decode('UTF-8')
exp_hash = get_hash(exp.name + b'\x00')
metasploit_hash = (file_hash + exp_hash) & 0xffffffff
# Compute additional representations
ida_view = metasploit_hash.to_bytes(size/8, byteorder='big').hex().upper() + "h"
yara_view = metasploit_hash.to_bytes(size/8, byteorder='little').hex(' ')
# Print CSV entry
print(f"\"{file}\",\"{name}\",\"{ida_view}\",\"{{{yara_view}}}\"")
except pefile.PEFormatError:
print(f"Unable to parse {file} as a valid PE, skipping.", file=sys.stderr)
continue
As an example, the following PowerShell commands generate a rainbow table, then searches it for the 726774Ch
hash we observed first in figure 2. For everyone’s convenience, we have published our rainbow.csv
version containing 239k hashes.
# Generate the rainbow table in CSV format
PS > .\rainbow.py | Out-File .\rainbow.csv -Encoding UTF8
# Search the rainbow table for a hash
PS > Get-Content .\rainbow.csv | Select-String 726774Ch
"C:\Windows\System32\kernel32.dll","LoadLibraryA","0726774Ch","{4c 77 26 07}"
"C:\Windows\SysWOW64\kernel32.dll","LoadLibraryA","0726774Ch","{4c 77 26 07}"
As can be observed above, the first import resolved and called by the shellcode is LoadLibraryA
, exported by the 32- and 64-bit kernel32.dll
.
Execution Flow Analysis
With the import resolving sorted-out, understanding the remaining code becomes a lot more accessible. As we can see in figure 15, the shellcode starts by performing the following calls:
LoadLibraryA
at ㉗ to ensure thews3_32
library is loaded. If not yet loaded, this will map thews3_32.dll
DLL in memory, enabling the shellcode to further resolve additional functions related to the Windows Socket 2 technology.WSAStartup
at ㉘ to initiate the usage of sockets within the shellcode’s process.WSASocketA
at ㉙ to create a new socket. This one will be a stream-based (SOCK_STREAM
) socket over IPv4 (AF_INET
).

Once the socket is created, the shellcode proceeds to call the connect
function at ㉝ with the sockaddr_in
structure previously pushed on the stack (㉜). The sockaddr_in
structure contains valuable information from an incident response perspective such as the protocol (0x0200
being AF_INET
, a.k.a. IPv4, in little endianness), the port (0x115c
being the default 4444
Metasploit port in big endianness) as well as the C2 IPv4 address at ㉛ (0xc0a801ca
being 192.168.1.202
in big endianness).
If the connection fails, the shellcode retries up to 5 times (decrementing at ㉞ the counter defined at ㉚) after which it will abort execution using ExitProcess
(㉟).

If the connection succeeds, the shellcode will create a new cmd
process and connect all of its Standard Error, Output and Input (㊱) to the established C2 socket. The process itself is started through a CreateProcessA
call at ㊲.

Finally, while the process is running, the shellcode performs the following operations:
- Wait indefinitely at ㊳ for the remote shell to terminate by calling
WaitForSingleObject
. - Once terminated, identify the Windows operating system version at ㊴ using
GetVersion
and exit at ㊵ using eitherExitProcess
orRtlExitUserThread
.

Overall, the execution flow of Metasploit’s windows/shell_reverse_tcp
shellcode can be schematized as follows:

Shellcode Disruption
With the execution flow analysis squared away, let’s see how we can turn the tables on the shellcode and disrupt it. From an attacker’s perspective, the shellcode itself is considered trusted while the environment it runs in is hostile. This section will build upon the assumption that we don’t know where shellcode is executing in memory and, as such, hooking/modifying the shellcode itself is not an acceptable solution.
In this section we will firstly focus on the theoretical aspects before covering a proof-of-concept implementation.
The Weaknesses
CWE-1288: Improper Validation of Consistency within Input
The product receives a complex input with multiple elements or fields that must be consistent with each other, but it does not validate or incorrectly validates that the input is actually consistent.
cwe.mitre.org
From the shellcode’s perspective only two external interactions provide a possible attack surface. The first and most obvious surface is the C2 channel where some security solutions can detect/impair either the communications protocol or the surrounding API calls. This attack surface however has the massive caveat that security solutions have to make the distinction between legitimate and malicious behaviors, possibly resulting in some medium/low-confidence detection.
A second less obvious attack surface is the import resolution itself which, from the shellcode’s perspective, relies on external process data. Within this import resolution routine, we observed how the shellcode relied on the BaseDllName
property to generate a hash for each module.

Buffer
and MaximumLength
to hash a module’s BaseDllName
.While the module’s exports were UTF-8 NULL
-terminated strings, the BaseDllName
property was a UNICODE_STRING
structure. This structure contains multiple properties:
typedef struct _UNICODE_STRING {
USHORT Length;
USHORT MaximumLength;
PWSTR Buffer;
} UNICODE_STRING, *PUNICODE_STRING;
Length
: The length, in bytes, of the string stored inBuffer
.
MaximumLength
: The length, in bytes, ofBuffer
.
Buffer
: Pointer to a buffer used to contain a string of wide characters.[…]
If the string is null-terminated,
Length
does not include the trailing null character.The
docs.microsoft.comMaximumLength
is used to indicate the length of Buffer so that if the string is passed to a conversion routine such asRtlAnsiStringToUnicodeString
the returned string does not exceed the buffer size.
While not explicitly mentioned in the above documentation, we can implicitly understand that the buffer’s MaximumLength
property is unrelated to the actual string’s Length
property. The Unicode string does not need to consume the entire Buffer
, neither is it guaranteed to be NULL
-terminated. Theoretically, the Windows API should only consider the first Length
bytes of the Buffer
for comparison, ignoring any bytes between the Length
and MaximumLength
positions. Increasing a UNICODE_STRING
‘s buffer (Buffer
and MaximumLength
) should not impact functions relying on the stored string.
As the shellcode’s hashing routine relies on the buffer’s MaximumLength
, similar strings within differently-sized buffers will generate different hashes. This flaw in the hashing routine can be leveraged to neutralize potential Metasploit shellcode. From a technical perspective, as security solutions already hook process creation and inject themselves, interfering with the hashing routine without knowledge of its existence or location can be achieved by increasing the BaseDllName
buffer for modules required by Metasploit (e.g.: kernel32.dll
).
This hash-input validation flaw is what we will leverage next as initial vector to cause a Denial of Service as well as an Execution Flow Hijack.
CWE-823: Use of Out-of-range Pointer Offset
The program performs pointer arithmetic on a valid pointer, but it uses an offset that can point outside of the intended range of valid memory locations for the resulting pointer.
cwe.mitre.org
One observation we made earlier is how the shellcode loops modules indefinitely until a matching export is found. As we found a flaw to alter hashes, let us analyze what happens if all hashes fail to match.
While walking the double-linked list could loop indefinitely, the shellcode will actually generate an “Access Violation” error once all modules have been checked. This exception is not generated explicitly by the shellcode but rather occurs as the code doesn’t verify the list’s boundaries. Given that for each item in the list the BaseDllName.Buffer
pointer is loaded from offset 0x28
, an exception will occur once we access the first non-LDR_DATA_TABLE_ENTRY
item in the list. As shown in the figure below, this will be the case once the shellcode loops back to the first PEB_LDR_DATA
structure, at which stage an out-of-bounds read will occur resulting in an invalid pointer being de-referenced.

InMemoryOrderModuleList
double-linked list.Although from a defensive perspective causing a Denial of Service is better than having Metasploit shellcode execute, let’s see how one could further exploit the above flaw to the defender’s advantage.
Abusing CWE-1288 to Hijack the Execution Flow
One module of interest is kernel32.dll
which, as previously analyzed in the “Execution Flow Analysis” section, is the first required module in order to call the LoadLibraryA
function. During the hashing routine, the kernel32.dll
hash is computed to be 0x92af16da
. By applying the above buffer-resize technique, we can ensure the shellcode loops additional modules since the original hashes won’t match. From here, a security solution has a couple of options:
- Our injected security solution’s DLL could be named
kernel32.dll
. While its hashes would match, having two modules namedkernel32.dll
might have unintended consequences on legitimate calls toLoadLibraryA
. - Similarly, as we are already modifying buffers in
LDR_DATA_TABLE_ENTRY
structures, we could easily save the original values of thekernel32.dll
buffer and assign them to our security solution’s injected module. While this would theoretically work, having a second buffer in memory calledkernel32.dll
isn’t a great idea as previously mentioned. - Alternatively, our security solution’s injected module could have a different name, as long as there is a hash-collision with the original hash. This technique won’t impact legitimate calls such as
LoadLibraryA
as these rely on value-based comparisons, as opposed to the shellcode’s hash-based comparisons.
We previously observed how the Metasploit shellcode performed hashing using additions and rotations on ASCII characters (1-byte). As a follow-up on figure 6, the following schema depicts the state of KERNEL32.DLL
‘s hash on the third loop, where the ASCII characters K
and E
overlap. As one might observe, the NULL
character is a direct consequence of performing 1-byte operations on what initially is a Unicode string (2-byte).

To obtain a hash collision, we need to identify changes which we can perform on the initial KERNEL32.DLL
string without altering the resulting hash. The following figure highlights how there is a 6-bit relationship between the first and third ASCII character. By subtracting the second bit of the first character, we can increment the eighth bit (2+6) of the third character without affecting the resulting hash.

While the above collision is not practical (the ASCII or Unicode character 0xC5
is not within the alphanumeric range), we can apply the same principle to identify acceptable relationships. The following Python snippet brute-forces the relationships among Unicode characters for the KERNEL32.DLL
string assuming we don’t alter the string’s length.
name = "KERNEL32.DLL\0"
for i in range(len(name)):
for j in range(len(name)):
# Avoid duplicates
if j <= i:
continue
# Compute right-shift/left-shift relationships
# We shift twice by 13 bits due to Unicode being twice the size of ASCII.
# We perform a modulo of 32 due to the registers being, in our case, 32 bits in size.
relation = ((13*2*(j-i))%32)
if relation > 16:
relation -= 32
# Get close relationships (0, 1, 2 or 3 bit-shifts)
if -3 <= relation <= 3:
print(f"Characters at index {i} and {j:2d} have a relationship of {relation} bits")
# "Characters at index 0 and 5 have a relationship of 2 bits"
# "Characters at index 0 and 11 have a relationship of -2 bits"
# "Characters at index 1 and 6 have a relationship of 2 bits"
# "Characters at index 1 and 12 have a relationship of -2 bits"
# "Characters at index 2 and 7 have a relationship of 2 bits"
# "Characters at index 3 and 8 have a relationship of 2 bits"
# "Characters at index 4 and 9 have a relationship of 2 bits"
# "Characters at index 5 and 10 have a relationship of 2 bits"
# "Characters at index 6 and 11 have a relationship of 2 bits"
# "Characters at index 7 and 12 have a relationship of 2 bits"
As observed above, multiple character pairs can be altered to cause a hash collision. As an example, there is a 2-bit left-shift relation between the characters at Unicode position 0 and 11.
Given a 2-bit left-shift is similar to a multiplication by 4, incrementing the Unicode character at position 0 by any value requires decrementing the character at position 11 by 4 times the same value to keep the Metasploit hash intact. The following Python commands highlight the different possible combinations between these two characters for KERNEL32.DLL
.
# The original hash (0x92af16da)
print(hex(get_hash(upper('KERNEL32.DLL\0'.encode('UTF-16-LE')))))
# "0x92af16da"
# Decrementing 'K' by 3 requires adding 12 to 'L'
print(hex(get_hash(upper('HERNEL32.DLX\0'.encode('UTF-16-LE')))))
# "0x92af16da"
# Decrementing 'K' by 2 requires adding 8 to 'L'
print(hex(get_hash(upper('IERNEL32.DLT\0'.encode('UTF-16-LE')))))
# "0x92af16da"
# Decrementing 'K' by 1 requires adding 4 to 'L'
print(hex(get_hash(upper('JERNEL32.DLP\0'.encode('UTF-16-LE')))))
# "0x92af16da"
# Incrementing 'K' by 1 requires substracting 4 from 'L'
print(hex(get_hash(upper('LERNEL32.DLH\0'.encode('UTF-16-LE')))))
# "0x92af16da"
# Incrementing 'K' by 2 requires substracting 8 from 'L'
print(hex(get_hash(upper('MERNEL32.DLD\0'.encode('UTF-16-LE')))))
# "0x92af16da"
This hash collision combined with the buffer-resize technique can be chained to ensure our custom DLL gets evaluated as KERNEL32.DLL
in the hashing routine. From here, if we export a LoadLibraryA
function, the Metasploit import resolution will incorrectly call our implementation resulting in an execution flow hijack. This hijack can be leveraged to signal the security solution about a high-confidence Metasploit import resolution taking place.
Building a Proof of Concept
To demonstrate our theory, let’s build a proof-of-concept DLL which will, once loaded, make use of CWE-1288 to simulate how an EDR (Endpoint Detection and Response) solution could detect Metasploit without prior knowledge of its in-memory location. As we want to exploit the above hash collisions, our DLL will be named hernel32.dlx
.
The proof of concept has been published on NVISO’s GitHub repository.
The Process Injection
To simulate how a security solution would be injected into most processes, let’s build a simple function which will run our DLL into a process of our choosing.
The Inject
function will trick the targeted process into loading a specific DLL (our hernel32.dlx
) and execute its DllMain
function from where we’ll trigger the buffer-resizing. While multiple techniques exist, we will simply write our DLL’s path into the target process and create a remote thread calling LoadLibraryA
. This remote thread will then load our DLL as if the target process intended to do it.
METASPLOP_API
void
Inject(HWND hwnd, HINSTANCE hinst, LPSTR lpszCmdLine, int nCmdShow)
{
#pragma EXPORT
int PID;
HMODULE hKernel32;
FARPROC fLoadLibraryA;
HANDLE hProcess;
LPVOID lpInject;
// Recover the current module path
char payload[MAX_PATH];
int size;
if ((size = GetModuleFileNameA(hPayload, payload, MAX_PATH)) == NULL)
{
MessageBoxError("Unable to get module file name.");
return;
}
// Recover LoadLibraryA
hKernel32 = GetModuleHandle(L"Kernel32");
if (hKernel32 == NULL)
{
MessageBoxError("Unable to get a handle to Kernel32.");
return;
}
fLoadLibraryA = GetProcAddress(hKernel32, "LoadLibraryA");
if (fLoadLibraryA == NULL)
{
MessageBoxError("Unable to get LoadLibraryA address.");
return;
}
// Open the processes
PID = std::stoi(lpszCmdLine);
hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, PID);
if (!hProcess)
{
char message[200];
if (sprintf_s(message, 200, "Unable to open process %d.", PID) > 0)
{
MessageBoxError(message);
}
return;
}
// Allocated memory for the injection
lpInject = VirtualAllocEx(hProcess, NULL, size + 1, MEM_COMMIT, PAGE_READWRITE);
if (lpInject)
{
wchar_t buffer[100];
wsprintfW(buffer, L"You are about to execute the injected library in process %d.", PID);
if (WriteProcessMemory(hProcess, lpInject, payload, size + 1, NULL) && IDCANCEL != MessageBox(NULL, buffer, L"NVISO Mock AV", MB_ICONINFORMATION | MB_OKCANCEL))
{
CreateRemoteThread(hProcess, NULL, NULL, (LPTHREAD_START_ROUTINE)fLoadLibraryA, lpInject, NULL, NULL);
}
else
{
VirtualFreeEx(hProcess, lpInject, NULL, MEM_RELEASE);
}
}
else
{
char message[200];
if (sprintf_s(message, 200, "Unable to allocate %d bytes.", size+1) > 0)
{
MessageBoxError(message);
}
}
CloseHandle(hProcess);
return;
}
As one might notice, the above code relies on the hPayload
variable. This variable will be defined in the DllMain
function as we aim to get the current DLL’s module regardless of its name, whereas GetModuleHandleA
would require us to hard-code the hernel32.dlx
name.
HMODULE hPayload;
BOOL APIENTRY DllMain( HMODULE hModule,
DWORD ul_reason_for_call,
LPVOID lpReserved
)
{
switch (ul_reason_for_call)
{
case DLL_PROCESS_ATTACH:
hPayload = hModule;
break;
case DLL_THREAD_ATTACH:
case DLL_THREAD_DETACH:
case DLL_PROCESS_DETACH:
break;
}
return TRUE;
}
With our Inject
method exported, we can now proceed to build the logic needed to trigger CWE-1288.
The Buffer-Resizing
Resizing the BaseDllName
buffer from the kernel32.dll
module can be accomplished using the logic below. Similar to the shellcode’s technique, we will recover the PEB
, walk the InMemoryOrderModuleList
and once the KERNEL32.DLL
module is found, increase its buffer by 1.
void
Metasplop() {
PPEB pPeb = NULL;
PPEB_LDR_DATA pLdrData = NULL;
PLIST_ENTRY pHeadEntry = NULL;
PLIST_ENTRY pEntry = NULL;
PLDR_DATA_TABLE_ENTRY pLdrEntry = NULL;
USHORT MaximumLength = NULL;
// Read the PEB from the current process
if ((pPeb = GetCurrentPebProcess()) == NULL) {
MessageBoxError("GetPebCurrentProcess failed.");
return;
}
// Get the InMemoryOrderModuleList
pLdrData = pPeb->Ldr;
pHeadEntry = &pLdrData->InMemoryOrderModuleList;
// Loop the modules
for (pEntry = pHeadEntry->Flink; pEntry != pHeadEntry; pEntry = pEntry->Flink) {
pLdrEntry = CONTAINING_RECORD(pEntry, LDR_DATA_TABLE_ENTRY, InMemoryOrderModuleList);
// Skip modules which aren't kernel32.dll
if (lstrcmpiW(pLdrEntry->BaseDllName.Buffer, L"KERNEL32.DLL")) continue;
// Compute the new maximum length
MaximumLength = pLdrEntry->BaseDllName.MaximumLength + 1;
// Create a new increased buffer
wchar_t* NewBuffer = new wchar_t[MaximumLength];
wcscpy_s(NewBuffer, MaximumLength, pLdrEntry->BaseDllName.Buffer);
// Update the BaseDllName
pLdrEntry->BaseDllName.Buffer = NewBuffer;
pLdrEntry->BaseDllName.MaximumLength = MaximumLength;
break;
}
return;
}
This logic is best triggered as soon as possible once injection occurred. While this could be done through a TLS hook, we will for simplicity update the existing DllMain
function to invoke Metasplop
on DLL_PROCESS_ATTACH
.
HMODULE hPayload;
BOOL APIENTRY DllMain( HMODULE hModule,
DWORD ul_reason_for_call,
LPVOID lpReserved
)
{
switch (ul_reason_for_call)
{
case DLL_PROCESS_ATTACH:
hPayload = hModule;
Metasplop();
break;
case DLL_THREAD_ATTACH:
case DLL_THREAD_DETACH:
case DLL_PROCESS_DETACH:
break;
}
return TRUE;
}
The Signal
As the shellcode we analyzed relied on LoadLibraryA
, let’s build an implementation which will simply raise the Metasploit alert and then terminate the current malicious process. The following function will only be triggered by the shellcode and is itself never called from within our DLL.
_Ret_maybenull_
HMODULE
WINAPI
LoadLibraryA(_In_ LPCSTR lpLibFileName)
{
#pragma EXPORT
// Raise the error message
char buffer[200];
if (sprintf_s(buffer, 200, "The process %d has attempted to load \"%s\" through LoadLibraryA using Metasploit's dynamic import resolution.\n", GetCurrentProcessId(), lpLibFileName) > 0)
{
MessageBoxError(buffer);
}
// Exit the process
ExitProcess(-1);
}
The above approach can be performed for other variations such as LoadLibraryW
, LoadLibraryExA
and others.
The Result
With our emulated security solution ready, we can proceed to demonstrate our technique. As such, we’ll start by executing Shellcode.exe
, a simple shellcode loader (show on the left in figure 24). This shellcode loader mentions its process ID (which we’ll target for injection) and then waits for the shellcode path it needs to execute.
Once we know in which process the shellcode will run, we can inject our emulated security solution (shown on the right in figure 24). This process is typically performed by the security solution for each process and is merely done manually in our PoC for simplicity. Using our custom DLL, we can inject into the desired process using the following command where the path to hernel32.dlx
and the process ID have been picked accordingly.
# rundll32.exe <dll_path>,Inject <target_pid>
rundll32.exe C:\path\to\hernel32.dlx,Inject 6780

Once the injection is performed, the Shellcode.exe
process has been staged (module buffer resized, colliding DLL loaded) for exploitation of the CWE-1288 weakness should any Metasploit shellcode run. It is worth noting that at this stage, no shellcode has been loaded nor has there been any memory allocation for it. This ensures we comply with the assumption that we don’t know where shellcode is executing.
With our mock security solution injected, we can proceed to provide the path to our initially generated shellcode (shellcode.vir
in our case) to the soon-to-be malicious Shellcode.exe
process (left in figure 25).

Once the shellcode runs, we can see how in figure 26 our LoadLibraryA
signalling function gets called, resulting in a high-confidence detection of shellcode-based import resolution.

Disclosure
As a matter of courtesy, NVISO delayed the publishing of this blog post to provide Rapid7, the maintainers of Metasploit, with sufficient review time.
Conclusion
This blog post highlighted the anatomy of Metasploit shellcode with an additional focus on the dynamic import resolution. Within this dynamic import resolution we further identified two weaknesses, one of which can be leveraged to identify runtime Metasploit shellcode with high confidence.
At NVISO, we are always looking at ways to improve our detection mechanisms. Understanding how Metasploit works is one part of the bigger picture and as a result of this research, we were able to build Yara rules identifying Metasploit payloads by fingerprinting both import hashes and average distances between them. A subset of these rules is available upon request.
4 thoughts on “Anatomy and Disruption of Metasploit Shellcode”