NVISO recently monitored a targeted campaign against one of its customers in the financial sector. The attempt was spotted at its earliest stage following an employee’s report concerning a suspicious email. While no harm was done, we commonly identify any related indicators to ensure additional monitoring of the actor.
The reported email was an application for one of the company’s public job offers and attempted to deliver a malicious document. What caught our attention, besides leveraging an actual job offer, was the presence of execution-guardrails in the malicious document. Analysis of the document uncovered the intention to persist a Cobalt Strike stager through Component Object Model Hijacking.
During my free time I enjoy analyzing samples NVISO spots in-the-wild, and hence further dissected the Cobalt Strike DLL payload. This blog post will cover the payload’s anatomy, design choices and highlight ways to reduce both log footprint and time-to-shellcode.
Execution Flow Analysis
To understand how the malicious code works we have to analyze its behavior from start to end. In this section, we will cover the following flows:
- The initial execution through DllMain.
- The sending of encrypted shellcode into a named pipe by WriteBufferToPipe.
- The pipe reading, shellcode decryption and execution through PipeDecryptExec.
As previously mentioned, the malicious document’s DLL payload was intended to be used as a COM in-process server. With this knowledge, we can already expect some known entry points to be exposed by the DLL.

While technically the malicious execution can occur in any of the 8 functions, malicious code commonly resides in the DllMain function given that, besides TLS callbacks, it is the function most likely to execute.
docs.microsoft.com/en-us/windows/win32/dlls/dllmain
DllMain: An optional entry point into a dynamic-link library (DLL). When the system starts or terminates a process or thread, it calls the entry-point function for each loaded DLL using the first thread of the process. The system also calls the entry-point function for a DLL when it is loaded or unloaded using the LoadLibrary and FreeLibrary functions.
Throughout the following analysis, functions and variables have been renamed to reflect their usage and improve clarity.
The DllMain Entry Point
As can be seen in the following capture, the DllMain function simply executes another function by creating a new thread. This threaded function, which we named DllMainThread, is executed without any additional arguments being provided to it.

Figure: DllMain.
Analyzing the DllMainThread function uncovers it is an additional wrapper towards what we will discover is the malicious payload’s decryption and execution function (called DecryptBufferAndExec in the capture).

Figure: DllMainThread.
By going one level deeper, we can see the start of the malicious logic. Analysts experienced with Cobalt Strike will recognize the well-known MSSE-%d-server pattern.

Figure: DecryptBufferAndExec.
A couple of things occur in the above code:
- The sample starts by retrieving the tick count through GetTickCount and then divides it by 0x26AA. While obtaining a tick count is often a time measurement, the next operation solely uses the divided tick as a random number.
- The sample then proceeds to call a wrapper around an implementation of the sprintf function. Its role is to format a string into the PipeName buffer. As can be observed, the formatted string will be \\.\pipe\MSSE-%d-server where %d will be the result computed in the previous division (e.g.: \\.\pipe\MSSE-1234-server). This pipe’s format is a well-documented Cobalt Strike indicator of compromise.
- With the pipe’s name defined in a global variable, the malicious code creates a new thread to run WriteBufferToPipeThread. This function will be the next one we will analyze.
- Finally, while the new thread is running, the code jumps to the PipeDecryptExec routine.
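The pipe name derivation described above can be reproduced in a few lines of Python. This is a minimal sketch: the function and constant names are ours, and we assume the tick count is treated as an unsigned 32-bit value before the division, which the capture does not let us confirm here.

```python
PIPE_FORMAT = r"\\.\pipe\MSSE-%d-server"  # format string observed in the sample
TICK_DIVISOR = 0x26AA                     # divisor applied to the tick count


def stager_pipe_name(tick_count: int) -> str:
    """Format the pipe name for a given GetTickCount-style value."""
    # Assumption: the tick count is a 32-bit unsigned value (DWORD).
    pseudo_random = (tick_count & 0xFFFFFFFF) // TICK_DIVISOR
    return PIPE_FORMAT % pseudo_random
```

Since the only input is the system uptime in milliseconds, the resulting number is predictable rather than random, and the surrounding MSSE-…-server text stays constant across executions.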
So far, we had a linear execution from our DllMain entry point until the DecryptBufferAndExec function. We could graph the flow as follows:

Figure: DllMain until DecryptBufferAndExec.
As we can see, two threads are now going to run concurrently. Let’s first focus on the one writing into the pipe (WriteBufferToPipeThread), followed by its reading counterpart (PipeDecryptExec).
The WriteBufferToPipe Thread
The thread writing into the generated pipe is launched from DecryptBufferAndExec without any additional arguments. By entering into the WriteBufferToPipeThread function, we can observe it is a simple wrapper to WriteBufferToPipe, except that it additionally passes the following arguments recovered from a global Payload variable (pointed to by the pPayload pointer):
- The size of the shellcode, stored at offset 0x4.
- A pointer to a buffer containing the encrypted shellcode, stored at offset 0x14.
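For analysts who want to carve the encrypted buffer out of a dumped Payload structure, the two offsets above are enough. The following is a minimal sketch under our own assumptions: the size is read as a little-endian 32-bit integer, the encrypted bytes start right at offset 0x14, and the function name is hypothetical.

```python
import struct

SIZE_OFFSET = 0x4     # shellcode size, per the analysis above
BUFFER_OFFSET = 0x14  # start of the encrypted shellcode, per the analysis above


def carve_encrypted_shellcode(payload: bytes):
    """Return (size, encrypted_bytes) from a raw dump of the Payload structure."""
    # Assumption: the size field is a little-endian unsigned 32-bit integer.
    (size,) = struct.unpack_from("<I", payload, SIZE_OFFSET)
    encrypted = payload[BUFFER_OFFSET:BUFFER_OFFSET + size]
    if len(encrypted) != size:
        raise ValueError("dump is shorter than the announced shellcode size")
    return size, encrypted
```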

Figure: WriteBufferToPipeThread.
Within the WriteBufferToPipe function we can notice the code starts by creating a new pipe. The pipe’s name is recovered from the PipeName global variable which, if you remember, was previously populated by the sprintf function. The code creates a single-instance, outbound pipe (PIPE_ACCESS_OUTBOUND) by calling CreateNamedPipeA and then waits for a client to connect to it using the ConnectNamedPipe call.

Figure: WriteBufferToPipe’s named pipe creation.
If the connection was successful, the WriteBufferToPipe function proceeds to loop the WriteFile call as long as there are bytes of the shellcode to be written into the pipe.

Figure: WriteBufferToPipe writing to the pipe.
One important detail worth noting is that once the shellcode is written into the pipe, the previously opened handle to the pipe is closed through CloseHandle. This indicates that the pipe’s sole purpose was to transfer the encrypted shellcode.
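The write-until-done loop followed by the handle closure can be sketched as follows. To stay portable we use an anonymous os.pipe descriptor as a stand-in for the Windows named pipe, with os.write and os.close playing the roles of WriteFile and CloseHandle; the function name is ours.

```python
import os


def write_buffer_to_pipe(fd: int, data: bytes) -> int:
    """Write all of data to fd, then close it; returns the byte count written."""
    written = 0
    try:
        # A single write may accept fewer bytes than requested, hence the loop.
        while written < len(data):
            written += os.write(fd, data[written:])
    finally:
        # The descriptor has no further use once the transfer is over.
        os.close(fd)
    return written
```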
Once the WriteBufferToPipe function is completed, the thread terminates. Overall the execution flow was quite simple and can be graphed as follows:

Figure: WriteBufferToPipe.
The PipeDecryptExec Flow
As a quick refresher, the PipeDecryptExec flow was executed immediately after the creation of the WriteBufferToPipe thread. The first task performed by PipeDecryptExec is to allocate a memory region to receive the shellcode to be transmitted through the named pipe. To do so, malloc is called with the shellcode size stored at offset 0x4 of the global Payload variable as its argument.
Once the buffer allocation is completed, the code sleeps for 1024 milliseconds (0x400) and calls FillBufferFromPipe with both buffer location and buffer size as arguments. Should the FillBufferFromPipe call fail by returning FALSE (0), the code loops back to the Sleep call and attempts the operation again until it succeeds. These Sleep calls and loops are required as the multi-threaded sample has to wait for the shellcode to be written into the pipe.
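The sleep-then-retry logic is easier to reason about once written out. Below is a minimal sketch in which fill stands for any callable behaving like FillBufferFromPipe (returning True on success); the function name, the attempt counter and the injectable sleep parameter are our own additions for illustration.

```python
import time
from typing import Callable

RETRY_DELAY_SECONDS = 0x400 / 1000  # 1024 milliseconds, as in the sample


def wait_for_buffer(fill: Callable[[], bool],
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Sleep, try to fill the buffer, and repeat until it works.

    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        sleep(RETRY_DELAY_SECONDS)  # the sample sleeps before every attempt
        attempts += 1
        if fill():
            return attempts
```

Note that even in the best case the loop costs one full sleep, so the staging adds at least a second to the time-to-shellcode.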
Once the shellcode is written to the allocated buffer, PipeDecryptExec will finally launch the decryption and execution through XorDecodeAndCreateThread.

Figure: PipeDecryptExec.
To transfer the encrypted shellcode from the pipe into the allocated buffer, FillBufferFromPipe opens the pipe in read-only mode (GENERIC_READ) using CreateFileA. As was done for the pipe’s creation, the name is retrieved from the global PipeName variable. If accessing the pipe fails, the function proceeds to return FALSE (0), resulting in the above-described Sleep and retry loop.

Figure: FillBufferFromPipe’s pipe access.
Once the pipe is opened in read-only mode, the FillBufferFromPipe function proceeds to copy over the shellcode using ReadFile until the allocated buffer is filled. Once the buffer is filled, the handle to the named pipe is closed through CloseHandle and FillBufferFromPipe returns TRUE (1).

Figure: FillBufferFromPipe copying data.
Once FillBufferFromPipe has successfully completed, the named pipe has completed its task and the encrypted shellcode has been moved from one memory region to another.
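To illustrate the round trip, the following sketch pushes a buffer through a pipe from one thread and reads it back from another. It is a simplified model only: an anonymous os.pipe replaces the named pipe, and the helper names are hypothetical.

```python
import os
import threading


def fill_buffer_from_pipe(fd: int, size: int) -> bytes:
    """Read from fd until size bytes were received, then close the descriptor."""
    buffer = bytearray()
    while len(buffer) < size:
        chunk = os.read(fd, size - len(buffer))
        if not chunk:  # the writer closed its end early
            break
        buffer += chunk
    os.close(fd)
    return bytes(buffer)


def copy_through_pipe(data: bytes) -> bytes:
    """Send data from a writer thread to a reader over a pipe."""
    read_fd, write_fd = os.pipe()

    def writer() -> None:
        view = memoryview(data)
        while view:  # loop until every byte has been accepted by the pipe
            view = view[os.write(write_fd, view):]
        os.close(write_fd)

    thread = threading.Thread(target=writer)
    thread.start()
    received = fill_buffer_from_pipe(read_fd, len(data))
    thread.join()
    return received
```

Whatever goes in comes out byte for byte: the bytes are neither decoded nor altered on the way, they merely change location.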
Back in the caller PipeDecryptExec function, once the FillBufferFromPipe call returns TRUE, the XorDecodeAndCreateThread function gets called with the following parameters:
- The buffer containing the copied shellcode.
- The length of the shellcode, stored at the global Payload variable’s offset 0x4.
- The symmetric XOR decryption key, stored at the global Payload variable’s offset 0x8.
Once invoked, the XorDecodeAndCreateThread function starts by allocating yet another memory region using VirtualAlloc. The allocated region has read/write permissions (PAGE_READWRITE) but is not executable. By not making a region writable and executable at the same time, the sample possibly attempts to evade security solutions which only look for PAGE_EXECUTE_READWRITE regions.
Once the region is allocated, the function loops over the shellcode buffer and decrypts each byte using a simple xor operation into the newly allocated region.
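The decoding step itself is short enough to reimplement when extracting the shellcode statically. The sketch below assumes the key is a short byte string repeated cyclically over the buffer (a 4-byte key is used in the example); the key length and the function name are our assumptions, as the capture is not reproduced here.

```python
def xor_decode(encrypted: bytes, key: bytes) -> bytes:
    """XOR every byte of encrypted with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(byte ^ key[i % len(key)] for i, byte in enumerate(encrypted))


# XOR is symmetric: applying the same key twice returns the original bytes.
example_key = b"\x13\x37\xc0\xde"  # made-up key for illustration
assert xor_decode(xor_decode(b"placeholder", example_key), example_key) == b"placeholder"
```

The same routine is what allowed us to re-encode a modified shellcode later on, since encoding and decoding are the same operation.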

Figure: XorDecodeAndCreateThread.
When the decryption is complete, the GetModuleHandleAndGetProcAddressToArg function is called. Its role is to place pointers to two valuable functions into memory: GetModuleHandleA and GetProcAddress. These functions should enable the shellcode to further resolve additional procedures without relying on them being imported. Before storing these pointers, the GetModuleHandleAndGetProcAddressToArg function first ensures a specific value is not FALSE (0). Surprisingly enough, this value stored in a global variable (here called zero) is always FALSE, resulting in the pointers never being stored.

Figure: GetModuleHandleAndGetProcAddressToArg.
Back in the caller function, XorDecodeAndCreateThread changes the shellcode’s memory region to be executable (PAGE_EXECUTE_READ) using VirtualProtect and finally creates a new thread. This final thread starts at the JumpToParameter function which acts as a simple wrapper to the shellcode, provided as argument.

Figure: JumpToParameter.
From here, the previously encrypted Cobalt Strike shellcode stager executes to resolve WinINet procedures, download the final beacon and execute it. We will not cover the shellcode’s analysis in this post as it would deserve a post of its own.
While this last flow contained more branches and logic, the overall graph remains quite simple:

Figure: PipeDecryptExec until the shellcode.
Memory Flow Analysis
What was most surprising throughout the above analysis was the presence of a well-known named pipe. Pipes can be used as a defense evasion mechanism by decrypting the shellcode at pipe exit or for inter-process communications; but in our case it merely acted as a memcpy to move encrypted shellcode from the DLL into another buffer.

So why would this overhead be implemented? As pointed out by another colleague, the answer lies in the Artifact Kit, a Cobalt Strike dependency:
cobaltstrike.com/help-artifact-kit
Cobalt Strike uses the Artifact Kit to generate its executables and DLLs. The Artifact Kit is a source code framework to build executables and DLLs that evade some anti-virus products. […] One of the techniques [see: src-common/bypass-pipe.c in the Artifact Kit] generates executables and DLLs that serve shellcode to themselves over a named pipe. If an anti-virus sandbox does not emulate named pipes, it will not find the known bad shellcode.
As we can see in the above diagram, the staging of the encrypted shellcode in the malloc buffer generates a lot of overhead, supposedly for evasion. These operations could be avoided should XorDecodeAndCreateThread instead directly read from the initial encrypted shellcode as outlined in the next diagram. Avoiding the usage of named pipes will furthermore remove the need for looped Sleep calls as the data would be readily available.

It seems we found a way to reduce the time-to-shellcode; but do popular anti-virus solutions actually get tricked by the named pipe?
Patching the Execution Flow
To test that theory, let’s improve the malicious execution flow. For starters we could skip the useless pipe-related calls and have the DllMainThread function call PipeDecryptExec directly, bypassing pipe creation and writing. How the assembly-level patching is performed is beyond this blog post’s scope as we are just interested in the flow’s abstraction.

Figure: DllMainThread.
The PipeDecryptExec function will also require patching to skip the malloc allocation and the pipe reading, and to ensure it provides XorDecodeAndCreateThread with the DLL’s encrypted shellcode instead of the now-nonexistent duplicated region.

Figure: PipeDecryptExec.
With our execution flow patched, we can furthermore zero-out any unused instructions should these be used by security solutions as a detection base.
When the patches are applied, we end up with a linear and shorter path until shellcode execution. The following graph focuses on this patched path and does not include the leaves beneath WriteBufferToPipeThread.

As we also figured out how the shellcode is encrypted (we have the xor key), we modified both samples to redact the actual C2 as it can be used to identify our targeted customer.
To ensure the shellcode did not rely on any bypassed calls, we spun up a quick Python HTTPS server and made sure the redacted domain resolved to 127.0.0.1. We can then invoke both the original and patched DLL through rundll32.exe and observe how the shellcode still attempts to retrieve the Cobalt Strike beacon, proving our patches did not affect the shellcode. The exported StartW function we invoke is a simple wrapper around the Sleep call.
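A listener of the kind mentioned above does not need to be elaborate. The following is a minimal sketch using only the Python standard library and is not the exact script we used: the handler name, the 404 answer and the certificate paths are placeholders, and a self-signed certificate is assumed to be sufficient for merely observing the connection attempt.

```python
import ssl
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Optional


class LoggingHandler(BaseHTTPRequestHandler):
    """Answer every GET with a 404 and remember which path was requested."""

    requested_paths: List[str] = []  # shared across requests on purpose

    def do_GET(self) -> None:
        self.requested_paths.append(self.path)
        self.send_response(404)
        self.end_headers()

    def log_message(self, format, *args) -> None:
        pass  # keep the console quiet; paths are recorded above


def make_server(port: int, certfile: Optional[str] = None,
                keyfile: Optional[str] = None) -> HTTPServer:
    """Build a loopback-only server, wrapped in TLS when a certificate is given."""
    server = HTTPServer(("127.0.0.1", port), LoggingHandler)
    if certfile:
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        context.load_cert_chain(certfile, keyfile)
        server.socket = context.wrap_socket(server.socket, server_side=True)
    return server


# Hypothetical usage with placeholder certificate files:
#   make_server(443, "cert.pem", "key.pem").serve_forever()
```

Any request recorded in requested_paths after launching the DLL indicates the stager reached its download stage.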

Anti-Virus Review
So do named pipes actually work as a defense evasion mechanism? While there are efficient ways to measure our patches’ impact (e.g.: comparing across multiple sandbox solutions), VirusTotal does offer a quick primary assessment. As such, we submitted the following versions with redacted C2 to VirusTotal:
- wpdshext.dll.custom.vir which is the redacted Cobalt Strike DLL.
- wpdshext.dll.custom.patched.vir which is our patched and redacted Cobalt Strike DLL without named pipes.
As the original Cobalt Strike contains identifiable patterns (the named pipe), we would expect the patched version to have a lower detection ratio, although the Artifact Kit would disagree.


As we expected, the named-pipe overhead leveraged by Cobalt Strike actually turned out to act as a detection base. As can be seen in the above captures, while the original version (left) obtained 17 detections, the patched version (right) obtained one fewer for a total of 16 detections. Among the thrown-off solutions we noticed ESET and Sophos did not manage to detect the pipe-less version, whereas ZoneAlarm couldn’t identify the original version.
One notable observation is that an intermediary patch where the flow is adapted but unused code is not zeroed-out turned out to be the most detected version with a total of 20 hits. This higher detection rate occurs as this patch allows pipe-unaware anti-virus vendors to also locate the shellcode while pipe-related operation signatures are still applicable.

While these tests focused on the default Cobalt Strike behavior against the absence of named pipes, one might argue that a customized named pipe pattern would have had the best results. Although we did not think of this variant during the initial tests, we submitted a version with altered pipe names (NVISO-RULES-%d instead of MSSE-%d-server) the day after and obtained 18 detections. As a comparison, our two other samples had their detection rate increase to 30+ overnight. We however have to consider the possibility that these 18 detections are influenced by the initial shellcode being burned.
Conclusion
Reversing the malicious Cobalt Strike DLL turned out to be more interesting than expected. Overall, we noticed the presence of noisy operations whose usage wasn’t a functional requirement and which even turned out to act as a detection base. To confirm our hypothesis, we patched the execution flow and observed how our simplified version still reaches out to the C2 server with a lowered (almost unaltered) detection rate.
So why does it matter?
The Blue
First and foremost, this payload analysis highlights a common Cobalt Strike DLL pattern allowing us to further fine-tune detection rules. While this stager was the first DLL analyzed, we did take a look at other Cobalt Strike formats such as default beacons and those leveraging a malleable C2, both as Dynamic Link Libraries and Portable Executables. Surprisingly enough, all formats shared this commonly documented MSSE-%d-server pipe name and a quick search for open-source detection rules showed how little it is being hunted for.
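Hunting for the default pattern can start with something as simple as a regular expression applied to pipe names collected from endpoint telemetry (for example pipe creation events). The sketch below is illustrative only; the function name is ours, and the optional minus sign is a precaution on our side since %d is a signed format specifier.

```python
import re

# Matches the default pipe name with or without a leading path component.
DEFAULT_STAGER_PIPE = re.compile(r"(?:^|\\)MSSE--?\d+-server$", re.IGNORECASE)


def is_default_stager_pipe(pipe_name: str) -> bool:
    """Tell whether a pipe name follows the default MSSE-%d-server pattern."""
    return DEFAULT_STAGER_PIPE.search(pipe_name) is not None
```

Such a rule obviously only catches unmodified generation patterns; an actor altering the format string, as we did with NVISO-RULES-%d, would not be matched.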
The Red
Besides being helpful for NVISO’s defensive operations, this research further comforts our offensive team in their choice of leveraging custom-built delivery mechanisms; even more so following the design choices we documented. The usage of named pipes in operations targeting mature environments is more likely to raise red flags and so far does not seem to provide any evasive advantage without alteration in the generation pattern at least.
To the next actor targeting our customers: I am looking forward to modifying your samples and testing the effectiveness of altered pipe names.