In May 2018, when ESET published a blog post covering PDFs with two zero-days, our interest was immediately piqued. Promptly after our analysis of these PDFs, we sent out an early warning to our customers.
Now that Microsoft has published a blog post with a detailed analysis of the zero-days, we find it appropriate to explain how to quickly extract the payloads of these two PDFs.
We updated this blog post with a video.
Analysis
We will start with the first PDF (MD5 e6b7392fb03ff9ff069a9ec5d4221641). It contains an Adobe Reader exploit (CVE-2018-4990), but no Windows exploit.
With pdfid.py, we start the analysis process:
This tells us the PDF contains JavaScript and an OpenAction. These are often found together: the OpenAction triggers the execution of the JavaScript upon opening of the PDF document.
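To illustrate the idea behind this first triage step, here is a minimal Python sketch that counts a few suspicious names in the raw bytes of a PDF. It is not pdfid.py: the keyword list, the function name and the sample data are assumptions made for this example, and unlike pdfid.py it makes no attempt to handle obfuscated names.

```python
import re

# Names that commonly indicate active content in a PDF (illustrative list).
KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA"]

def count_pdf_keywords(data: bytes) -> dict:
    """Count each name, without matching longer names such as /JSFoo."""
    counts = {}
    for keyword in KEYWORDS:
        # Require that the name is not followed by another letter or digit.
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode()] = len(re.findall(pattern, data))
    return counts

# Hypothetical, heavily simplified PDF fragment.
sample = (b"1 0 obj << /OpenAction 2 0 R >> endobj "
          b"2 0 obj << /S /JavaScript /JS 3 0 R >> endobj")
print(count_pdf_keywords(sample))
```

A document that scores on both /OpenAction and /JS or /JavaScript, as in this sample, deserves a closer look.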
Using pdf-parser.py, we can search and extract the JavaScript from the PDF document:
The script is contained in object 1 (/JS 1 0 R):
It is compressed (/FlateDecode); we can decompress it with option -f:
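/FlateDecode streams use zlib/deflate compression, so the decompression step can be sketched with Python's standard zlib module. This is only an illustration of what the filter does, not the code of pdf-parser.py; the function name and the sample script are assumptions.

```python
import zlib

def inflate_stream(raw_stream: bytes) -> bytes:
    """Decompress the raw bytes of a /FlateDecode stream."""
    return zlib.decompress(raw_stream)

# Hypothetical stream content: compress a small script, then recover it.
script = b"var a = [0x81ec8b55, 0x0002d0ec];"
compressed = zlib.compress(script)
print(inflate_stream(compressed).decode())
```

A real stream can carry additional filters or predictor parameters, which this sketch ignores.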
The script starts with a long array of hexadecimal values: this is often the payload in malicious PDFs.
To decode it, we will use base64dump.py. This is a tool to decode embedded payloads, not only base64 encodings, but several other encodings like the hexadecimal encoding used in this script.
With option -w for pdf-parser.py, we extract the raw script (i.e. without escaped whitespace characters).
With options -w and -i , for base64dump.py, we ignore all whitespace characters and commas. This will create a long string for the payload: 0x81ec8b550x0002d0ec0x57… This encoding is called zx (zero-x, i.e. 0x). It can be little-endian (le) or big-endian (be). With JavaScript in Adobe Reader, it is little-endian: we use option -e zxle to decode this long string. And finally, to limit the number of strings detected and decoded by base64dump.py, we want the decoded string to be at least 10 bytes long: -n 10.
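The zxle decoding itself is easy to reproduce. The following Python sketch assumes every value is a full 32-bit number written as 0x followed by 8 hexadecimal digits; base64dump.py may be more flexible than that, and the function name is ours.

```python
import re
import struct

def decode_zxle(text: str) -> bytes:
    """Decode 0x-prefixed 32-bit values into little-endian bytes."""
    # Drop whitespace and commas, like options -w and -i , do.
    compact = re.sub(r"[\s,]", "", text)
    values = re.findall(r"0x([0-9a-fA-F]{8})", compact)
    # "<I" packs each value as an unsigned 32-bit little-endian integer.
    return b"".join(struct.pack("<I", int(value, 16)) for value in values)

# The first two values of the payload shown above.
payload = decode_zxle("0x81ec8b55, 0x0002d0ec")
print(payload.hex())  # 558bec81ecd00200
```

The decoded bytes start with 55 8b ec, a classic x86 function prologue (push ebp; mov ebp, esp), which is a first hint that the array holds machine code.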
The first decoded string is very long (110690 encoded bytes). With YARA rule contains_pe_file, we can quickly check if it contains a PE-file (Windows executable):
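As a rough idea of what such a check involves, the Python sketch below looks for an MZ signature whose header field at offset 0x3C points to a PE signature. The exact condition of the contains_pe_file rule may differ; this is an assumption about the general technique, and all names are ours.

```python
import struct

def pe_offsets(data: bytes) -> list:
    """Return offsets of MZ headers that point to a valid PE signature."""
    hits = []
    pos = data.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(data):
            # e_lfanew (offset 0x3C) holds the offset of the PE header.
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            signature_at = pos + e_lfanew
            if data[signature_at:signature_at + 4] == b"PE\x00\x00":
                hits.append(pos)
        pos = data.find(b"MZ", pos + 1)
    return hits

# Hypothetical blob: a stray "MZ" followed by a minimal DOS header + PE signature.
dos_header = bytearray(0x40)
dos_header[:2] = b"MZ"
struct.pack_into("<I", dos_header, 0x3C, 0x40)
blob = b"\x90" * 5 + b"MZ" + b"\x00" * 70 + bytes(dos_header) + b"PE\x00\x00"
print(pe_offsets(blob))  # [77]
```

Because the stray MZ at offset 5 does not lead to a PE signature, only the real header at offset 77 is reported.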
No surprise, it does! Now with base64dump.py, we can extract this payload (and the PE file). With option -s 1, we select the first decoded string and show a hex/ASCII dump (-a):
This is not the PE file, but shellcode to load the PE file directly into memory.
PE files start with MZ, so we can use option -c (cut) to cut out the part of the decoded string that starts with MZ, like this:
This is still not the embedded PE file. Let’s search for the second instance of MZ, like this:
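Cutting at the n-th instance of a marker boils down to a repeated search. Here is a small Python sketch of that idea; the function name and the sample data are hypothetical, and it is not the implementation of the cut option.

```python
def find_nth(data: bytes, marker: bytes, n: int) -> int:
    """Return the offset of the n-th (1-based) occurrence of marker, or -1."""
    offset = -1
    for _ in range(n):
        offset = data.find(marker, offset + 1)
        if offset == -1:
            return -1
    return offset

# Hypothetical decoded payload: shellcode with a stray MZ, then the real file.
blob = b"\x55\x8b\xecMZ-not-a-pe" + b"\x90" * 4 + b"MZ\x90\x00rest"
start = find_nth(blob, b"MZ", 2)
print(start, blob[start:start + 4])
```

Slicing from the returned offset to the end of the buffer gives the candidate embedded file.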
This looks like a PE-file. Let’s dump it (-d) and pass it to pecheck.py, a tool to analyze PE-files:
It’s indeed a PE-file, more precisely, a 32-bit DLL:
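Whether a PE file is 32-bit and whether it is a DLL can be read from two fields of its COFF file header. The Python sketch below shows only that check and is not a replacement for pecheck.py; the helper fake_pe builds a synthetic header purely for demonstration.

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664
IMAGE_FILE_DLL = 0x2000  # flag in the Characteristics field

def describe_pe(data: bytes) -> str:
    """Return a short description such as '32-bit DLL'."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return "not a PE file"
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00" or len(data) < e_lfanew + 24:
        return "not a PE file"
    # COFF file header: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
    fields = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    machine, characteristics = fields[0], fields[6]
    bits = {IMAGE_FILE_MACHINE_I386: "32-bit",
            IMAGE_FILE_MACHINE_AMD64: "64-bit"}.get(machine, "unknown machine")
    kind = "DLL" if characteristics & IMAGE_FILE_DLL else "EXE"
    return f"{bits} {kind}"

def fake_pe(machine: int, characteristics: int) -> bytes:
    """Build a minimal synthetic DOS + COFF header (demonstration only)."""
    dos_header = bytearray(0x40)
    dos_header[:2] = b"MZ"
    struct.pack_into("<I", dos_header, 0x3C, 0x40)
    coff = struct.pack("<HHIIIHH", machine, 1, 0, 0, 0, 0xE0, characteristics)
    return bytes(dos_header) + b"PE\x00\x00" + coff

print(describe_pe(fake_pe(IMAGE_FILE_MACHINE_I386, 0x2102)))  # 32-bit DLL
```

In practice, the input would be the bytes dumped from the decoded payload rather than a synthetic header.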
This shows how it is possible to triage and analyze zero-day malicious PDFs with static analysis tools. The same one-liner can be used to analyze the second PDF (MD5 bd23ad33accef14684d42c32769092a0), which contains a 32-bit EXE with a zero-day privilege escalation exploit (CVE-2018-8120).
This one contains an interesting IOC, under debug info:
e:\code\2018\EOP-32-pdf\EoP_1\call_as_shellcode\SetImeInfoPoc\Release\SetImeInfoPoc.pdb
In May 2018, we performed a retro-hunt on VirusTotal Intelligence with this IOC, but identified no new samples.
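A PDB path like the one above is stored in the CodeView debug record of the PE file, which starts with the signature RSDS followed by a 16-byte GUID, a 4-byte age and the null-terminated path. Assuming that layout, a quick way to pull such paths out of a dumped payload can be sketched in Python as follows; the function name is ours, and a signature scan like this can produce false positives.

```python
import re
import struct

def find_pdb_paths(data: bytes) -> list:
    """Return PDB paths found in RSDS CodeView records."""
    paths = []
    for match in re.finditer(rb"RSDS", data):
        # Skip the signature (4 bytes), the GUID (16 bytes) and the age (4 bytes).
        start = match.start() + 4 + 16 + 4
        end = data.find(b"\x00", start)
        if end == -1:
            continue
        candidate = data[start:end]
        if candidate.lower().endswith(b".pdb"):
            paths.append(candidate.decode("utf-8", errors="replace"))
    return paths

# Synthetic record built around the path from this analysis (GUID and age are dummies).
pdb = rb"e:\code\2018\EOP-32-pdf\EoP_1\call_as_shellcode\SetImeInfoPoc\Release\SetImeInfoPoc.pdb"
record = b"RSDS" + bytes(16) + struct.pack("<I", 1) + pdb + b"\x00"
print(find_pdb_paths(b"\x00" * 8 + record))
```

Paths recovered this way make good search terms for a retro-hunt.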
Conclusion
The goal of this post is to teach you how to triage and analyze malicious PDF documents. With static analysis tools, even sophisticated zero-day PDFs can quickly reveal their payload, sometimes with very distinct IOCs.
Microsoft and Adobe have patched these zero-days.
Update
We created a video with step-by-step instructions to analyze these malicious PDFs:
If you are interested in receiving our advisories via our mailing list, you can subscribe by sending us an e-mail at csirt@nviso.be.
Want to learn more? Please do join us at the upcoming BruCON training on malicious documents, which was authored by NVISO’s experts!
About the authors
Didier Stevens is a malware expert working for NVISO. Didier is a SANS Internet Storm Center senior handler and Microsoft MVP, and has developed numerous popular tools to assist with malware analysis. You can find Didier on Twitter and LinkedIn.