The Suricata detection engine supports rules written in the embeddable scripting language Lua. In this post we give a PoC Lua script to detect PDF documents with name obfuscation.
One of the elements that make up a PDF is a name. A name is a reserved word that starts with the character / followed by alphanumeric characters, for example /JavaScript. The presence of the name /JavaScript indicates that the PDF contains scripts (written in JavaScript).
The PDF specification allows alphanumeric characters in a name to be replaced by a hexadecimal representation, for example /J#61vaScript, where #61 is the hexadecimal representation of the letter a. We call the use of this hexadecimal representation in names “name obfuscation”, because it is a simple technique to evade detection by engines that just look for the normal, unobfuscated name (/JavaScript).
There is no limit to the number of characters in a name that can be replaced by their hexadecimal representation. That makes it impossible to write a Suricata/Snort rule (using content, pcre, …) that will detect all possible obfuscations of the name /JavaScript. However, it is easy to write a program that normalizes obfuscated names (pdfid does this, for example).
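To make the number of possible spellings concrete, here is a minimal sketch in Lua. The helpers ObfuscateName and NormalizeName are hypothetical names introduced only for this illustration; they are not part of the detection script or of pdfid. The ten characters after the slash in /JavaScript can each be written literally or as a hex escape, which already gives 2^10 = 1024 spellings, before counting upper and lower case hex digits.

```lua
-- Hypothetical helpers for illustration only.

-- Hex-escape the characters of a PDF name whose 1-based positions
-- (counted after the leading "/") are listed in tPositions.
function ObfuscateName(sName, tPositions)
    local tEscape = {}
    for _, iPosition in ipairs(tPositions) do
        tEscape[iPosition] = true
    end
    local tParts = {"/"}
    local sBody = sName:sub(2)
    for i = 1, #sBody do
        local sChar = sBody:sub(i, i)
        if tEscape[i] then
            tParts[#tParts + 1] = string.format("#%02X", sChar:byte())
        else
            tParts[#tParts + 1] = sChar
        end
    end
    return table.concat(tParts)
end

-- Undo the escaping: replace every #xx by the character it encodes.
function NormalizeName(sName)
    local sNormalized = sName:gsub("#(%x%x)", function(sHex)
        return string.char(tonumber(sHex, 16))
    end)
    return sNormalized
end

print(ObfuscateName("/JavaScript", {2}))         -- /J#61vaScript
print(ObfuscateName("/JavaScript", {1, 5, 10}))  -- /#4Aava#53crip#74
print(NormalizeName("/#4Aava#53crip#74"))        -- /JavaScript
```

Every spelling normalizes back to the same name, which is why normalizing first and comparing afterwards is simpler than enumerating the variants in a signature.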
Fortunately, Suricata has supported the programming language Lua for some time now. Let’s take a look at how we can use this to detect PDF files that contain the obfuscated name /JavaScript (FYI: all PDF files we observed with an obfuscated /JavaScript name were malicious, so it’s a good test to detect malicious PDFs).
A Suricata Lua script has to implement 2 functions: init and match.
The init function declares the data we need from Suricata to be able to do our analysis. For the PDF document, we need the HTTP response body:
function init(args)
    return {["http.response_body"] = tostring(true)}
end
The match function needs to contain the actual logic to analyze the payload. We need to retrieve the HTTP response body, analyze it, and return 1 if we detect something. When nothing is detected, we need to return 0.
In this example of the match function, we detect whether the HTTP response body is equal to the string test:
function match(args)
    -- local, so the variable does not leak into the global environment
    local a = tostring(args["http.response_body"])
    if a == "test" then
        return 1
    else
        return 0
    end
end
To detect obfuscated /JavaScript names we use this code:
tBlacklisted = {["/JavaScript"] = true}

function PDFCheckName(sInput)
    for sMatchedName in sInput:gmatch("/[a-zA-Z0-9_#]+") do
        if sMatchedName:find("#") then
            local sNormalizedName = sMatchedName:gsub("#[a-fA-F0-9][a-fA-F0-9]", function(hex)
                return string.char(tonumber(hex:sub(2), 16))
            end)
            if tBlacklisted[sNormalizedName] then
                return 1
            end
        end
    end
    return 0
end
Function PDFCheckName takes a string as input (sInput) and then starts to search for names in the input:
sInput:gmatch("/[a-zA-Z0-9_#]+")
For each name we find, we check if it contains a # character (i.e. if it could be obfuscated):
if sMatchedName:find("#") then
When this is the case, we try to normalize the name (replace each #xx hexadecimal escape with the character it encodes):
local sNormalizedName = sMatchedName:gsub("#[a-fA-F0-9][a-fA-F0-9]", function(hex)
    return string.char(tonumber(hex:sub(2), 16))
end)
And finally, we check if the normalized name is in our blacklist:
if tBlacklisted[sNormalizedName] then return 1 end
If it is, we return 1. If none of the names in the input match, we return 0.
The complete Lua script:
tBlacklisted = {["/JavaScript"] = true}

function PDFCheckName(sInput)
    for sMatchedName in sInput:gmatch("/[a-zA-Z0-9_#]+") do
        if sMatchedName:find("#") then
            local sNormalizedName = sMatchedName:gsub("#[a-fA-F0-9][a-fA-F0-9]", function(hex)
                return string.char(tonumber(hex:sub(2), 16))
            end)
            if tBlacklisted[sNormalizedName] then
                return 1
            end
        end
    end
    return 0
end

function init(args)
    return {["http.response_body"] = tostring(true)}
end

function match(args)
    return PDFCheckName(tostring(args["http.response_body"]))
end
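Before wiring the script into Suricata, the detection logic can be tried with a plain Lua interpreter. The following is a minimal sketch: PDFCheckName is copied from the script above so that the file runs on its own, and the sample strings are hypothetical fragments made up for this illustration, not taken from real PDF files.

```lua
-- Copy of PDFCheckName from the script, so this file is self-contained.
tBlacklisted = {["/JavaScript"] = true}

function PDFCheckName(sInput)
    for sMatchedName in sInput:gmatch("/[a-zA-Z0-9_#]+") do
        if sMatchedName:find("#") then
            local sNormalizedName = sMatchedName:gsub("#[a-fA-F0-9][a-fA-F0-9]", function(hex)
                return string.char(tonumber(hex:sub(2), 16))
            end)
            if tBlacklisted[sNormalizedName] then
                return 1
            end
        end
    end
    return 0
end

-- Hypothetical sample inputs (assumptions made for this sketch).
tSamples = {
    "<< /S /J#61vaScript /JS (app.alert(1)) >>",  -- obfuscated name
    "<< /S /JavaScript /JS (app.alert(1)) >>",    -- plain name, no # present
    "<< /Type /C#61talog >>",                     -- obfuscated, not blacklisted
}

for _, sSample in ipairs(tSamples) do
    print(PDFCheckName(sSample), sSample)
end
```

Only the first sample yields 1. The second one yields 0 because the function only inspects names that contain a #, so it flags obfuscated /JavaScript names and deliberately ignores the unobfuscated one.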
To get Suricata to run our Lua script, we need to copy it into the rules directory and add a rule that calls the script, like this:
alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"NVISO PDF file lua"; flow:established,to_client; luajit:pdfcheckname.lua; classtype:policy-violation; sid:1000000; rev:1;)
Rule option luajit allows us to specify the Lua script we want to execute (pdfcheckname.lua).
That’s all there is to do to get this running.
But on production systems, we will quickly get into trouble because of performance issues. The rule that we wrote will get the Lua script to execute on all HTTP traffic with incoming data. To avoid this, it is best to add pre-conditions to the rule so that the program will only run on downloaded PDF files:
alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"NVISO PDF file lua"; flow:established,to_client; file_data; content:"%PDF-"; within:5; luajit:pdfcheckname.lua; classtype:policy-violation; sid:1000000; rev:1;)
This updated rule checks that the file starts with %PDF- (that’s an easy trick to detect a PDF file, but be aware that there are ways to bypass this simple detection).
For some environments, checking all downloaded PDF files might still cause performance problems. This updated rule uses a regular expression to check if the downloaded PDF file contains a (potentially) obfuscated name:
alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"NVISO PDF file lua"; flow:established,to_client; file_data; content:"%PDF-"; within:5; pcre:"/\/.{0,10}#[a-f0-9][a-f0-9]/i"; luajit:pdfcheckname.lua; classtype:policy-violation; sid:1000000; rev:1;)
Note that in the regular expression of this rule we expect that the name is not longer than 11 characters (that’s the case with the name we want to detect, /JavaScript). So if you add your own names to the blacklist, and they are longer than 11 characters, then update the regular expression in the rule.
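When the blacklist grows, the bound in the regular expression can be derived from the longest name instead of being counted by hand. This is a minimal sketch under assumptions: the extra names (/JS, /OpenAction, /EmbeddedFile) are hypothetical additions chosen only for illustration, and LongestBlacklistedName is a helper name invented here. It follows the convention of the rule above, where the 11-character name /JavaScript corresponds to .{0,10}.

```lua
-- Hypothetical extended blacklist; only /JavaScript comes from the original script.
tBlacklisted = {
    ["/JavaScript"] = true,
    ["/JS"] = true,
    ["/OpenAction"] = true,
    ["/EmbeddedFile"] = true,
}

-- Return the length (including the leading "/") of the longest name in the table.
function LongestBlacklistedName(tNames)
    local iLongest = 0
    for sName in pairs(tNames) do
        if #sName > iLongest then
            iLongest = #sName
        end
    end
    return iLongest
end

local iLongest = LongestBlacklistedName(tBlacklisted)
-- Same convention as the rule: an 11-character name maps to .{0,10}.
print(string.format("longest name: %d characters, pcre bound: .{0,%d}", iLongest, iLongest - 1))
-- prints: longest name: 13 characters, pcre bound: .{0,12}
```

With this hypothetical blacklist, the pcre in the rule would need .{0,12} instead of .{0,10}.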
Conclusion
Support for Lua in Suricata makes it possible to develop complex analysis methods that would not be possible with simple rules. However, performance needs to be taken into account.
In part 2 of this blog post, we will provide some tips to help with the development and testing of Lua scripts for Suricata.
As you can see, a pattern-based signature (e.g. a Snort or Suricata rule) may leave room for bypassing the rule with sophisticated techniques. To cover a vulnerability accurately, Snort has shared object rules. Do you think Lua scripting in Suricata can do everything a shared object rule can do in Snort? And what about the performance of luajit scripts?