At NVISO Labs, we are constantly trying to find better ways of understanding the data our analysts are looking at. This ranges from our SOC analysts looking at millions of collected data points per day all the way to the malware analyst tearing apart a malware sample and trying to make sense of its behaviour.
In this context, our work often involves investigating raw network traffic logs. Analysing these often takes a lot of time: a 1MB network traffic capture (PCAP) can easily contain several hundred different packets (and most are much larger!). Going through these network events in traditional tools such as Wireshark is extremely valuable; however they are not always the best to quickly understand from a higher level what is actually going on.
In this blog post we want to perform a series of experiments to try and improve our understanding of captured network traffic as intuitively as possible, by exclusively using interactive visualisations.
For this blog post we will use this simple 112 packet PCAP to experiment with novel ways of visualising and understanding our network data. Let’s go!
Experiment 1 – Visualising network traffic using graph nodes
As a first step, we simply represent all IP packets in our PCAP as unconnected graph nodes. Each dot in the visualisation represents the source of a packet. A packet being sent from source A to destination B is visualised as the dot visually traveling from A to B. This simple principle is highlighted below. For our experiments, the time dimension is normalised: each packet traveling from A to B is visualised in the order they took place, but we don’t distinguish the duration between packets for now.
This visualisation already allows us to quickly see and understand a few things:
- We quickly see which IP addresses are most actively communicating with each other (172.29.0.111 and 172.29.0.255)
- It’s quickly visible which hosts account for the “bulk” of the traffic
- We see how interactions between systems change as time moves on.
A few shortcomings of this first experiment include:
- We have no clue on the actual content of the network communication (is this DNS? HTTP? Something else?)
- IP addresses are made to be processed by a computer, not by a human; adding additional context to make them easier to classify by a human analyst would definitely help.
Experiment 2 – Adding context to IP address information
By using basic information to enrich the nodes in our graph, we can aggregate all LAN traffic into a single node. This lets us quickly see which external systems our LAN hosts are communicating with:
By doing this, we have improved our understanding of the data in a few ways:
- We can very quickly see which traffic is leaving our LAN, and which external (internet facing) systems are involved in the communication.
- All the internal LAN traffic is now represented as 1 node in our graph; this can be interesting in case our analyst wants to quickly check which network segments are involved in the communication.
However, we still face a few shortcomings:
- We still don’t really have a clue on the actual content of the network communication (is this DNS? HTTP? Something else?)
- We don’t know much about the external systems that are being contacted.
Experiment 3 – Isolating specific types of traffic
By applying simple visual filters to our simulation (just like we do in tools like Wireshark), we can make a selection of the packets we want to investigate. The benefit of doing this is that we can easily focus on the type of traffic we want to investigate without being burdened with things we don’t care about at that point in time of the analysis.
In the example below, we have isolated DNS traffic; we quickly see that the PCAP contains communication between hosts in our LAN (remember, the LAN dot now represents traffic from multiple hosts!) and 2 different DNS servers.
Once we notice that the rogue DNS server is being contacted by a LAN host, we can change our visualisation to see which domain name is being queried by which server. We also conveniently attached the context tag “Suspicious DNS server” to host 126.96.36.199 (the result of our previous experiment). The result is illustrated below. It also shows that we are not only limited by showing relations between IP addresses; we can for example illustrate the link between the DNS server and the hosts they query.
Even with larger network captures, we can use this technique to quickly visualise connectivity to suspicious systems. In the example below, we can quickly see that the bulk of all DNS traffic is being sent to the trusted corporate DNS server, whereas a single hosts is interacting with the suspicious DNS server we identified before.
So what’s next?
Nothing keeps us from entirely changing the relation between two nodes; typically, we are used to visualising packets as going from IP address A to IP address B; however, more exotic visualisations are possible (think about the relations between user-agents and domains, query lengths and DNS requests, etc.); in addition, there is plenty of opportunity to add more context to our graph nodes (link an IP address to a geographical location, correlate domain names with Indicators of Compromise, use whitelists and blacklists to more quickly distinguish baseline vs. other traffic, etc.). These are topics we want to further explore.
Going forward, we plan on continuing to share our experiments and insights with the community around this topic! Depending on where this takes us, we plan on releasing part of a more complete toolset to the community, too.
About the author
Daan Raman is in charge of NVISO Labs, the research arm of NVISO. Together with the team, he drives initiatives around innovation to ensure we stay on top of our game; innovating the things we do, the technology we use and the way we work form an essential part of this. Daan doesn’t like to write about himself in third-person. You can contact him at firstname.lastname@example.org or find Daan online on Twitter and LinkedIn.