Cyber Security Challenge Belgium 2015 – Solving the Data Extraction challenge

This is the second blog post in the Cyber Security Challenge Belgium 2015 (CSCBE) solutions series. This time, we’re taking a look at the Data Extraction challenge.

Data Extraction

The challenge

The following challenge description was given to the students:

We messed up and contacted the wrong forensic department. They say they found data, but we can’t really make anything out of it. Can you?

The students were also given the following image:

The challenge was designed to test the students’ out-of-the-box thinking, as well as their ability to research an unfamiliar subject.

Analyzing the image

Steganography is the art of hiding information inside another file. It can take many different forms. Images are a popular container for other kinds of information, and the ways to hide data in them are practically endless. Steganography challenges are a staple of CTFs, so of course we had to create one too. They are usually among the most feared challenges, precisely because there are so many possible approaches.

There are two major categories for hiding information inside an image. Either you modify the internals of the image to add some extra data, or you visually encode your information and simply place it in the image where everyone can see it. Because this image clearly contains a pattern, the second approach was most likely used.

Four different kinds of shapes were used to create the pattern, and every shape has its own color. If you paid attention in your high school biology class, you may recognize the shapes. When you learned about the birds and the bees, you should also have learned about deoxyribonucleic acid, which is just a fancy way of saying DNA.

Research

DNA consists of two biopolymer strands coiled around each other, forming the well known double helix:

Image taken from classroom.synonym.com

Each strand is a chain of nucleotides, and the nucleotides on the two strands lock together in a fixed way. There are two base pairs: Adenine (A) pairs with Thymine (T), and Guanine (G) pairs with Cytosine (C).

Image taken from thinglink.com

The colors of the nucleotides may differ depending on which textbook you use, but the shapes are usually the same. That means we can convert our image into a string made up of the letters A, T, G and C using a little bit of Python:
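A minimal sketch of that conversion, assuming the shapes have already been identified and listed in reading order. The shape labels below are hypothetical placeholders, not the actual shapes or colors from the challenge image; which letter each shape stands for has to be read off a base-pairing diagram like the one above.

```python
# Hypothetical mapping from a shape label to its nucleotide letter.
# The labels are placeholders: substitute whatever names you give the
# four shapes found in the challenge image.
SHAPE_TO_BASE = {
    "shape_1": "A",
    "shape_2": "T",
    "shape_3": "G",
    "shape_4": "C",
}


def shapes_to_dna(shapes):
    """Convert a sequence of shape labels (in reading order) to a DNA string."""
    try:
        return "".join(SHAPE_TO_BASE[s] for s in shapes)
    except KeyError as err:
        raise ValueError(f"unknown shape label: {err.args[0]!r}") from None


if __name__ == "__main__":
    # Made-up input, not taken from the challenge image.
    print(shapes_to_dna(["shape_1", "shape_1", "shape_2", "shape_4"]))  # AATC
```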

This is already looking good, but we can’t really see anything that resembles a flag.

Digging a little deeper

If we do some more research, we find that when your cells build proteins, they read the nucleotides in groups of three (triplets, also called codons), and each triplet is translated into an amino acid according to the following diagram:

Image taken from commons.wikimedia.org
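For example, AAA is translated into Lysine (K) and UCA into Serine (S). The diagram uses RNA notation, in which Uracil (U) takes the place of Thymine (T), so UCA corresponds to TCA in our DNA string.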
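The diagram is essentially a lookup table from each of the 64 possible triplets to one of 20 amino acids or a stop signal. A minimal sketch of that table, assuming the standard genetic code; the names `CODON_TABLE` and `codon_to_amino` are hypothetical, and stop codons are written as `*`.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are listed with
# the first base varying slowest and the third base fastest, in TCAG order;
# "*" marks the three stop codons (TAA, TAG, TGA).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_to_amino(codon):
    """Return the one-letter amino acid code for a DNA or RNA triplet."""
    # RNA uses Uracil (U) where DNA uses Thymine (T).
    return CODON_TABLE[codon.upper().replace("U", "T")]


print(codon_to_amino("AAA"), codon_to_amino("UCA"))  # K S
```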
We could write a Python script that translates the whole nucleotide string into amino acids, but it’s even easier to let the internet do it for us. The Bioinformatics Resource Portal hosts a tool that does exactly this. If we enter our nucleotide string into the tool, we get the following:

This tool also automatically takes care of the last hurdle of the challenge, because it shows the translation for each possible starting position (reading frame): if you start combining from the first nucleotide, you only get gibberish. However, if you start combining from the second nucleotide, you can read “THE PASS IS METAPHYSIC LIGHTYEARS”, which is the solution to this challenge.
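The effect of the starting position is easy to see by splitting the same string into triplets at each of the three offsets. A small sketch with a made-up input and a hypothetical `reading_frames` helper; only the three forward frames are shown, and bases left over at the end that don’t form a full triplet are dropped.

```python
def reading_frames(seq):
    """Split a nucleotide string into codons for each of the three
    forward reading frames (starting at offset 0, 1 and 2)."""
    return {
        offset: [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        for offset in range(3)
    }


# Made-up input: shifting the start by one base changes every triplet.
for offset, codons in reading_frames("GAAATCA").items():
    print(offset, codons)
# 0 ['GAA', 'ATC']
# 1 ['AAA', 'TCA']
# 2 ['AAT']
```

Shifting the start by a single nucleotide regroups every triplet, which is why one frame reads as gibberish while another spells out the password.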

Statistics

At the end of the qualifiers, only five teams had solved this challenge. Many teams apparently recognized the image as a DNA sequence and successfully translated the nucleotides into amino acids, but only a handful got past the last hurdle of starting from the correct nucleotide.

Final thoughts

Not many teams managed to solve this challenge. Because steganography can be done in a million ways, it’s not always easy to see a path towards the solution. This can discourage teams and steer them toward more practical challenges with a better-defined scope. Choosing which challenges to invest your time in is an important decision in any CTF, and apparently most teams preferred not to invest theirs in this one.
