Detecting the sudden appearance of events with ee-outliers and Elasticsearch

Recently, we released a new outlier model for our open-source ee-outliers framework. It detects the sudden appearance of one or multiple field values in Elasticsearch events. For example, this model can spot newly contacted TLDs (in DNS or SSL logs) that may indicate communication with C2 domains. It can also detect an executable that suddenly runs from a previously unseen location or under a previously unseen name (ATT&CK T1218 or T1036).

A few other examples where detecting sudden appearance could also be used:

  • Sudden login of a new user on a computer.
  • Sudden change of operating system (display) language, keyboard settings and/or region/country.
  • Sudden outbound traffic from a process that has never had outbound traffic (a common example is dllhost.exe suddenly contacting a C2).
  • Sudden relocation of explorer.exe execution. This should catch the following adversary technique: https://twitter.com/CyberRaiju/status/1273597319322058752

In the following sections, we give a simple demonstration of how you can use ee-outliers with the new sudden appearance model. If this is your first time using ee-outliers, we strongly suggest reading through the documentation on GitHub.

Let’s get started!

Preparing the data

For this demo, we constructed a use case that identifies an executable suddenly running under another name. This type of behavior can be detected by observing Sysmon log events. Our data is continuously collected from more than 50 workstations, which together produce more than 500,000 Sysmon events with ID 1 per week.

To trigger that use case on one of our workstations running Windows 10, we will take the process powershell.exe located in C:\Windows\System32\WindowsPowerShell\v1.0\, rename it catchme.exe and run it. We can also run catchme.exe from another location.
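
The preparation step can be scripted. The following is a minimal sketch, assuming that a copy of the binary is made instead of renaming the system file in place, and that the Desktop folder is used as destination (both are illustrative choices, not taken from our test setup). The helper name make_renamed_copy is hypothetical.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Source path as given in the article; the destination folder below is a hypothetical choice.
POWERSHELL = Path(r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe")


def make_renamed_copy(source: Path, dest_dir: Path, new_name: str = "catchme.exe") -> Path:
    """Copy `source` into `dest_dir` under `new_name` and return the new path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / new_name
    shutil.copy2(source, target)  # copies file content and metadata
    return target


if __name__ == "__main__" and sys.platform == "win32":
    renamed = make_renamed_copy(POWERSHELL, Path.home() / "Desktop")
    # Start the copy once so that Sysmon records a process creation event (ID 1).
    subprocess.run([str(renamed), "-NoProfile", "-Command", "exit"], check=False)
```

Because the process description is embedded in the executable itself, the copy keeps the description of powershell.exe while its process name becomes catchme.exe.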

Building the detection use case

For the use case specified in the previous section, we define a sudden_appearance model in the following ee-outliers configuration file:

##############################
# SUDDEN APPEARANCE - RENAMED PROCESS
##############################
[sudden_appearance_winlog_renamed_process]
es_query_filter=_exists_:winlog.event_id AND winlog.event_id:1

aggregator=winlog.event_data.Description.keyword
target=process.name

history_window_days=7
history_window_hours=0

# Size of the sliding window defined in DDD:HH:MM
# Therefore, 20:13:20 will correspond to 20 days 13 hours and 20 minutes
sliding_window_size=03:00:00

sliding_window_step_size=00:01:00

outlier_type=first observation
outlier_reason=sudden appearance of renamed process
outlier_summary=sudden appearance of process renamed {process.executable} with description {winlog.event_data.Description}

run_model=1
test_model=0

The sudden_appearance model looks for outliers by searching for the sudden appearance of a new field value. To summarize the use case defined above, it will scan through 7 days of Sysmon events with ID 1, create buckets by process description and finally look in each bucket for the sudden appearance of a new process name that has not been seen for at least 3 days minus 1 hour. In more detail, the use case can be explained as follows:

  • The es_query_filter selects all the Elasticsearch events that contain the process information we want. In this case, it will just select events that have an event_id=1. It also excludes all the events that don’t have event_id defined.
  • The aggregator creates buckets of events per process description. The event field corresponding to the process description is winlog.event_data.Description.keyword. Note that when you rename a process, the process description in Sysmon events remains the same. We select that field as aggregator because we want to catch processes that keep their process description but have their process name suddenly changed.
  • For each bucket of events, it will detect events where the target field contains a value that was not seen before. In this situation, the scanned target field corresponds to the process name (process.name).
  • For the events classified as outliers, the target field value is guaranteed not to have been seen for a period at least as long as the sliding_window_size (3 days) minus the sliding_window_step_size (1 hour).
  • The parameters defined in history_window_days (7 days) and history_window_hours (0 hours) determine the time window in the past in which we want to scan for events.
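
To make the window arithmetic concrete, here is a small sketch that converts the DDD:HH:MM values from the configuration into durations and derives the minimum period during which a flagged value was not seen. The function parse_window is a hypothetical helper written for this illustration; it is not part of ee-outliers.

```python
from datetime import timedelta


def parse_window(value: str) -> timedelta:
    """Convert a DDD:HH:MM string such as "03:00:00" into a timedelta."""
    days, hours, minutes = (int(part) for part in value.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes)


window = parse_window("03:00:00")      # sliding_window_size
step = parse_window("00:01:00")        # sliding_window_step_size
history = timedelta(days=7, hours=0)   # history_window_days / history_window_hours

# Minimum period during which a flagged value was not seen in its bucket.
min_unseen = window - step
print(min_unseen)  # 2 days, 23:00:00
```

With these settings, a process name is only reported if it was absent from its bucket for at least 2 days and 23 hours before it showed up.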

Bonus: How does the model algorithm detect the sudden appearance of events?

For everyone who is interested, this section explains in more depth how the sudden_appearance model algorithm works. If you just want to try it out or see the results of the experiment, feel free to skip to the next section.

Let’s define:

  • The global window, determined by the parameters history_window_days and history_window_hours.
  • The sliding window, where the size is determined by the parameter sliding_window_size. It has to be smaller than the global window.
  • The sliding step, where the size is determined by the parameter sliding_window_step_size. It represents the jump step in time that will be used to slide the sliding window within the global window.

The sudden_appearance model works as follows:

  1. The sliding window is first placed at the beginning of the global window.
  2. The sliding window is analyzed for the sudden appearance of one or more field values. More specifically, the model takes the first occurrence of each distinct value of the field defined by the target parameter. If multiple fields are defined in the target parameter, it takes the first occurrence of each unique combination of values across those fields. Note that this operation is done independently within each aggregation group defined by the aggregator parameter. If the first occurrence of a value falls after the end of the sliding window minus the sliding step, the corresponding event is considered an outlier.
  3. Afterward, the sliding window slides further in the global window, with a time distance defined by the sliding step.
  4. The operations defined in steps 2. and 3. are repeated until the sliding window has gone through the entire global window.
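
The four steps above can be sketched in a few lines of Python on in-memory toy events. This is a naive illustration under stated assumptions, not the actual ee-outliers implementation: events are plain dictionaries with hypothetical keys timestamp, aggregator and target, windows are treated as half-open intervals, and every window rescans all events.

```python
from datetime import datetime, timedelta


def sudden_appearances(events, start, end, window, step):
    """Flag events whose target value first appears, within its aggregator
    bucket, during the last `step` of a sliding window of length `window`."""
    events = sorted(events, key=lambda e: e["timestamp"])
    outliers = []
    window_start = start                          # step 1: begin of the global window
    while window_start + window <= end:           # step 4: stay inside the global window
        window_end = window_start + window
        first_seen = {}                           # step 2: first occurrence per bucket/value
        for event in events:
            if window_start <= event["timestamp"] < window_end:
                key = (event["aggregator"], event["target"])
                first_seen.setdefault(key, event)  # sorted input: keeps the earliest event
        for event in first_seen.values():
            if event["timestamp"] >= window_end - step:
                outliers.append(event)
        window_start += step                      # step 3: slide the window
    return outliers


start = datetime(2020, 6, 1)
end = start + timedelta(days=7)
# powershell.exe runs once a day; catchme.exe shows up once on the last day.
events = [
    {"timestamp": start + timedelta(hours=1 + 24 * day),
     "aggregator": "Windows PowerShell", "target": "powershell.exe"}
    for day in range(7)
]
events.append({"timestamp": start + timedelta(days=6, hours=10, minutes=30),
               "aggregator": "Windows PowerShell", "target": "catchme.exe"})

found = sudden_appearances(events, start, end, timedelta(days=3), timedelta(hours=1))
print([event["target"] for event in found])  # ['catchme.exe']
```

Because the window advances by exactly one step, the final step-sized slices of consecutive windows do not overlap, so each event can be flagged at most once. Events in the first 3 days minus 1 hour of the global window can never be flagged, since there is not enough history before them.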

Running ee-outliers

Next, we ran the created use case using ee-outliers over the last 7 days of events in our test environment, featuring around 50 workstations and more than half a million process execution events (the result of the es_query_filter argument we defined in our use case).

The analysis summary shows that it caught 139 outliers out of 530,131 events in less than 1 minute. This means that roughly 0.03% of the analyzed events are classified as outliers, without even using whitelisting! Such a low number of outliers is manageable to investigate manually.
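
The proportion can be checked directly from the two counts reported in the analysis summary. The variable names below are only illustrative.

```python
outliers = 139
events = 530_131

ratio = outliers / events
print(f"{ratio:.4%}")  # 0.0262%
print(f"about 1 outlier per {round(events / outliers):,} events")  # about 1 outlier per 3,814 events
```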

Analyzing the results

One option to investigate outliers by hand is to set the log_level parameter of the global configuration file to DEBUG. You will be able to observe the outliers directly from the command prompt and see something similar to the following screenshot:

Scrolling down through the outlier descriptions, we find our powershell.exe renamed to catchme.exe! Furthermore, for most of the other outliers, the aggregation contains only a few events (compared to the 2209 events of the catchme.exe outlier), and that number equals the number of events that suddenly appear. These false positives can be interpreted as the sudden appearance of a new process rather than of a renamed process. In other words, if the number of events carrying a given process description equals the number of events carrying the process name that suddenly appears, the process name stayed the same throughout the analysis and the process was never renamed.

Conclusion

In this blog post, we demonstrated how the sudden appearance model can be used to detect the sudden appearance of a renamed process. Beyond this example, blue teams can use the technique in Threat Hunting or Security Monitoring activities to spot a broad range of anomalous activities!

Although the sudden appearance model is still in its very first version, we can already think of simple improvements that could make it even more powerful. For example, the false positives observed in the results could be avoided by classifying an event as an outlier only if the number of events contained in its aggregation is not equal to the number of events that suddenly appear. Another improvement could be to make the model able to detect a sudden burst of events. This could allow analysts to detect, for example, mass modifications of file extensions (ransomware file modification behavior).
These improvements will be addressed in a future release of ee-outliers and certainly discussed in another blog post! So, stay tuned!
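
As an illustration of the first improvement, the sketch below post-filters outliers on toy data. It reflects our reading of the rule and is not an existing ee-outliers feature: the function drop_brand_new_buckets and the dictionary keys aggregator and target are hypothetical, and "the number of events that suddenly appear" is taken to be the number of events carrying the new target value within the bucket.

```python
from collections import Counter


def drop_brand_new_buckets(events, outliers):
    """Keep only outliers whose aggregator bucket also holds other target values."""
    bucket_sizes = Counter(e["aggregator"] for e in events)
    pair_sizes = Counter((e["aggregator"], e["target"]) for e in events)
    return [
        o for o in outliers
        if bucket_sizes[o["aggregator"]] != pair_sizes[(o["aggregator"], o["target"])]
    ]


events = (
    [{"aggregator": "Windows PowerShell", "target": "powershell.exe"}] * 5
    + [{"aggregator": "Windows PowerShell", "target": "catchme.exe"}]
    + [{"aggregator": "Some New Tool", "target": "newtool.exe"}] * 2
)
outliers = [
    {"aggregator": "Windows PowerShell", "target": "catchme.exe"},
    {"aggregator": "Some New Tool", "target": "newtool.exe"},
]

kept = drop_brand_new_buckets(events, outliers)
print([o["target"] for o in kept])  # ['catchme.exe']
```

The newtool.exe outlier is dropped because every event in its bucket carries the new name, which points to a new process rather than a renamed one.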

We are eager to see the community (maybe you) using it and giving feedback! I hope you enjoyed the article and found it useful. Thanks for reading and don’t hesitate to leave questions or suggestions – here or in the issue tracker on GitHub.

Additional content

If you want to know more about how to use ee-outliers with other example use cases, I suggest reading “TLS beaconing detection using ee-outliers and Elasticsearch”, “Detecting suspicious child processes using ee-outliers and Elasticsearch” or “Using Word2Vec to spot anomalies while Threat Hunting using ee-outliers”.

About the author

Maximilien Roberti is a Machine Learning engineer working full-time in the NVISO Labs team. He focuses on building new and exciting models to help the blue teams spot adversaries as part of NVISO’s Threat Hunting and Security Monitoring activities. You can get in touch with Maximilien on LinkedIn or Twitter.
