Zero to hero YARA rules
In this follow-up to my previous blog on threat hunting with Veeam & YARA, I want to go into detail on how to create, maintain & test YARA rules.
Check out my previous post here: Threat Hunting with Veeam : Leveraging Yara for Incident Response (mritsurgeon.co.za)
Introduction to YARA:
Understanding YARA: YARA is a versatile and indispensable tool in the field of malware analysis. It is a staple in most cybersecurity professionals' toolboxes. YARA rules are customizable patterns used for identifying specific malware, targeted attacks, and security threats tailored to your unique environment.
Antivirus vs YARA:
The YARA scanner and its rules function similarly to an antivirus scanner and its signatures, but with a key distinction: YARA is a tool designed for crafting your own rules to detect malware, whereas antivirus relies on predefined, vendor-supplied rulesets for identifying malicious software.
In the context of a zero-day virus or malware, which is entirely unknown before discovery, traditional antivirus signatures may not exist yet. This is particularly crucial when dealing with polymorphic viruses featuring encrypted payloads and mutation engines. The encryption conceals the harmful payload from standard scanners and threat detection software, which depend on recognizing the virus through its decryption process. Once the virus infiltrates a target system, its payload is decrypted, triggering the infection. The mutation engine further complicates detection by randomly generating new decryption routines, making the virus harder to identify as it spreads to new targets.
Now, how can we defend against malware or viruses that manage to deceive even the most robust antivirus products?
YARA provides part of the solution. An antivirus relies on defined signatures for recognized malware; in the case of a zero-day threat where those definitions fall short, we can create our own rules using YARA. This allows us to proactively establish detection rules for previously unknown malware, filling the gap left by traditional antivirus solutions.
Let's start by deconstructing a YARA rule:
· Rule Name
This is a user-defined name that provides a clear and concise identifier for the rule. It helps distinguish one rule from another. For example:
---------------------------------------------------------------
rule XYZMalwareRule {
---------------------------------------------------------------
· Metadata
Metadata in YARA rules contains additional information about the rule. It typically includes details like the author, description, and any other relevant information. It provides context for the rule and is declared in the meta section. For example:
---------------------------------------------------------------
meta:
author = "Your Name"
description = "Detects a specific malware variant XYZ"
---------------------------------------------------------------
· Strings
Strings in YARA rules are the patterns or sequences of characters that the rule searches for in the target files. These can be simple text strings or more complex patterns using wildcards or regular expressions. To find candidate strings you can leverage tools like PE Studio, PE Viewer & the hex editor HDX. These are valuable for analyzing Portable Executable (PE) files, such as Windows executables (.exe) and dynamic link libraries (.dll), and they help security researchers, analysts, and malware experts inspect and understand the internal structure of PE files.
o Text Strings:
---------------------------------------------------------------
rule ExampleRule {
strings:
$text_string = "malware123"
condition:
$text_string
}
---------------------------------------------------------------
PE Studio view of a text string
Here, analyzing an EXE with PE Studio, we find a Unicode text string with the value "Hit any key to exit...".
So a rule to find this text string would look like:
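---------------------------------------------------------------
rule UnicodeTextStringRule { // illustrative rule name
strings:
$exit_string = "Hit any key to exit..." wide
condition:
$exit_string
}
---------------------------------------------------------------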
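The wide modifier tells YARA to look for the string encoded with two bytes per character, which is how Unicode (UTF-16) text is typically stored in Windows executables.
Here is a match on the Unicode text rule for the EXE we examined.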
o Hex Strings
Hex strings allow you to specify byte sequences using hexadecimal notation. This is useful for identifying binary patterns. For example:
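---------------------------------------------------------------
rule ExampleHexRule { // illustrative rule name
strings:
$hex_string = { 4D 5A 90 00 }
condition:
$hex_string
}
---------------------------------------------------------------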
Using the HDX hex editor to find the same Unicode text string as a hex string:
Here I've identified the hex bytes of the previous example value "Hit any key to exit...".
Let's create a YARA rule specific to this hex sequence:
---------------------------------------------------------------
rule HexUnicodeStringRule {
strings:
$hex_unicode_string = { 48 00 69 00 74 00 20 00 61 00 6E 00 79 00 20 00 6B 00 65 00 79 00 20 00 74 00 6F 00 20 00 65 00 78 00 69 00 74 00 2E 00 }
condition:
$hex_unicode_string
}
---------------------------------------------------------------
Here is a match on the hex string rule for the EXE we examined.
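The hex bytes used in the rule above can also be derived without a hex editor. The following is a minimal Python sketch, assuming the string is stored as UTF-16LE (for plain ASCII text this is the same byte layout the wide modifier looks for). The helper name to_yara_hex is hypothetical and exists only for this illustration.

```python
def to_yara_hex(text):
    """Return a YARA hex-string body for `text` encoded as UTF-16LE."""
    raw = text.encode("utf-16-le")  # each ASCII character becomes <byte> 00
    return "{ " + " ".join(f"{b:02X}" for b in raw) + " }"

# The hex rule above covers the text up to and including the first period.
print(to_yara_hex("Hit any key to exit."))
```

Note that the hex sequence in the rule ends after the first period, so it matches a prefix of "Hit any key to exit..." rather than the full string, which is still enough to match the file.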
· Condition
The condition is the logical expression that must be true for the rule to trigger. It combines the strings defined in the rule (metadata is informational only and cannot be referenced here) to determine if the rule matches a given file. You can use many different operators, such as and, or, and not.
Let's put this all together in one example rule:
---------------------------------------------------------------
rule HexUnicode_TextStringRule {
meta:
author = "Ian Engelbrecht"
description = "Detects a specific Unicode text string & the hex value for that string"
strings:
$hex_unicode_string = { 48 00 69 00 74 00 20 00 61 00 6E 00 79 00 20 00 6B 00 65 00 79 00 20 00 74 00 6F 00 20 00 65 00 78 00 69 00 74 00 2E 00 }
$exit_string = "Hit any key to exit..." wide
condition:
$hex_unicode_string or $exit_string
}
---------------------------------------------------------------
So let's explain the final rule:
meta section: Provides metadata about the rule, including the author and a brief description.
strings section: Defines two strings to be searched for in the analyzed files:
$hex_unicode_string: A hexadecimal sequence representing the Unicode string.
$exit_string: The text string "Hit any key to exit..." with the wide modifier.
condition section: Specifies the condition for the rule to trigger:
$hex_unicode_string or $exit_string: The rule triggers if either the hexadecimal sequence or the wide text string "Hit any key to exit..." is found in the analyzed file.
Here is a match on the combined rule, matching both strings:
What About Data Classification with YARA:
YARA isn't just about hunting threats; it's also a versatile tool for data classification. With it you can pinpoint and categorize sensitive data, which helps ensure your information remains secure.
I'm going to create 2 rules based on the YARA rule structure we just went through above.
First I saved some fictitious credit card data into a document. I got the test card details here:
Test Credit Card Account Numbers (paypalobjects.com)
Here is a screenshot of my document:
For this we will use regex text strings to identify credit card number lengths & types.
Regex, short for regular expression, is a powerful tool for matching patterns in text. It's a sequence of characters that forms a search pattern.
There are many of these patterns already published that you can use to identify different types of card number sequences, so don't be overwhelmed by the strings.
---------------------------------------------------------------
rule TestCreditCardNumbers {
meta:
author = "Ian Engelbrecht"
description = "Detects test credit card account numbers"
strings:
$amex = /\b37\d{13}\b/
$mastercard = /\b5[1-5]\d{14}\b/
$visa = /\b(4\d{12}(\d{3}))\b/
$dinersclub = /\b(3(0[0-5]|[68][0-9])\d{11})\b/
$discover = /\b((6011\d{12}|65\d{14}))\b/
$jcb = /\b((35\d{14}|2131\d{11}|1800\d{11}))\b/
condition:
1 of them
}
---------------------------------------------------------------
Let's break down this rule:
Rule Name:
TestCreditCardNumbers: The name of the YARA rule.
Metadata Section:
meta: Contains metadata about the rule.
author = "Ian Engelbrecht": Specifies the author of the rule.
description = "Detects test credit card account numbers": Provides a brief description of the rule's purpose.
Strings Section:
Defines several regular expressions (regex) that represent different patterns for credit card numbers. Each pattern corresponds to a specific credit card type:
$amex: American Express
$mastercard: MasterCard
$visa: Visa
$dinersclub: Diners Club
$discover: Discover
$jcb: JCB
Condition Section:
condition: Specifies the conditions that must be met for the rule to trigger.
1 of them: The rule triggers if at least one of the credit card patterns is found in the analyzed file.
Here is a match based on this rule:
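Before running the rule through YARA, the patterns can be sanity-checked in isolation. The following is a rough Python sketch, assuming Python's re module behaves closely enough to YARA's regex engine for these simple patterns (the two engines are not identical). The function names and the sample text are hypothetical; the two numbers are commonly published test card numbers, not values taken from the document in the screenshot.

```python
import re

# Same patterns as the TestCreditCardNumbers rule, written for Python's re module.
CARD_PATTERNS = {
    "amex": r"\b37\d{13}\b",
    "mastercard": r"\b5[1-5]\d{14}\b",
    "visa": r"\b(4\d{12}(\d{3}))\b",
    "dinersclub": r"\b(3(0[0-5]|[68][0-9])\d{11})\b",
    "discover": r"\b((6011\d{12}|65\d{14}))\b",
    "jcb": r"\b((35\d{14}|2131\d{11}|1800\d{11}))\b",
}

def matching_card_types(text):
    """Return the names of all patterns found in `text`."""
    return [name for name, pattern in CARD_PATTERNS.items() if re.search(pattern, text)]

def card_rule_triggers(text):
    """Mimic the rule's `1 of them` condition: at least one pattern must match."""
    return len(matching_card_types(text)) >= 1

sample = "Visa: 4111111111111111, Amex: 378282246310005"  # illustrative test numbers
print(matching_card_types(sample))
```

A check like this only confirms the patterns behave as intended on known inputs; the real match still has to be verified with YARA itself.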
Let's do another. I have a user document with user information:
I generated this fake data with the following site (a scary site, by the way):
Generate a Random Name - Male, American, United States - Fake Name Generator
Here is a screenshot of my document:
OK, so let's write a YARA rule for this data. As far as PII (Personally Identifiable Information) is concerned, I only really want to find 3 things here: email, phone number & Social Security number.
We will use regex for all 3:
---------------------------------------------------------------
rule GenericUserDataPatterns {
meta:
author = "Ian Engelbrecht"
description = "Detects generic patterns for email, phone number, and SSN"
strings:
$email_pattern = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/
$phone_pattern = /\b\d{3}-\d{3}-\d{4}\b/
$ssn_pattern = /\b\d{3}-\d{2}-\d{4}\b/
condition:
all of ($email_pattern, $phone_pattern, $ssn_pattern)
}
---------------------------------------------------------------
I want my condition here to be all, meaning the document must contain a phone number + email + SSN for it to be flagged.
Let's look at the rule:
Rule Name:
GenericUserDataPatterns: The name of the YARA rule.
Metadata Section:
meta: Contains metadata about the rule.
author = "Ian Engelbrecht": Specifies the author of the rule.
description = "Detects generic patterns for email, phone number, and SSN": Provides a brief description of the rule's purpose.
Strings Section:
Defines three regular expressions:
$email_pattern: Matches a generic pattern for email addresses.
$phone_pattern: Matches a generic pattern for phone numbers in the format ###-###-####.
$ssn_pattern: Matches a generic pattern for Social Security Numbers (SSN) in the format ###-##-####.
Condition Section:
condition: Specifies the conditions that must be met for the rule to trigger.
all of ($email_pattern, $phone_pattern, $ssn_pattern): The rule triggers only if all three patterns are found in the analyzed file.
Here is a match on all strings under the same rule:
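The difference between "all of" and "1 of" is easy to see with a small test. This is a minimal Python sketch that approximates the rule's logic with Python's re module (an assumption, since it is not YARA's regex engine). The function name and the sample values are made up for illustration.

```python
import re

# Same patterns as the GenericUserDataPatterns rule.
PII_PATTERNS = {
    "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
    "phone": r"\b\d{3}-\d{3}-\d{4}\b",
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
}

def pii_rule_triggers(text):
    """Mimic `all of (...)`: every pattern must be found at least once."""
    return all(re.search(pattern, text) for pattern in PII_PATTERNS.values())

full = "john.doe@example.com, 555-123-4567, 123-45-6789"  # made-up values
partial = "john.doe@example.com, 555-123-4567"            # no SSN present
print(pii_rule_triggers(full), pii_rule_triggers(partial))
```

With "all of", a document holding only an email address and a phone number is not flagged, which keeps the rule focused on files that carry a full set of personal details.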
How do we handle multiple rules?
As you begin writing rules, you might end up with a lot of different rules that do various things, and running each rule individually can be time consuming. Here you have some options.
Firstly, you can combine rules into one YARA file, i.e.:
---------------------------------------------------------------
rule GenericUserDataPatterns {
meta:
author = "Ian Engelbrecht"
description = "Detects generic patterns for email, phone number, and SSN"
strings:
$email_pattern = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/
$phone_pattern = /\b\d{3}-\d{3}-\d{4}\b/
$ssn_pattern = /\b\d{3}-\d{2}-\d{4}\b/
condition:
all of ($email_pattern, $phone_pattern, $ssn_pattern)
}
rule TestCreditCardNumbers {
meta:
author = "Ian Engelbrecht"
description = "Detects test credit card account numbers"
strings:
$amex = /\b37\d{13}\b/
$mastercard = /\b5[1-5]\d{14}\b/
$visa = /\b(4\d{12}(\d{3}))\b/
$dinersclub = /\b(3(0[0-5]|[68][0-9])\d{11})\b/
$discover = /\b((6011\d{12}|65\d{14}))\b/
$jcb = /\b((35\d{14}|2131\d{11}|1800\d{11}))\b/
condition:
1 of them
}
---------------------------------------------------------------
This single YARA rule file includes both the GenericUserDataPatterns rule for detecting generic patterns in user data and the TestCreditCardNumbers rule for detecting specific test credit card numbers. You can use this file for analyzing files and identifying patterns related to both scenarios under the same scan.
See the match here:
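Running one combined file against a folder can also be scripted. The following is a hedged Python sketch that builds a YARA command line using the documented -r (recurse into directories) and -s (print matching strings) options. The binary name "yara", the file name combined.yar and the target folder are assumptions, so adjust them to your own setup (on Windows the binary is often named differently).

```python
import shutil
import subprocess

def build_scan_command(rules_file, target, recursive=True, show_strings=False, binary="yara"):
    """Build the argument list for a YARA command-line scan."""
    cmd = [binary]
    if recursive:
        cmd.append("-r")  # recurse into sub-directories of the target
    if show_strings:
        cmd.append("-s")  # also print the strings that matched
    cmd += [rules_file, target]
    return cmd

def run_scan(rules_file, target, **options):
    """Run the scan if the binary is on PATH; return stdout, or None if it is unavailable."""
    cmd = build_scan_command(rules_file, target, **options)
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout

# Hypothetical file and folder names, shown only to illustrate the command shape.
print(build_scan_command("combined.yar", "C:\\Scan"))
```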
The Second Option
The problem with the first approach is that your YARA file can become somewhat lengthy.
Here we use include, almost like nesting both rules into a master rule, and then require the conditions of both rules to be met together:
---------------------------------------------------------------
include "C:\Yara\PVTGenericUserDataPatterns.yar"
include "C:\Yara\PVTTestCreditCardNumbers.yar"
rule CombinedRules {
meta:
description = "Master Rule Combining GenericUserDataPatterns and TestCreditCardNumbers"
condition:
GenericUserDataPatterns and TestCreditCardNumbers
}
---------------------------------------------------------------
To accomplish this, both the GenericUserDataPatterns rule and the TestCreditCardNumbers rule must be defined as private rules. If they are not private, each one is still reported on its own whenever it matches, in addition to the master rule. Marking them private means the only match reported is the combined condition, so a file must satisfy both GenericUserDataPatterns & TestCreditCardNumbers to be flagged.
Example of what I mean by private definition:
---------------------------------------------------------------
private rule GenericUserDataPatterns {
meta:
author = "Ian Engelbrecht"
description = "Detects generic patterns for email, phone number, and SSN"
strings:
$email_pattern = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/
$phone_pattern = /\b\d{3}-\d{3}-\d{4}\b/
$ssn_pattern = /\b\d{3}-\d{2}-\d{4}\b/
condition:
all of ($email_pattern, $phone_pattern, $ssn_pattern)
}
---------------------------------------------------------------
From the YARA documentation here:
Writing YARA
rules — yara 4.4.0 documentation
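All strings in YARA can be marked as private which means they will never be included in the output of YARA. They are treated as normal strings everywhere else, so you can still use them as you wish in the condition
The same idea applies to rules: a rule declared private is not reported when it matches, but other rules can still reference it in their conditions.
So we hide their output and create a top-level rule that will appear in the output, which is: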
---------------------------------------------------------------
rule CombinedRules {
meta:
description = "Master Rule Combining GenericUserDataPatterns and TestCreditCardNumbers"
condition:
GenericUserDataPatterns and TestCreditCardNumbers
}
---------------------------------------------------------------
Both rules, GenericUserDataPatterns & TestCreditCardNumbers, are then referenced by name in the condition of the CombinedRules rule.
Here is the match output:
This RTF file has both a type of credit card number and PII (email, phone number etc.)
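As the number of rule files grows, the master rule can be generated rather than typed by hand. This is a small Python sketch under stated assumptions: the helper build_master_rule is hypothetical, it only produces the text of a master rule, and the referenced rules are assumed to be declared private in the included files as described above.

```python
def build_master_rule(name, includes, rule_names, description=""):
    """Return YARA source for a master rule that requires all `rule_names` to match."""
    lines = [f'include "{path}"' for path in includes]
    lines.append("")
    lines.append(f"rule {name} {{")
    if description:
        lines += ["  meta:", f'    description = "{description}"']
    # Every referenced rule must match, so the names are joined with `and`.
    lines += ["  condition:", "    " + " and ".join(rule_names), "}"]
    return "\n".join(lines)

print(build_master_rule(
    "CombinedRules",
    [r"C:\Yara\PVTGenericUserDataPatterns.yar", r"C:\Yara\PVTTestCreditCardNumbers.yar"],
    ["GenericUserDataPatterns", "TestCreditCardNumbers"],
    description="Master Rule Combining GenericUserDataPatterns and TestCreditCardNumbers",
))
```

The output can be saved as a .yar file and scanned with like any other rule file; it does not validate that the included files exist or compile.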
Automated Testing with GitHub and YARA-CI:
Now, let's talk automation. GitHub workflows and YARA-CI bring efficiency to rule testing and help ensure your rules are battle-ready when you need them.
What do I mean by battle-ready? Let's make sure we are not getting false positives and that our rule structures have no errors:
The rules I created above are in a folder on my PC, and I'm going to push the .YAR files into my GitHub repository, where I already have YARA-CI installed.
Installation | YARA-CI (virustotal.com)
Let's push the YAR rules using git on my local machine into my repository.
In GitHub, I can see all my YARA rules have been pushed:
Notice the error icon; this indicates an automation task that failed.
For the most part, the scan has checked my rules and indicated that the regex I'm using to find PII & credit card data could slow down the scan.
See the screenshot here.
On further checks, we can see my rules were run by VirusTotal's YARA-CI against the National Software Reference Library (NSRL) data set, and we can then see if our rule needs refinement due to false positives or false negatives.
We can see some false positives detected via VirusTotal YARA-CI:
This is expected, as I was only trying to match against single example strings that could exist in other executables; they were in no way unique.
When we follow the file signature we can see it's a DLL that I matched with the hex & Unicode string. Remember?
---------------------------------------------------------------
$exit_string = "Hit any key to exit..." wide
---------------------------------------------------------------
VirusTotal also deems this file clean according to other security products. Again, this is expected: we did this purely to demonstrate using PE & hex tools to find strings that you can then identify with YARA.
So this tool helps where theory meets reality. It is hands-on with the technicalities of testing and configuring YARA rules against real-world data sets, and validating against VirusTotal's data helps confirm your rules hold up outside your own test files.
The National Software Reference Library (NSRL) helps because it consists of known-good software files, so if our rule matches against it, it generally means we have a false positive. This is a great way to retro hunt with your rule before you actually hunt.
So let's recap:
We explored the art of crafting YARA rules for cyber resilience. YARA is a powerful tool in the arsenal of cybersecurity professionals, offering customizable patterns to identify malware, targeted attacks, and security threats tailored to specific environments. We compared YARA to traditional antivirus tools, emphasizing its advantage in detecting zero-day threats where traditional signatures may fall short.
Deconstructing a YARA rule involves defining key elements:
Rule Name: A clear identifier for the rule.
Metadata: Additional information about the rule, such as author and description.
Strings: Patterns or sequences of characters to search for in target files.
Text Strings: Simple character sequences.
Hex Strings: Byte sequences in hexadecimal notation.
The condition, a logical expression, determines when the rule triggers. We demonstrated rule creation using examples, showcasing the use of text and hex strings.
Beyond threat hunting, YARA proves versatile for data classification. We created rules to identify credit card numbers and generic user data patterns using regular expressions.
Handling multiple rules efficiently was discussed. Combining rules into a single YARA file or using "include" statements with a master rule provided strategies for managing multiple rule sets.
We finished with automated testing using GitHub workflows and YARA-CI. We pushed YARA rules to a GitHub repository, and automated testing flagged potential issues, allowing for refinement and validation against real-world datasets like VirusTotal and the National Software Reference Library.
Conclusion:
---------------------------------------------------------------
Thank you for reading if you got this far, please leave a comment or share.
---------------------------------------------------------------