How to detect algorithmically generated malicious domains
24 October 2018
Enoch Agyepong, Senior Cyber Security Engineer, Cyber Operations Team, Airbus CyberSecurity, shows us how an approach that examines frequency and character distribution within a domain name can help identify a malicious domain
Today’s cyber-criminals often deploy despicable tactics to gain a foothold within a target. The vast majority depend on Dynamic Domain Name Services (DDNS) to maintain a Command & Control (C&C) presence on the victim’s network. This enables the attacker to initiate file transfers or updates, which are often malicious. One of the most difficult and elusive techniques to trace is the use of Domain Generation Algorithms (DGA).
However, there are methods to spot this notorious DNS-based attack. To understand them, one must first understand what constitutes domain abuse.
Controlling the host
Domains registered for malicious activity such as phishing, malware and botnets all fall into the category of domain abuse, and all are globally viewed as illegal. This, of course, does not deter cyber-criminals, who will continue to exploit the flexibility of domain names for their own monetary gain.
A known tactic is to move the IP address of the malware’s C&C servers using a variety of techniques, which enables them to go undetected. To maintain control of the compromised host, cyber-criminals will plant a backdoor and implement a C&C channel to keep in constant communication with the device. This means the device has become a bot controlled by the botmaster, also known as the cyber attacker.
A traditional method for cyber-criminals to establish C&C communication is a centralised topology, in which the IP address or domain name of the C&C server is hardcoded into the malware to allow continual access between the victim’s device and the attacker.
Although this method can be effective, it is not foolproof. Firstly, a security researcher can reverse engineer the malware, identify the hardcoded IP address and blacklist the domain. In addition, if an administrator disables the C&C server or detects the compromised IP address, the “bot” will no longer be under the attacker’s control.
Determined not to be outdone, cyber attackers have endeavoured to employ more resilient tactics to avoid detection. Decentralised topologies were then developed, using Peer-to-Peer (P2P) botnets to overcome the limitations of C&C networks.
Compromised devices would no longer need to connect to a central command server, making it harder to detect the host IP address. The difficulty with a decentralised topology is that it is harder to create, implement and maintain.
Despite this hurdle, the robustness of P2P infrastructures was still appealing, and cyber-criminals wanted to combine it with the simplicity of C&C. This led to new strategies, the most notable being the implementation of DGA. DGA involves numerous domain names being sourced under a variety of jurisdictions and service providers. The domain names are then changed frequently.
The theory is that this reduces the likelihood of detection by security personnel, who would otherwise quickly implement an appropriate defence. If one or more domains are taken down, the compromised device will be allocated a new one, which will still be controlled by the botmaster via the C&C server. It is thought the first recorded use of DGA for malicious purposes was in 2006 with the Sality malware, but it gained notoriety in 2008 with the Kraken, Conficker and Srizbi attacks.
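To make the idea concrete, here is a minimal, deliberately toy sketch of how a date-seeded generator might produce the same candidate list on both the bot and the botmaster’s side. The article does not describe any specific algorithm, so everything here is an assumption for illustration: the function name candidate_domains, the shared seed string, the use of SHA-256 and the reserved ".example" suffix are all hypothetical and are not taken from any real malware family.

```python
import hashlib
from datetime import date


def candidate_domains(seed: str, day: date, count: int = 5,
                      tld: str = ".example") -> list[str]:
    """Derive a deterministic list of pseudo-random domain names for one day.

    Hypothetical illustration only: real families use their own schemes.
    """
    domains = []
    for i in range(count):
        # Both sides can rebuild this input from the shared seed and the date.
        material = f"{seed}:{day.isoformat()}:{i}".encode("ascii")
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits to the letters a-p to form the label.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in candidate_domains("shared-secret", date(2018, 10, 24)):
        print(name)
```

Because the list depends only on the seed and the date, taking down one of the generated names leaves the rest usable, and the next day’s list is entirely different. This is the property that makes static blacklists struggle, and it is also why the generated labels tend to look like random strings.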
Many have offered strategies to detect compromised domain names in DNS traffic. Through stringent analysis of DNS traffic logs, it was found that malicious hosts using DGA display characteristics that are very different from those of legitimate domain names. Researchers have found that algorithmically generated domains typically consist of fake and unpronounceable words, with some featuring numerical digits.
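The characteristics mentioned above (unpronounceable strings and embedded digits) can be turned into simple numeric features. The sketch below is one possible way to do that, not a method taken from the research the article refers to. The function name label_features and the chosen features are assumptions, and only the leftmost label of an ASCII domain name is examined.

```python
VOWELS = set("aeiou")


def label_features(domain: str) -> dict:
    """Compute rough 'pronounceability' features for the leftmost label."""
    label = domain.lower().split(".")[0]
    letters = [c for c in label if c.isalpha()]
    digits = sum(c.isdigit() for c in label)
    vowels = sum(c in VOWELS for c in letters)

    # Track the longest unbroken run of consonants; long runs are hard to say.
    longest = run = 0
    for c in label:
        if c.isalpha() and c not in VOWELS:
            run += 1
            longest = max(longest, run)
        else:
            run = 0

    return {
        "length": len(label),
        "digit_ratio": digits / len(label) if label else 0.0,
        "vowel_ratio": vowels / len(letters) if letters else 0.0,
        "longest_consonant_run": longest,
    }


if __name__ == "__main__":
    for name in ("google.com", "xkq7zt9bw.example"):
        print(name, label_features(name))
```

For "google.com" this gives a vowel ratio of 0.5 and no digits, while the made-up "xkq7zt9bw.example" has no vowels, two digits in nine characters and a consonant run of three. Any cut-off values would need to be tuned against real traffic, since plenty of legitimate names (abbreviations, product codes) also score oddly.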
Threat intelligence and machine learning feature in most DGA detection techniques and can help predict outcomes, patterns and events that alert security teams as they search through traffic logs. Alternatively, security teams can apply frequency analysis, a technique long used to break ciphers and secret texts, to dissect the character distribution of domain names.
The success of this detection method is enhanced by the fact that domain names were originally designed to use only the American Standard Code for Information Interchange (ASCII). Under ASCII, every character, including each English letter, is assigned a number from 0 to 127. As a result, the characters within an algorithmically generated domain name can be counted and compared once they have been translated into these codes. Because some letters are more common than others in the English language, the distribution of the characters can be analysed to determine which domains are genuine and which are DGA domains. This reinforces the notion that frequency analysis is a practical method for DGA detection.
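A minimal sketch of this kind of frequency analysis is shown below. It scores a domain label by the average log-probability of its characters under typical English letter frequencies, so labels built from rare letters or digits score lower. The function name english_likeness, the floor value for non-letters and the scoring formula are assumptions made for illustration. The frequency table uses commonly quoted approximate percentages, not figures from the article.

```python
import math

# Approximate relative frequency (percent) of letters in English text.
ENGLISH_FREQ = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
    "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "c": 2.8,
    "u": 2.8, "m": 2.4, "w": 2.4, "f": 2.2, "g": 2.0, "y": 2.0,
    "p": 1.9, "b": 1.5, "v": 0.98, "k": 0.77, "j": 0.15, "x": 0.15,
    "q": 0.095, "z": 0.074,
}
# Assumed floor (percent) for digits, hyphens and any other character.
FLOOR = 0.05


def english_likeness(domain: str) -> float:
    """Average log-probability per character of the leftmost label.

    Higher (closer to zero) means the letter mix looks more like English.
    """
    label = domain.lower().split(".")[0]
    if not label:
        return float("-inf")
    total = sum(math.log(ENGLISH_FREQ.get(c, FLOOR) / 100) for c in label)
    return total / len(label)


if __name__ == "__main__":
    for name in ("theater.example", "xqzjkvw.example", "a1b2c3.example"):
        print(f"{name}: {english_likeness(name):.2f}")
```

A label such as "theater" scores noticeably higher than "xqzjkvw", which is the separation the technique relies on. The score is noisy for very short labels and is biased towards English, so in practice it would be one signal among several, with any alert threshold chosen from an organisation’s own DNS logs.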
There is no doubt that DGA detection is difficult, which explains why the use of DGA among cyber-criminals, mainly malware developers, is so prevalent. Sifting through seemingly endless traffic logs is complex, especially if the current log collection and correlation tools are inadequate. Implementing a machine learning detection solution that incorporates frequency analysis of the characters within a domain name would help security teams and organisations determine the validity of a domain and protect the overall network infrastructure.