CyLab team uses AI to find and disrupt malware distribution networks

Daniel Tkacik

Aug 29, 2019

What do botnets, spam campaigns, and distributed denial of service attacks all have in common? Yes, they’re all used maliciously to victimize people’s computers, but they’re also all made possible by something called a malware distribution network (MDN), a set of unassuming web domains used to spread malware.

Detecting MDNs is tricky, because they’re highly dynamic and complex, requiring cumbersome inspection of network traffic logs. But according to a new study authored by researchers in Carnegie Mellon University’s CyLab, network analysts have a new ally: artificial intelligence (AI).

“Artificial intelligence can learn from analysts and help them organize the data in a visual and digestible manner,” says Yang Cai, a senior systems scientist in CyLab.


Cai recently presented the team's study, “Perceiving Behavior of Cyber Malware with Human-Machine Teaming,” at the Applied Human Factors and Ergonomics conference in Washington, D.C., on July 27, 2019.

In the study, the researchers outline a unique system for discovering MDNs that relies on cooperation between human analysts and machines. While such human-machine models have been developed before, Cai says theirs is a bit different.

“Compared to previous human-machine models, ours requires less training data for the machines,” says Cai.


Cai’s team trained their system on data about a known MDN, obtained from Google Safe Browsing, a blacklist service that provides lists of URLs hosting malware and other unsafe web resources, such as phishing sites.

To train the system, the AI first crawled through the network traffic data to build a malware distribution graph. Human analysts and machine vision algorithms then iterated back and forth to identify patterns, clusters, and abnormalities in the graph. Because the algorithms learned from the humans, each round of inspection improved on the last, until the humans and machines agreed with each other.
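The article does not describe how the graph is built, so what follows is only a rough sketch of the first step under stated assumptions: traffic logs are assumed to reduce to (source, destination) domain pairs, such as observed redirects, and "clusters" are approximated as connected components of the resulting graph. The record format, the example domains, and the `build_graph` and `clusters` helpers are hypothetical; the human-in-the-loop machine vision step is not modeled at all.

```python
from collections import defaultdict


def build_graph(records):
    """Build an undirected adjacency map from hypothetical (src, dst) domain pairs."""
    graph = defaultdict(set)
    for src, dst in records:
        graph[src].add(dst)
        graph[dst].add(src)
    return graph


def clusters(graph):
    """Return connected components, each as a sorted list of domains."""
    seen = set()
    result = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk collecting every domain reachable from `start`.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        result.append(sorted(component))
    return result


# Hypothetical redirect records; real logs would first need parsing.
records = [
    ("landing-a.example", "redirect-1.example"),
    ("redirect-1.example", "payload-x.example"),
    ("landing-b.example", "redirect-2.example"),
]
print(clusters(build_graph(records)))
```

On these made-up records the sketch yields two groups: one with `landing-a.example`, `payload-x.example`, and `redirect-1.example`, and one with `landing-b.example` and `redirect-2.example`. Connected components are a deliberately simple stand-in; a real system would likely weight edges and use richer clustering.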

These algorithms, having learned from humans, could then be used to quickly discover unknown MDNs and related features. For example, on the same dataset that the algorithms were trained on, they were able to discover previously unknown MDN subnetworks and other features, such as “bridges” that link different MDNs to one another.
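The article does not say how "bridges" were identified. One standard graph-theoretic reading, assumed here purely for illustration, is a bridge edge: an edge whose removal would split the graph, such as a single link joining two otherwise separate MDN clusters. The sketch below applies Tarjan's lowlink technique to a simple undirected graph (no parallel edges); the `find_bridges` helper and the example domains are hypothetical.

```python
from collections import defaultdict


def find_bridges(edges):
    """Return edges whose removal disconnects part of the graph (Tarjan's lowlink method).

    Assumes a simple undirected graph with no parallel edges.
    """
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)

    disc = {}  # discovery time of each node in the DFS
    low = {}   # earliest discovery time reachable without reusing the parent edge
    bridges = []
    timer = [0]

    def dfs(node, parent):
        disc[node] = low[node] = timer[0]
        timer[0] += 1
        for nbr in graph[node]:
            if nbr not in disc:
                dfs(nbr, node)
                low[node] = min(low[node], low[nbr])
                # nbr's subtree cannot reach node or above except through this edge.
                if low[nbr] > disc[node]:
                    bridges.append(tuple(sorted((node, nbr))))
            elif nbr != parent:
                low[node] = min(low[node], disc[nbr])

    for node in list(graph):
        if node not in disc:
            dfs(node, None)
    return sorted(bridges)


# Two hypothetical, densely linked clusters joined by one connecting link.
edges = [
    ("a1.example", "a2.example"), ("a2.example", "a3.example"), ("a3.example", "a1.example"),
    ("b1.example", "b2.example"), ("b2.example", "b3.example"), ("b3.example", "b1.example"),
    ("a3.example", "b1.example"),
]
print(find_bridges(edges))
```

Here the only edge reported is the one linking `a3.example` to `b1.example`, because every edge inside each triangle lies on a cycle. The recursive DFS suits small illustrative graphs; very large traffic graphs would need an iterative version to avoid Python's recursion limit.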

“This system can help the analysts grasp a big picture of MDNs and their evolutionary dynamics,” says Cai. “With a big picture, they can develop strategies to disrupt or stop the spread of malware.”

Other authors on the study included Software Engineering Institute researchers Jose Morales and William Casey, Northrop Grumman human-machine teaming chief technologist Neta Ezer, and CyLab researcher Sihan Wang.