What problem does Malware Discoverer address?

URL redirection is a salient feature of phishing and malware sites. Abusers use redirection to control the information flow and to evade detection, so collecting the redirection trace and discovering the final URLs that host malicious artifacts is not an easy task. One challenge is discovering entry points, that is, the domains that initiate redirections. Another is counteracting cloaking techniques, including IP bans, JavaScript execution, and fast flux. Malware Discoverer is designed to handle these challenges.

Example of one discovered malware campaign. The entry-level domains (leftmost) use fake news as bait to lure users into clicking.

How does Malware Discoverer work?

Malware Discoverer is powered by an unsupervised discovery system that is able to trace coordinated redirection campaigns. The algorithm includes three components:

  1. A crawler to collect redirection paths from a seed of domains
  2. A clustering step to identify suspicious domains that share common redirection paths
  3. A search expander to discover more domains co-hosted with suspicious domains
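As a rough illustration of the second component, the sketch below groups entry domains whose redirection paths pass through at least one common hop. It is not the system's actual algorithm, which this page does not specify: the function name `cluster_by_shared_hops`, the `min_entries` threshold, the union-find grouping, and the `.example` domains are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_by_shared_hops(paths, min_entries=2):
    """Group entry domains that share at least one downstream hop.

    `paths` maps each entry domain to the ordered list of domains it
    redirected through. This is a hypothetical input format, not the
    system's real data model.
    """
    parent = {}

    def find(x):
        # Return the representative of x's set, registering x if unseen.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link every entry domain to each hop it reaches. Entry domains that
    # reach the same hop therefore end up in the same set.
    for entry, hops in paths.items():
        find(entry)
        for hop in hops:
            union(entry, hop)

    clusters = defaultdict(set)
    for entry in paths:
        clusters[find(entry)].add(entry)

    # Keep only groups with enough entry domains to look coordinated.
    groups = [sorted(m) for m in clusters.values() if len(m) >= min_entries]
    return sorted(groups)


# Made-up example data: two entry domains converge on the same landing page.
example_paths = {
    "news-a.example": ["hop1.example", "landing.example"],
    "news-b.example": ["hop2.example", "landing.example"],
    "blog-c.example": ["other.example"],
}
print(cluster_by_shared_hops(example_paths))
```

Requiring a single shared hop is a deliberately loose criterion. A stricter variant could require several shared hops, or a shared final landing domain, before merging two entry domains.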

Malware Discoverer is a fully automated system. After the initial data crawl, it calls a Python program to load the data, calculate summary statistics, and generate redirection network graphs (see the image above for an example). The system auto-generates a daily threat intelligence report, which is published on this website and sent to subscribers via email.
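The Python program itself is not published here, so the following is only a minimal sketch of what the summary-statistics step might look like. It assumes each crawl result is an ordered list of domains from the entry domain to the final landing domain. The function name `summarize_chains`, the chosen statistics, and the `.example` domains are hypothetical.

```python
from collections import Counter


def summarize_chains(chains):
    """Compute simple statistics over redirection chains.

    Each chain is an ordered list of domains, from the entry domain to
    the final landing domain (an assumed format, for illustration only).
    """
    chains = [c for c in chains if c]  # ignore empty traces

    # Count how often each redirect (source -> destination) is observed.
    edges = Counter()
    for chain in chains:
        for src, dst in zip(chain, chain[1:]):
            edges[(src, dst)] += 1

    total_hops = sum(len(c) - 1 for c in chains)
    return {
        "num_chains": len(chains),
        "mean_hops": total_hops / len(chains) if chains else 0.0,
        "top_landing": Counter(c[-1] for c in chains).most_common(3),
        "edges": dict(edges),
    }


# Made-up example data.
example_chains = [
    ["news-a.example", "hop1.example", "landing.example"],
    ["news-b.example", "landing.example"],
]
print(summarize_chains(example_chains))
```

The weighted edge dictionary is the natural input for a graph-drawing library, which is one way the redirection network graphs could be produced.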

How is Malware Discoverer seeded?

Currently our system tracks active domains on five IPs every day. One IP belongs to a domain parking company; the other four belong to so-called “bullet-proof” hosting providers.

To learn more about how we identified those five IPs, see our report on malware campaigns that distribute suspicious Chrome extensions. If you want us to track other IPs, please contact us.

What do we analyze in the threat intelligence report?

Our reports focus on the coordinated redirection behavior of those malware campaigns. We break down domains and IPs into three categories: tier one contains entry-level domains/IPs, tier two contains intermediate redirection hops, and tier three contains final landing domains/IPs. For each tier, the report covers:

Current data collections

Date  IP               Status    Result
5/29  37.48.65.149     Finished  link
5/29  46.166.182.112   Finished  link
5/29  64.32.8.70       Finished  link
5/29  103.224.182.207  Finished  link
5/29  207.244.67.215   Finished  link
5/31  37.48.65.149     Finished  link
5/31  46.166.182.112   Finished  link
5/31  64.32.8.70       Finished  link
5/31  103.224.182.207  Finished  link
5/31  207.244.67.215   Finished  link
6/01  37.48.65.149     Running   link
6/01  46.166.182.112   Running   link
6/01  64.32.8.70       Running   link
6/01  207.244.67.215   Running   link
6/02  37.48.65.149     Finished  link
6/02  46.166.182.112   Finished  link
6/02  64.32.8.70       Finished  link
6/02  103.224.182.207  Finished  link
6/02  207.244.67.215   Finished  link
6/03  37.48.65.149     Finished  link
6/03  46.166.182.112   Finished  link
6/03  64.32.8.70       Finished  link
6/03  103.224.182.207  Finished  link
6/03  207.244.67.215   Finished  link

How can this threat intelligence benefit me?

Our detection is not designed to be comprehensive. First, we do not track all IP addresses and domains. Second, even if we did, some malicious domains never redirect. Nevertheless, we believe that Malware Discoverer is a valuable threat intelligence tool: fewer than 1% of the domains we discovered are labelled as malicious by Google Safe Browsing, which suggests that most of what we find is not yet covered by that blocklist. We hope that by sharing our method and data, we can receive more constructive feedback from the community, and together make malware detection more efficient.

We encourage you to take a look at our reports and graphs. If you find them helpful, contact us and we will share the daily threat intelligence report with you.


Have other ideas? / Want to subscribe to the threat intelligence report? / Contact

Zhouhan Chen, NYU Center for Data Science, zc1245@nyu.edu, Personal Website