Duo Labs

Don’t @ Me: Hunting Twitter Bots at Scale

Social networks allow people to connect with one another, share ideas, and have healthy conversations. Recently, automated Twitter accounts, or “bots,” have been making headlines for their effectiveness at spreading spam and malware, as well as at influencing online discussion.

At our talk, “Don’t @ Me: Hunting Twitter Bots at Scale,” at Black Hat USA 2018, we are excited to release the results of a three-month-long research project identifying Twitter bots at a large scale.

To accompany the conference talk, we are releasing a technical paper that details:

  • How we gathered the dataset
  • Our scientific approach to data analysis
  • How we built a classifier to identify bots
  • How we identified botnets, including a case study of a spam-spreading botnet

It's important to note that in this paper, we specifically looked for automated accounts, not necessarily malicious automated accounts. Distinguishing benign automation from malicious automation is a topic for future work.

In order to allow everyone to make use of our work as easily as possible, we’re open-sourcing our data collection code, which you can find here: https://github.com/duo-labs/twitterbots. The full code will be released after our talk on Wednesday, August 8.

Key Findings

[Figure: Botnet account relationships, showing following/followers and likes]

In the technical paper released today, Don’t @ Me: Hunting Twitter Bots at Scale, we detail the following key findings:

  • Using knowledge of how Twitter generates user IDs, we gathered a dataset of 88 million public Twitter profiles consisting of standard account information represented in the Twitter API, such as screen name, tweet count, followers/following counts, avatar and description.
  • As API limits allowed, we enriched this dataset with the tweets posted by the accounts and with targeted social network (follower/following) information.
  • Practical data science techniques can be applied to create a classifier that is effective at finding automated Twitter accounts, also known as “bots.”
  • We present a case study detailing a large botnet of at least 15,000 bots spreading a cryptocurrency scam. By monitoring the botnet over time, we discovered ways the bots evolve to evade detection.
  • Our cryptocurrency scam case study demonstrates that, after finding initial bots using the tools and techniques described in this paper, following that thread can lead to the discovery and unraveling of an entire botnet. For this botnet, we use targeted social network analysis to reveal a unique three-tiered hierarchical structure.
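
To make the first finding more concrete, here is a minimal Python sketch of how a range of candidate user IDs could be split into batches for bulk profile lookups. It is an illustration under stated assumptions, not the code from our repository: the helper names id_batches and lookup_params are hypothetical, and the sketch assumes that a bulk lookup endpoint accepts a comma-separated user_id parameter with at most 100 IDs per request. It makes no network calls.

```python
from typing import Dict, Iterator, List

# Assumption: the bulk lookup endpoint accepts at most 100 IDs per request.
BATCH_SIZE = 100


def id_batches(start_id: int, end_id: int,
               batch_size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive candidate user IDs in [start_id, end_id), in batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for batch_start in range(start_id, end_id, batch_size):
        # The final batch may be shorter than batch_size.
        batch_end = min(batch_start + batch_size, end_id)
        yield list(range(batch_start, batch_end))


def lookup_params(batch: List[int]) -> Dict[str, str]:
    """Build the (assumed) query parameters for one bulk lookup request."""
    return {"user_id": ",".join(str(uid) for uid in batch)}
```

Each batch would then be sent as one request, with IDs that do not correspond to a public account simply missing from the response. How densely valid IDs are packed in a given range depends on how the IDs were generated, which is why the ranges to enumerate need to be chosen with care.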

This paper provides an in-depth description of the entire process for finding Twitter bots, from gathering the data to performing the analysis. To help the research community build on our work, we provide a narrative of our research, explaining why we chose various approaches. In the interest of transparency, we also include a section at the end of the paper that highlights techniques we tried that didn’t yield the expected results.

Research Focus / Motivations

Many of us at Duo Labs use Twitter as a way to connect with the infosec industry. We were familiar with automated Twitter accounts, and had read previous academic papers covering both techniques for building a dataset of Twitter accounts and techniques for identifying automated accounts in a previously shared dataset.

However, we hadn’t come across a work that attempted to tell the entire story by providing detailed techniques on how to build a dataset, identify initial bots within that dataset, and use those bots to uncover an organized botnet. We wanted to show that practical, straightforward approaches can still be used to effectively identify automated accounts with a high degree of accuracy.

In addition to this, we believe that there is an incredibly talented community of security researchers interested in the topic of how bots operate on social networks. We wanted to open-source the code used during this research to make it easy to get involved with bot identification, enabling this community to build on and improve our work.

Conclusion

During the course of this research, Twitter announced that they are taking more proactive action against both automated spam and malicious content by identifying and challenging “more than 9.9 million potentially spammy or automated accounts per week.” In a follow-up blog post, Twitter also described their plans to remove accounts that had been previously locked due to suspicious activity from follower counts.

We’re excited to see these efforts by Twitter and are hopeful that these increased investments will be effective in combating spam and malicious content. However, we don’t consider the problem solved. The case study presented in this paper demonstrates that organized botnets are still active and can be discovered with relatively straightforward analysis.

By open-sourcing the tools and techniques developed during this research, we hope to enable researchers to continue building on our work, creating new techniques to identify and flag malicious bots, and helping to keep Twitter and other social networks a place for healthy online discussion and community.