Anatomy of Twitter Bots: Fake Followers

We recently presented a technical research paper at Black Hat USA 2018 called Don’t @ Me: Hunting Twitter Bots at Scale. This paper provides an in-depth look at the entire process of gathering a large Twitter dataset and using a practical data science approach to identify automated accounts within that dataset.

In our paper, we built a classifier to detect automated Twitter accounts in a generic way. During the course of this research, we identified various types of bots that serve different purposes. These include:

  • Content-Generating Bots - Bots that actively generate new content, such as spam or links to malicious content
  • Amplification Bots - Bots that exist to like and retweet content in order to artificially inflate the tweet’s popularity
  • Fake Followers - A type of amplification bot; fake followers exist to follow users in order to make those users appear more popular than they really are.

Each of these types of bots exhibits unique behavior that makes it worth covering separately and in depth. In this post, we'll explore how fake followers operate: first finding an initial list of fake followers, then using that list to uncover a larger botnet of at least 12,000 accounts.

Seeing the Forest for the Trees

When examining accounts, more information is better: the more activity and information we have for an account, the more accurately we can categorize it. Traditional fake followers are hard to detect individually, since they have little (if any) activity beyond following accounts:

Bot Account

It’s difficult to say that this account is a fake follower, since a lack of activity doesn’t mean an account is malicious. It’s perfectly reasonable for legitimate users to create accounts just to follow others, treating Twitter like a newsfeed.

Instead of looking at fake followers on an individual level, we need to take a different approach, looking at their social network as a whole. This approach should help us to differentiate between low activity, non-malicious users and fake followers.

Mapping Suspicious Patterns

To find fake followers, we can start with an assumption: fake followers operate in groups. Intuitively, this makes sense, since followers aren’t usually purchased individually. Instead, they are purchased and applied as a group of accounts, all of which likely share the same characteristics since they were created by the same bot owner.

So what characteristics should we look for? In 2018, The New York Times published an article called “The Follower Factory” that explored the economy of fake followers. The article shows the clear patterns that emerge when an account’s followers, in the order they followed it, are plotted against the dates those follower accounts were created.

Here’s what this mapping looks like for a user account with a low number of presumed fake followers (the author’s own Twitter account):

Low Presumed Fake Followers

The x-axis represents the order in which each account started following our target account, and the y-axis represents the date on which that account was created. The chart above shows the expected diversity in follower account ages, with no clear groups of followers that were all created at the same time.
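To build this kind of chart, each follower needs two values: its position in the follow order and its account creation date. The sketch below is a minimal illustration, not the code behind the charts in this post. It assumes follower profiles are dicts carrying the Twitter API v1.1 `created_at` string, and that the list arrives most-recent-follower first (our understanding of how the v1.1 followers endpoints order results), so it is reversed before indexing. The helper name `follower_points` is hypothetical.

```python
from datetime import datetime

# Twitter API v1.1 timestamp format, e.g. "Wed Aug 29 17:12:58 +0000 2012"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def follower_points(followers):
    """Return (follow_index, creation_date) pairs for a scatter plot.

    Hypothetical helper: assumes `followers` is ordered newest follower
    first, so it is reversed to put the earliest follower at x=0.
    """
    points = []
    for index, user in enumerate(reversed(followers)):
        created = datetime.strptime(user["created_at"], TWITTER_TIME_FORMAT)
        points.append((index, created.date()))
    return points


# Toy data standing in for API results.
followers = [
    {"screen_name": "newest", "created_at": "Fri Jun 01 10:00:00 +0000 2018"},
    {"screen_name": "oldest", "created_at": "Wed Aug 29 17:12:58 +0000 2012"},
]
for x, y in follower_points(followers):
    print(x, y)
```

A block of followers created on the same day and following one after another shows up as a horizontal run of points at a single y value.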

Compare this with the followers for a different user:

Thousands of Fake Followers

In this case, we see a group of thousands of followers at the top right of the graph that were all created at the same time. These accounts then followed the user one after another, which is unlikely to occur under normal circumstances, so these accounts would be suspected to be fake followers.

It’s important to note that just having fake followers isn’t proof that they were purchased and used intentionally. It’s possible that the bot operator directed the accounts to follow innocent accounts to evade detection or as an attempt at harassment, which is why we don’t reveal the identity of the user in this post.

Our technical paper presented at Black Hat USA 2018 included a case study detailing the discovery of a large botnet actively spreading a cryptocurrency giveaway scam. The bots in this botnet used multiple techniques to evade detection, including spoofing legitimate well-known accounts. In multiple cases, the accounts used to broadcast the spam appeared to be accounts of legitimate users that had been hijacked and repurposed.

Fake Elom Nusk

After our initial research was concluded, the botnet began using fake followers to trick victims into believing their spoofed accounts were legitimate. This large influx of fake followers is clearly seen when mapping out the followers for the scam account:

Elom Followers

Following the Thread

Browsing through the accounts following the fake Elon Musk profile revealed that they shared similar characteristics:

Fake Elom Followers

Each of the followers has a description that appears to be a proverb or fortune. Searching for these descriptions suggests that they may have been pulled from a publicly available list on GitHub.

Now, remember that the botnet owner aims to create fake accounts that bypass spam detection. One metric for determining the quality of a bot is how complete the profile is.

Creating random display names and screen names is straightforward. Creating a large number of unique, believable descriptions is much harder. This use of a precompiled list of fortunes appears to be the bot owner’s way of making the profiles more complete with believable profile descriptions.

However, sometimes attempts to blend into the background actually make the bots stand out.

Since each bot’s description comes from a known list of possible values, we can distinguish these bots from legitimate followers with a high degree of accuracy. Granted, just as following users doesn’t make an account malicious, having a fortune as an account description isn’t evidence of maliciousness on its own.

In this case, we can say these are fake followers because we’re studying the accounts as a network and seeing similar accounts act in a coordinated way.

Once we’ve identified a small group of fake followers, we can map out their social networks, looking for other fake followers with similar characteristics. Repeating this process unravels the botnet.

We started with a one-degree crawl of a single fake follower:

Single Fake Follower

A one-degree crawl means fetching the social network for the fake follower and for each account the fake follower is following. The code we open-sourced as part of our initial research includes a script, crawl_network.py, which crawls the social network for an account and outputs the results both as compressed JSON and in GEXF format for graphing.

We can start the crawl like this:

python crawl_network.py -g AkgunNasim.gexf -r AkgunNasim.json.gz AkgunNasim
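The idea behind a one-degree crawl can be sketched in a few lines. This is not the implementation in crawl_network.py; it is a minimal illustration that assumes a hypothetical `fetch_network(user_id)` callable returning a `(friend_ids, follower_ids)` tuple, which in practice would wrap the friends/ids and followers/ids endpoints. A toy in-memory dict stands in for the API.

```python
def one_degree_crawl(seed, fetch_network):
    """Crawl the seed account and every account it follows.

    `fetch_network` is a hypothetical callable returning a
    (friend_ids, follower_ids) tuple for one account.
    Returns a set of directed (follower, followed) edges.
    """
    edges = set()
    friends, followers = fetch_network(seed)
    edges.update((seed, friend) for friend in friends)
    edges.update((follower, seed) for follower in followers)

    # One degree out: fetch the network of every account the seed follows.
    # The followers of these accounts are where other bots show up.
    for friend in friends:
        friend_friends, friend_followers = fetch_network(friend)
        edges.update((friend, f) for f in friend_friends)
        edges.update((f, friend) for f in friend_followers)
    return edges


# Toy network standing in for the API: user -> (friends, followers)
NETWORK = {
    "bot1": (["alice", "bob"], []),
    "alice": ([], ["bot1", "bot2"]),
    "bob": (["alice"], ["bot1", "bot3"]),
}

edges = one_degree_crawl("bot1", lambda user: NETWORK[user])
print(sorted(edges))
# [('bob', 'alice'), ('bot1', 'alice'), ('bot1', 'bob'), ('bot2', 'alice'), ('bot3', 'bob')]
```

Starting from bot1 alone, the crawl surfaces bot2 and bot3 through the followers of the accounts bot1 follows.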

The GEXF output includes both the fake followers and legitimate followers. To make our graph cleaner and quicker to lay out, we wrote a simple script that extracts the fake followers by checking which accounts have a description that appears in the list of proverbs. This produced a list of nearly 10,000 bots.
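The filtering step is essentially a set lookup. Below is a minimal sketch, assuming the proverbs are available one per line and that profiles are dicts with `screen_name` and `description` fields. The whitespace and case normalization is our own assumption, and `find_fake_followers` is a hypothetical name; the proverb in the toy data is illustrative only.

```python
def normalize(text):
    """Collapse whitespace and lowercase so trivial differences don't matter."""
    return " ".join(text.split()).lower()


def find_fake_followers(profiles, proverb_lines):
    """Return screen names whose description exactly matches a known proverb."""
    # Skip blank lines so empty descriptions never count as a match.
    proverbs = {normalize(line) for line in proverb_lines if line.strip()}
    return [
        p["screen_name"]
        for p in profiles
        if normalize(p.get("description") or "") in proverbs
    ]


# Toy data: one lines-of-a-file list and three profiles.
proverbs = ["A journey of a thousand miles begins with a single step.\n", "\n"]
profiles = [
    {"screen_name": "bot_a", "description": "A journey of a thousand miles  begins with a single step."},
    {"screen_name": "human", "description": "Coffee, code, cats."},
    {"screen_name": "empty", "description": None},
]
print(find_fake_followers(profiles, proverbs))  # ['bot_a']
```

Exact matching keeps false positives low; as noted above, a match on its own is only meaningful alongside the coordinated behavior seen in the network.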

After trimming the graph to only the fake followers and the accounts they follow, we can visualize it using the ForceAtlas2 layout in Gephi.
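Trimming the graph amounts to keeping only the edges that start at a known bot. A sketch, assuming the crawl has been reduced to directed `(follower, followed)` pairs and the bot list to a set of IDs; `trim_to_bots` is a hypothetical helper. The resulting edges could then be written back out as GEXF (for example with networkx's `write_gexf`) for layout in Gephi.

```python
def trim_to_bots(edges, bots):
    """Keep only edges whose follower is a known bot.

    `edges` are directed (follower, followed) pairs. The result holds the
    bots plus the accounts they follow, which is the subgraph to lay out.
    """
    bots = set(bots)
    return {(src, dst) for src, dst in edges if src in bots}


edges = {("bot1", "alice"), ("bot2", "alice"), ("bob", "alice"), ("bot1", "bob")}
print(sorted(trim_to_bots(edges, {"bot1", "bot2"})))
# [('bot1', 'alice'), ('bot1', 'bob'), ('bot2', 'alice')]
```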

Gephi Graph Visualization

The graph above shows the relationships between the fake followers (black nodes) and the legitimate accounts they follow (green nodes). Many legitimate accounts have the same bots following them, resulting in the highly connected cluster in the bottom right. In other cases, legitimate accounts have bots unique to them, which results in the fan-like networks towards the top of the graph.
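The two shapes can also be told apart numerically: a bot that follows several legitimate accounts ties them into the dense cluster, while a bot that follows only one of them becomes a leaf in that account’s fan. A sketch over the trimmed `(bot, followed)` edges; `fan_sizes` is a hypothetical helper, not part of the published tooling.

```python
from collections import Counter, defaultdict


def fan_sizes(edges):
    """For each followed account, count bots that follow only that account.

    `edges` are (bot, followed) pairs from the trimmed graph. Bots that
    follow several accounts are the ones forming the dense cluster.
    """
    out_degree = Counter(bot for bot, _ in edges)
    fans = defaultdict(int)
    for bot, followed in edges:
        if out_degree[bot] == 1:
            fans[followed] += 1
    return dict(fans)


edges = {
    ("b1", "alice"), ("b2", "alice"), ("b3", "alice"),
    ("b4", "alice"), ("b4", "bob"),
    ("b5", "carol"),
}
print(sorted(fan_sizes(edges).items()))  # [('alice', 3), ('carol', 1)]
```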

This is a great start! By starting with a single fake follower, we were able to find thousands with the same characteristics, but this isn’t the full story.

We can assume that not every bot in the botnet will follow the same users. This means that there may be entire groups of fake followers that don’t follow any users our initial bot did.

As a result, our initial crawl likely didn’t find all the bots in this botnet. To find more, we can take another bot found during the initial crawl and crawl its network, looking for fake followers we haven’t already discovered. Running this crawl for one more fake follower turned up another 1,200 bots.
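Repeating the crawl from newly found bots is a breadth-first expansion with a budget. A minimal sketch, assuming a hypothetical `crawl(account)` callable that returns the accounts seen in a one-degree crawl and an `is_bot(account)` rule such as the proverb match; `max_crawls` stands in for how many crawls the API limits let us afford.

```python
from collections import deque


def expand_botnet(seed, crawl, is_bot, max_crawls):
    """Crawl newly discovered bots breadth-first until the budget runs out.

    `crawl` and `is_bot` are hypothetical callables; `seed` is assumed to
    be a known bot. Returns every bot found.
    """
    found = {seed}
    queue = deque([seed])
    crawls = 0
    while queue and crawls < max_crawls:
        bot = queue.popleft()
        crawls += 1
        for account in crawl(bot):
            if account not in found and is_bot(account):
                found.add(account)
                queue.append(account)
    return found


# Toy data: accounts seen when crawling each bot, and the known bot set.
CRAWL = {"b1": ["alice", "b2", "b3"], "b2": ["b4", "bob"], "b3": [], "b4": ["b5"], "b5": []}
BOTS = {"b1", "b2", "b3", "b4", "b5"}

print(sorted(expand_botnet("b1", lambda a: CRAWL.get(a, []), BOTS.__contains__, max_crawls=2)))
# ['b1', 'b2', 'b3', 'b4']
```

With an unlimited budget the queue only empties once no new bots appear, which is exactly the step the rate limits make impractical at this scale.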

Fully mapping the botnet would require crawling the network of every fake follower we come across. Unfortunately, Twitter’s API rate limits make this infeasible.

As we detailed in our initial report, the API endpoints used to fetch the social network, friends/ids and followers/ids, are both rate limited to 15 requests per 15-minute rate-limit window, which works out to roughly one request per minute. To map the full scope of this botnet, we would have to fetch the social network of every fake follower (to discover new legitimate accounts) and of every account connected to each legitimate account (to discover new bots).
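A crawler has to pace itself to stay inside that budget. Below is a minimal client-side sketch of a fixed-window limiter using the 15-requests-per-15-minutes figure; the class name is hypothetical, and the clock and sleep functions are injectable so the logic can be checked without actually waiting.

```python
import time


class WindowRateLimiter:
    """Allow at most `limit` calls per `window` seconds.

    Defaults mirror the friends/ids and followers/ids limits:
    15 requests per 15-minute window.
    """

    def __init__(self, limit=15, window=15 * 60, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.window_start = None
        self.used = 0

    def acquire(self):
        """Block until one more request fits in the current window."""
        now = self.clock()
        if self.window_start is None or now - self.window_start >= self.window:
            # A fresh window begins with this request.
            self.window_start = now
            self.used = 0
        elif self.used >= self.limit:
            # Budget spent: wait out the rest of the window, then reset.
            self.sleep(self.window - (now - self.window_start))
            self.window_start = self.clock()
            self.used = 0
        self.used += 1
```

In a crawl loop, `acquire()` would be called before each friends/ids or followers/ids request; the sixteenth call within a window blocks until the window ends.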

As a rough estimate: with nearly 10,000 fake followers, each connected to around 100 accounts, that is on the order of a million requests, which at one request per minute would take nearly two years to complete. This also doesn’t include the time it takes to crawl any new bots we find in the process.
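The arithmetic behind that estimate, using the figures from the text. The simplifying assumption of one request per account (ignoring pagination and overlap between the accounts different bots are connected to) is ours.

```python
# Back-of-the-envelope crawl time from the figures in the text:
# ~10,000 fake followers, ~100 connected accounts each, and about
# one request per minute (15 requests per 15-minute window).
bots = 10_000
accounts_per_bot = 100
requests_per_minute = 1

# Assumption: one request per account, no pagination, no overlap.
requests = bots * accounts_per_bot
minutes = requests / requests_per_minute
days = minutes / (60 * 24)
years = days / 365

print(f"{requests:,} requests ~ {days:,.0f} days ~ {years:.1f} years")
# 1,000,000 requests ~ 694 days ~ 1.9 years
```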

But Wait, There’s More

In the previous sections, we showed how graphing an account’s followers revealed interesting patterns that indicate a group of fake followers. Large groups of fake followers make very clear patterns that are easy to see. However, patterns created by smaller groups of fake followers may not be as obvious. To accurately find these smaller groups, we can take a programmatic approach.

Detecting Fake Followers Programmatically

To find smaller clusters of potential fake followers, we first identified each instance in which multiple accounts created on the same day followed our target account consecutively. Then we computed the length of each instance, which is the number of accounts that followed our target account in a row.
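Finding these instances is a run-length pass over the followers in follow order. A minimal sketch using `itertools.groupby`, assuming creation timestamps have already been truncated to calendar days (ISO date strings here); `same_day_runs` is a hypothetical name, not the function from our published code.

```python
from itertools import groupby


def same_day_runs(creation_dates):
    """Find runs of consecutive followers whose accounts share a creation day.

    `creation_dates` holds one day per follower, in follow order.
    Returns (start_index, length) for every run of at least two accounts.
    """
    runs = []
    index = 0
    for _, group in groupby(creation_dates):
        length = len(list(group))
        if length >= 2:
            runs.append((index, length))
        index += length
    return runs


dates = ["2012-05-01", "2018-08-31", "2018-08-31", "2018-08-31",
         "2015-03-02", "2016-01-09", "2016-01-09"]
print(same_day_runs(dates))  # [(1, 3), (5, 2)]
```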

For example, if seven accounts created on August 31st, 2018 followed our target account one after the other, the length of that instance would be seven. Since it is plausible for legitimate users whose accounts were created on the same day to follow a given user consecutively, we computed the standard deviation of these lengths, multiplied it by three, and used that as our threshold for identifying potential groups of fake followers.

While the distribution of the lengths may be skewed, a threshold of three standard deviations should still separate suspiciously long runs from ordinary ones. We have also published the code for finding these groups.
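The thresholding step can be sketched as follows. It follows the rule as described (three times the standard deviation of the run lengths, with no mean offset); using the population standard deviation via `statistics.pstdev`, and returning nothing when every run has the same length, are our assumptions. `suspicious_runs` is a hypothetical name.

```python
from statistics import pstdev


def suspicious_runs(runs, multiplier=3):
    """Flag runs longer than `multiplier` standard deviations of run length.

    `runs` are (start_index, length) pairs, one per same-day run.
    """
    lengths = [length for _, length in runs]
    if len(lengths) < 2:
        return []
    spread = pstdev(lengths)
    if spread == 0:
        # All runs are the same length, so none stands out.
        return []
    threshold = multiplier * spread
    return [run for run in runs if run[1] > threshold]


# Toy data: twenty short runs of two accounts and one run of forty.
runs = [(i * 10, 2) for i in range(20)] + [(500, 40)]
print(suspicious_runs(runs))  # [(500, 40)]
```

Here the standard deviation is about 8.1, giving a threshold near 24, so only the run of forty is flagged.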

Applying this to the followers of a well-known journalist shows how it surfaces these smaller clusters. At first glance, there appears to be one large group of potential fake followers around follower index 75,000.

Journalist Fake Followers

However, our script found five other clusters that would have been harder to spot through visual inspection. These clusters of potential fake followers are shown below.

Clusters of Fake Followers

Now that we’ve identified these potential groups of fake followers, we can follow the same approach as we did earlier: crawling their social networks and doing further analysis to better understand how these potential groups may be connected.

Conclusion

Social networks allow communities to be built, ideas to be shared and people to connect to one another. By artificially inflating a user’s popularity, fake followers introduce dishonesty into the platform.

This post shows that looking at social networks as a whole allows us to find fake followers. After finding an initial set of bots, connections can be mapped out, revealing the larger botnet.

For more information on how to gather a large Twitter dataset and find bots within that dataset, be sure to check out our research paper Don’t @ Me: Hunting Twitter Bots at Scale.

Olabode Anise

Data Scientist

@justsayo

Olabode is a Data Scientist at Duo Security where he wrangles data, prototypes data-related features, and makes pretty graphs to support engineering, product management, and marketing efforts. Prior to Duo, Olabode studied usable security at the University of Florida. When he’s not at work, he spends his time exploring data involving topics such as sports analytics, relative wages and cost of living across the United States.

Jordan Wright

Principal R&D Engineer

@jw_sec

Jordan Wright is Principal R&D Engineer at Duo Security as a part of the Duo Labs team. He has experience on both the offensive and defensive side of infosec. He enjoys contributing to open-source software and performing security research.