We recently presented a technical research paper at Black Hat USA 2018 called Don’t @ Me: Hunting Twitter Bots at Scale. This paper provides an in-depth look at the entire process of gathering a large Twitter dataset and using a practical data science approach to identify automated accounts within that dataset.
In our paper, we built a classifier to detect automated Twitter accounts in a generic way. During the course of this research, we identified various types of bots that serve different purposes. These include:
- Content-Generating Bots - Bots that actively generate new content, such as spam or links to malicious content
- Amplification Bots - Bots that exist to like and retweet content in order to artificially inflate the tweet’s popularity
- Fake Followers - A type of amplification bot; fake followers exist to follow users in order to make those users appear more popular than they really are.
Each of these types of bots exhibits unique behavior that makes it worth covering separately and in depth. In this post, we’ll explore how fake followers operate, showing how to find an initial list of fake followers and then how to use that list to uncover a larger botnet of at least 12,000 accounts.
Seeing the Forest for the Trees
When examining accounts, more information is better: the more activity and information we have for an account, the more accurately we can categorize it. Traditional fake followers are challenging to detect individually, since they have very little (if any) activity other than following accounts:
It’s difficult to say that this account is a fake follower, since a lack of activity doesn’t mean an account is malicious. It’s perfectly reasonable to assume that legitimate users create accounts just to follow other users, treating Twitter like a newsfeed.
Instead of looking at fake followers on an individual level, we need to take a different approach, looking at their social network as a whole. This approach should help us to differentiate between low activity, non-malicious users and fake followers.
Mapping Suspicious Patterns
To find fake followers, we can start with an assumption: fake followers operate in groups. Intuitively, this makes sense, since followers aren’t usually purchased individually. Instead, they are purchased and applied as a group of accounts, all of which likely share the same characteristics since they were created by the same bot owner.
So what characteristics should we look for? In 2018, The New York Times published an article called “The Follower Factory” that explored the economy of fake followers. The article demonstrates clear patterns that emerge when plotting the order in which accounts followed a given user against the dates on which those followers were created.
Here’s what this mapping looks like for a user account with a low number of presumed fake followers (the author’s own Twitter account):
The x-axis represents the order in which accounts started following our target account, and the y-axis represents the date on which each following account was created. The chart above shows the expected diversity in follower account ages: there is no clear pattern of followers that were all created at the same time.
Compare this with the followers for a different user:
In this case, we see a group of thousands of followers at the top right of the graph that were all created at the same time. These accounts then followed the user one after another, which is unlikely to happen under normal circumstances, so we suspect these accounts are fake followers.
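A chart like these needs only two values per follower: its position in the follow order and the creation date of its account. The following is a minimal sketch, not the code behind the charts above, that builds those points from user objects. It assumes each object carries Twitter's v1.1 `created_at` string and that the list arrives with the newest follower first, which is how Twitter's follower ID endpoints have historically ordered results. The function name is hypothetical, and you can plot the points with any charting library.

```python
from datetime import date, datetime
from typing import Dict, List, Tuple

# Format of the created_at field on Twitter v1.1 user objects,
# e.g. "Wed Aug 29 17:12:58 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def follower_creation_series(
    followers_newest_first: List[Dict[str, str]],
) -> List[Tuple[int, date]]:
    """Return (follow_order, creation_date) points for a follower chart.

    Assumes the input lists the most recent follower first, so it is reversed
    to make index 0 the earliest follower (the left edge of the x-axis).
    """
    ordered = reversed(followers_newest_first)
    return [
        (order, datetime.strptime(user["created_at"], TWITTER_TIME_FORMAT).date())
        for order, user in enumerate(ordered)
    ]
```

A block of thousands of consecutive points that all share one creation date produces the horizontal band seen in the second chart.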
It’s important to note that just having fake followers isn’t proof that they were purchased and used intentionally. It’s possible that the bot operator directed the accounts to follow innocent accounts to evade detection or as an attempt at harassment, which is why we don’t reveal the identity of the user in this post.
Our technical paper presented at Black Hat USA 2018 included a case study detailing the discovery of a large botnet actively spreading a cryptocurrency giveaway scam. The bots in this botnet used multiple techniques to evade detection, including spoofing legitimate well-known accounts. In multiple cases, the accounts used to broadcast the spam appeared to be accounts of legitimate users that had been hijacked and repurposed.
After our initial research was concluded, the botnet began using fake followers to trick victims into believing their spoofed accounts were legitimate. This large influx of fake followers is clearly seen when mapping out the followers for the scam account:
Following the Thread
Browsing through the accounts following the fake Elon Musk profile revealed that they shared similar characteristics:
Each of the followers has a description that appears to be a proverb or fortune. Searching for these descriptions suggests that they may have been pulled from this list on GitHub.
Now, remember that the botnet owner aims to create fake accounts that bypass spam detection. One metric for determining the quality of a bot is how complete the profile is.
Creating random display names and screen names is straightforward. Creating a large number of unique, believable descriptions is much harder. This use of a precompiled list of fortunes appears to be the bot owner’s way of making the profiles more complete with believable profile descriptions.
However, sometimes attempts to blend into the background actually make the bots stand out.
Since each bot has a description drawn from a known list of possible values, we can distinguish these bots from otherwise legitimate followers with a high degree of accuracy. Granted, just as following users doesn’t by itself make an account malicious, having a fortune in the account description isn’t indicative of maliciousness either.
In this case, we’re able to say these are fake followers because we’re studying the accounts as a network and seeing similar accounts act in a coordinated way.
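The description check itself is simple. The following is a minimal sketch, assuming the proverb list has already been loaded as a list of strings. It also assumes that a "match" means an exact match after lowercasing and collapsing whitespace; that normalization rule and the function names are our own illustration, not the script used in this research. As noted above, a match only marks a candidate. The coordinated behavior is what marks it as a fake follower.

```python
from typing import Dict, Iterable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences still match."""
    return " ".join(text.lower().split())


def find_fortune_accounts(
    accounts: Iterable[Dict[str, str]], fortunes: Iterable[str]
) -> List[str]:
    """Return screen names of accounts whose description is exactly a known fortune."""
    # Skip blank list entries so empty descriptions never count as a match.
    known = {normalize(f) for f in fortunes if f.strip()}
    return [
        account["screen_name"]
        for account in accounts
        if normalize(account.get("description") or "") in known
    ]
```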
Once we’ve identified a small group of fake followers, we can map out their social networks, looking for other fake followers with similar characteristics. Doing so unravels the larger botnet.
We started with a one-degree crawl of a single fake follower:
A one-degree crawl means that we fetch the social network for the fake follower, as well as the social network for each account the fake follower is following. The code we open-sourced as part of our initial research includes a script, crawl_network.py, which crawls the social network for an account and outputs the results both as compressed JSON and in GEXF format for graphing.
We can start the crawl like this:
python crawl_network.py -g AkgunNasim.gexf -r AkgunNasim.json.gz AkgunNasim
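crawl_network.py handles the real API calls. The following sketch only illustrates the shape of a one-degree crawl and is not that script's code. `fetch_friends` and `fetch_followers` are hypothetical callables standing in for wrappers around the friends/ids and followers/ids endpoints. Edges point from follower to followed account.

```python
from typing import Callable, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (follower, followed)


def one_degree_crawl(
    seed: str,
    fetch_friends: Callable[[str], Iterable[str]],
    fetch_followers: Callable[[str], Iterable[str]],
) -> Set[Edge]:
    """Collect follow edges for the seed and for every account the seed follows."""
    edges: Set[Edge] = set()

    def add_network(account: str) -> List[str]:
        # Accounts this one follows ("friends" in Twitter's API terms).
        friends = list(fetch_friends(account))
        for friend in friends:
            edges.add((account, friend))
        # Accounts following this one; for accounts the seed follows,
        # this is where other fake followers show up.
        for follower in fetch_followers(account):
            edges.add((follower, account))
        return friends

    for friend in add_network(seed):
        add_network(friend)
    return edges
```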
The GEXF output includes both the fake followers and legitimate followers. To make our graph cleaner and quicker to lay out, we wrote a simple script that identifies the fake followers by checking which accounts have a description that appears in the list of proverbs. This resulted in a list of nearly 10,000 bots.
After trimming the graph to only the fake followers and the accounts they follow, we can visualize the graph using the Force Atlas 2 layout in Gephi.
The graph above shows the relationships between the fake followers (black nodes) and the legitimate accounts they follow (green nodes). Many legitimate accounts have the same bots following them, resulting in the highly connected cluster in the bottom right. In other cases, we see legitimate accounts that have bots unique to them, which results in the fan-like networks towards the top of the graph.
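Trimming the graph before layout amounts to keeping only the edges whose follower is a known bot. The following is a minimal sketch over an in-memory edge set, assuming `bots` comes from a description check like the one above. The function name and the "bot"/"followed" labels are hypothetical. The labels let a graph tool color the two node types differently, as in the black and green nodes above. Writing the result to GEXF is left out; one option is networkx's `write_gexf`.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (follower, followed)


def trim_to_bots(edges: Iterable[Edge], bots: Set[str]) -> Tuple[Set[Edge], Dict[str, str]]:
    """Keep only edges where a known bot is the follower, and label every remaining node."""
    kept = {(src, dst) for src, dst in edges if src in bots}
    roles: Dict[str, str] = {}
    for src, dst in kept:
        for node in (src, dst):
            # A followed account can itself be a bot, so check membership directly.
            roles[node] = "bot" if node in bots else "followed"
    return kept, roles
```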
This is a great start! By starting with a single fake follower, we were able to find thousands with the same characteristics, but this isn’t the full story.
We can assume that not every bot in the botnet will follow the same users. This means that there may be entire groups of fake followers that don’t follow any users our initial bot did.
As a result, our initial crawl likely didn’t find all the bots in this botnet. To find new bots, we can take another bot found during our initial crawl and crawl its network, looking for fake followers we haven’t already discovered. Running this crawl for a second fake follower uncovered another 1,200 bots.
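Repeating that step turns into a breadth-first "snowball" expansion: crawl a known bot, queue any new bots its network reveals, and repeat. The following is a minimal sketch, not code from our release. `crawl` is a hypothetical callable returning the accounts found in one bot's one-degree network. `is_bot` is a hypothetical check, such as the fortune-description match. `max_crawls` caps the work, because every crawl spends rate-limited API calls.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def expand_botnet(
    seeds: List[str],
    crawl: Callable[[str], Iterable[str]],
    is_bot: Callable[[str], bool],
    max_crawls: int,
) -> Set[str]:
    """Crawl known bots breadth-first, adding any new bots their networks reveal."""
    known: Set[str] = set(seeds)
    queue = deque(seeds)
    crawls = 0
    while queue and crawls < max_crawls:
        bot = queue.popleft()
        crawls += 1
        for account in crawl(bot):
            if account not in known and is_bot(account):
                known.add(account)
                queue.append(account)
    return known
```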
Fully mapping out the entire botnet would require us to crawl the network of every fake follower we come across. Unfortunately, Twitter’s API rate limits make this infeasible.
As we detailed in our initial report, the API endpoints used to fetch the social network, friends/ids and followers/ids, are both rate limited to 15 requests per 15-minute rate-limit window, which essentially allows us to make one request per minute. To map the full scope of this botnet, we would have to fetch the social network for every fake follower (to discover new legitimate accounts) and for every account that each legitimate account is connected to (to discover new bots).
As a rough estimate, since each fake follower is connected to around 100 accounts, this would take nearly two years to complete, and that doesn’t include the time needed to crawl any new bots we find along the way.
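The arithmetic behind that estimate can be written out. The following is a back-of-the-envelope sketch that assumes roughly 10,000 bots from the first crawl and about 100 connected accounts per bot. It also assumes exactly one request per connected account, with requests issued back-to-back at the rate limit. In practice, an account with more than 5,000 followers needs several paged requests, and accounts shared between bots would only need crawling once, so treat the result as an order of magnitude.

```python
# Twitter API v1.1 limits for friends/ids and followers/ids.
REQUESTS_PER_WINDOW = 15
WINDOW_MINUTES = 15
MINUTES_PER_YEAR = 60 * 24 * 365


def crawl_time_years(num_bots: int, accounts_per_bot: int = 100, requests_per_account: int = 1) -> float:
    """Estimate crawl time in years, assuming requests run back-to-back at the rate limit."""
    total_requests = num_bots * accounts_per_bot * requests_per_account
    minutes = total_requests * WINDOW_MINUTES / REQUESTS_PER_WINDOW
    return minutes / MINUTES_PER_YEAR


print(f"{crawl_time_years(10_000):.1f} years")  # prints "1.9 years"
```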
But Wait, There’s More
In the previous sections, we showed how graphing an account’s followers revealed interesting patterns that indicate a group of fake followers. Large groups of fake followers make very clear patterns that are easy to see. However, patterns created by smaller groups of fake followers may not be as obvious. To accurately find these smaller groups, we can take a programmatic approach.
Detecting Fake Followers Programmatically
To find smaller clusters of potential fake followers, we first identified every instance in which multiple accounts created on the same day followed our target account consecutively. Then we computed the length of each instance, which is the number of accounts that followed our target account in a row.
For example, if seven accounts created on August 31st, 2018 followed our target account one after the other, the length of that instance would be seven. Since it is plausible for legitimate users whose accounts were created on the same day to follow a given user consecutively, we computed the standard deviation of these lengths, multiplied it by three, and used that as our threshold for flagging potential groups of fake followers.
While the distribution of the lengths may be skewed, a three-standard-deviation threshold should still separate suspiciously long runs from ordinary ones. The code for finding these groups can be found here.
Using the followers of a well-known journalist, we can point out these potential small clusters of fake followers. At first glance, there appears to be a large group of potential fake followers around index 75,000.
However, our script found five other clusters that would have been more difficult to find through visual inspection. These clusters of potential fake followers can be found below.
Now that we’ve identified these potential groups of fake followers, we can follow the same approach as we did earlier: crawling their social networks and doing further analysis to better understand how these potential groups may be connected.
Social networks allow communities to be built, ideas to be shared and people to connect to one another. By artificially inflating a user’s popularity, fake followers introduce dishonesty into the platform.
This post shows that looking at social networks as a whole allows us to find fake followers. After finding an initial set of bots, we can map out their connections to reveal the larger botnet.
For more information on how to gather a large Twitter dataset and find bots within that dataset, be sure to check out our research paper Don’t @ Me: Hunting Twitter Bots at Scale.