
A Security Analysis of Over 500 Million Usernames and Passwords

We at Duo Labs recently got our hands on the so-called Anti Public Combo List, a dump of 562,077,487 usernames and passwords aggregated from a variety of large-scale data breaches and password dumps. While this means that we can’t say anything about user security behavior in particular contexts, it still provides an uncommonly large view into broad user security choices.

Who Are These Users?

The first question that presents itself when a credential dump lands in your lap is often: who is affected by this breach? We found 8% of the usernames (which are primarily email addresses) appear more than once in the dataset, supporting the idea that this particular dump is, in fact, a collection of individual dumps from separate sources.

We also found that 42% of usernames end in yahoo.com, while 7% end in aol.com, leading us to the conclusion that this is a consumer-heavy dataset, rather than, say, corporate email accounts. The domains with more than 1% representation in the user list are listed below:

Email Domain    Percent of Database
yahoo.com       41.71%
aol.com         7.31%
web.de          2.39%
live.com        2.02%
gmx.de          1.91%
msn.com         1.82%
yahoo.de        1.49%
yahoo.fr        1.42%
yahoo.co.uk     1.32%
aim.com         1.15%
comcast.net     1.12%
lycos.de        1.12%
epost.de        1.11%
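
For readers who want to run the same kind of breakdown on their own data, here is a minimal sketch of how a domain tally like the one above could be computed. It is not the tooling we used; the function name `domain_shares` and the sample usernames are made up for illustration, and it assumes usernames arrive as plain strings.

```python
from collections import Counter

def domain_shares(usernames):
    """Return {domain: percent of all usernames}, most common domain first.

    Hypothetical helper: entries without an '@' still count toward the
    total, but are not attributed to any domain.
    """
    domains = Counter()
    total = 0
    for name in usernames:
        total += 1
        # rpartition splits on the LAST '@', so odd local parts don't confuse it
        _local, at, domain = name.rpartition("@")
        if at and domain:
            domains[domain.strip().lower()] += 1
    if total == 0:
        return {}
    return {d: 100.0 * n / total for d, n in domains.most_common()}

# Made-up sample data, not taken from the dump
sample = ["a@yahoo.com", "b@Yahoo.com", "c@aol.com", "not-an-email"]
shares = domain_shares(sample)
for domain, pct in shares.items():
    print(f"{domain:<15}{pct:.2f}%")
```

Lowercasing the domain before counting matters here, since the same provider can show up with different capitalization across the individual dumps.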


Overall, 51% of the user accounts are some sort of yahoo.* or ymail.* account. Some corporate email accounts are certainly included, but they are rare. By filtering for domains of the Fortune 1000 companies and manually removing domains that are used for consumer email (like yahoo.com and facebook.com), we found that only about 1 million (roughly 0.2% of the 562 million) of the accounts in the dump were from domains of large companies, which reinforces our assessment that this is almost entirely consumer accounts.

What Do Their Passwords Look Like?

One measure of password strength is the length of a password. This is a very flawed metric for asserting strength, but you can assert weakness with it: a four-character password is easy to brute force, no matter how many special characters you use.

Distribution of Password Length

The set of passwords in this dump follows a nice exponential long-tail distribution in terms of length, peaking at 9 characters (27% of passwords) and falling under 1% after 14 characters. The large spike just past 100 occurs, not coincidentally, at 128 characters, which is the length of a SHA-512 hash in hex.

Upon further inspection, that’s exactly what all of those are: just a bunch of hashes, like fab689475682c7a88be219de0a76f0d6096e487fa0bcdd752048d3aaa76dd9ef47344b89817434a284d8cb5b0111a2ada7aafcb635570c32149e43b58a990c9d.

Since this appears to be a collection of individual password dumps, it’s likely that the breach in question resulted in the theft of hashes instead of cleartext passwords. When this happens, attackers will try to crack as many passwords as possible, leaving the hashes in place for those they couldn’t quickly crack.
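
As a rough illustration of how the length distribution and the leftover hashes could be picked out of a dump, here is a small sketch. The helper names are hypothetical, and the check is only a heuristic: any 128-character hex string matches, so it cannot prove that an entry is a SHA-512 digest rather than the output of some other 512-bit hash.

```python
import hashlib
import re
from collections import Counter

# 128 hex characters is what a 64-byte (512-bit) digest looks like in hex
HEX_128 = re.compile(r"[0-9a-fA-F]{128}")

def looks_like_sha512_hex(password):
    """Heuristic only: True if the entry is exactly 128 hex characters."""
    return HEX_128.fullmatch(password) is not None

def length_distribution(passwords):
    """Return {length: percent of passwords}, ordered by length."""
    counts = Counter(len(p) for p in passwords)
    total = sum(counts.values())
    return {length: 100.0 * n / total for length, n in sorted(counts.items())}

# Made-up sample: three ordinary passwords plus one uncracked hash
uncracked = hashlib.sha512(b"example").hexdigest()
sample = ["abc", "abcd", "abcd", uncracked]

distribution = length_distribution(sample)
hash_entries = [p for p in sample if looks_like_sha512_hex(p)]
print(distribution)
print(len(hash_entries), "entries look like uncracked hashes")
```

Filtering these entries out before computing other statistics would keep the hashes from skewing results about what users actually chose.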

The pitfall of looking only at password length is obvious when considering the password “refrigerator.” “After all, it’s a 12-character password! That sounds secure!” Except that using only lowercase letters dramatically reduces the search space compared to a mix of lowercase, uppercase, numbers and symbols. In this case, it’s an especially bad password, since it’s a single common dictionary word and would likely be on a list of common words that an attacker would try before guessing randomly.

One common password restriction is that it must include a number. Whether due to users adopting stronger security habits or merely due to password requirements, 70% of passwords had at least one number. Indeed, the mean number of digits per password is 2.3.
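
To put a number on the search-space point about “refrigerator,” here is a back-of-the-envelope sketch comparing a 12-character lowercase-only password with a 12-character password drawn from all printable ASCII letters, digits and symbols. The 94-character alphabet is our assumption for illustration (it excludes the space character), and the calculation only models blind guessing.

```python
import math
import string

LOWER = len(string.ascii_lowercase)  # 26 characters
FULL = len(string.ascii_lowercase + string.ascii_uppercase
           + string.digits + string.punctuation)  # 94 characters (assumed alphabet)

def search_space_bits(alphabet_size, length):
    """Bits needed to enumerate every string of this length over the alphabet."""
    return length * math.log2(alphabet_size)

lower_bits = search_space_bits(LOWER, 12)
full_bits = search_space_bits(FULL, 12)

print(f"lowercase only : {lower_bits:.1f} bits")
print(f"full alphabet  : {full_bits:.1f} bits")
# Each extra bit doubles the number of candidates an attacker must try
print(f"ratio          : 2^{full_bits - lower_bits:.1f} times more candidates")
```

Even the lowercase figure overstates the strength of “refrigerator,” since a dictionary attack would find it long before an exhaustive search got anywhere close.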

Uppercase and symbols were not nearly as prevalent, with only 6% and 4% of passwords containing at least one such character, respectively. This lends credence to the argument that it’s merely password requirements that prompt more secure password choices. A surprisingly low result was for the space character, which is allowed by many systems, but was only present in 0.03% of passwords examined.
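
Statistics like these are simple to reproduce. The sketch below shows one way to do it, under our own assumptions: “symbol” means ASCII punctuation as defined by Python's `string.punctuation`, and `str.isdigit` is used for digits, which also accepts non-ASCII digit characters. The function name and the sample passwords are made up.

```python
import string

def character_class_stats(passwords):
    """Percent of passwords containing each character class, plus mean digit count."""
    passwords = list(passwords)
    total = len(passwords)
    if total == 0:
        return {}

    def pct(predicate):
        # Share of passwords with at least one character satisfying the predicate
        return 100.0 * sum(any(predicate(c) for c in p) for p in passwords) / total

    digit_count = sum(sum(c.isdigit() for c in p) for p in passwords)
    return {
        "digit": pct(str.isdigit),
        "upper": pct(str.isupper),
        "symbol": pct(lambda c: c in string.punctuation),
        "space": pct(lambda c: c == " "),
        "mean_digits": digit_count / total,
    }

# Made-up sample data, not taken from the dump
stats = character_class_stats(["password1", "Hunter2!", "correct horse", "abc123"])
print(stats)
```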

This suggests that an attacker might be less likely to include the space character in their set of search characters, and users would be wise to keep in mind when choosing a password that spaces are often valid password characters. One easy way to incorporate spaces is by using passphrases: entire phrases that you use as a password, assuming you don’t get stopped by draconian maximum lengths.

The Top 10 Passwords

The top ten passwords contain some fan favorites and align closely with other password reports, such as password manager Keeper’s top 10:

Anti Public     Keeper
123456          123456
123456789       123456789
abc123          qwerty
password        12345678
password1       111111
12345678        1234567890
111111          1234567
1234567         password
12345           123123
1234567890      987654321
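
A ranking like this comes down to a frequency count. The sketch below shows one way to produce a top-N list and to measure how much two lists overlap; the helper names are hypothetical, and the two lists are copied from the table above.

```python
from collections import Counter

def top_passwords(passwords, n=10):
    """Return the n most frequent passwords, most common first."""
    return [pw for pw, _count in Counter(passwords).most_common(n)]

def overlap(list_a, list_b):
    """Return the passwords that appear in both lists, sorted."""
    return sorted(set(list_a) & set(list_b))

# The two top-10 lists from the table above
anti_public = ["123456", "123456789", "abc123", "password", "password1",
               "12345678", "111111", "1234567", "12345", "1234567890"]
keeper = ["123456", "123456789", "qwerty", "12345678", "111111",
          "1234567890", "1234567", "password", "123123", "987654321"]

shared = overlap(anti_public, keeper)
print(f"{len(shared)} of 10 passwords appear on both lists: {shared}")
```

By this count, seven of the ten entries are shared between the two lists.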


If one had to wager a guess, it looks like 6 characters is the most common minimum password length in modern consumer web applications. That really isn’t enough to reasonably protect your password from someone just trying all the possible passwords (i.e., “brute forcing”). NIST recently wrapped up a comment period on new security standards, which include, “Memorized secrets [i.e., passwords or PINs] [shall] be at least 8 characters in length if chosen by the [user].” These days, something more like 12 characters should be what you aim for as a minimum, since attackers can guess faster as computers get faster.
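
To see why the jump from 6 to 12 characters matters, here is a rough worst-case estimate of exhaustive search time. The guess rate of 10 billion guesses per second and the 94-character alphabet are assumptions chosen purely for illustration; real rates vary by orders of magnitude depending on how the passwords were hashed and what hardware the attacker has.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
ASSUMED_GUESSES_PER_SECOND = 1e10  # hypothetical attacker speed, not a measurement
ALPHABET_SIZE = 94                 # assumed: ASCII letters, digits and symbols

def brute_force_seconds(alphabet_size, length, guesses_per_second):
    """Worst-case time to try every password of exactly this length."""
    return alphabet_size ** length / guesses_per_second

for length in (6, 8, 12):
    seconds = brute_force_seconds(ALPHABET_SIZE, length, ASSUMED_GUESSES_PER_SECOND)
    print(f"{length:>2} characters: {seconds:.3g} seconds "
          f"({seconds / SECONDS_PER_YEAR:.3g} years)")
```

Under these assumptions, six characters fall in about a minute, eight in about a week, and twelve would take more than a million years, which is why each added character buys so much. This only applies to randomly chosen passwords; anything on a common-password list falls almost immediately regardless of length.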

Ok, So Have I Been Pwned?

Funny you should ask that, as that’s the name of an excellent website, Have I Been Pwned (HIBP), that collects breached account data so you can see when and how your usernames and passwords have been leaked (since, by now, almost everybody’s username or password has been leaked at some point). If you are interested in the source for this particular password dump, Troy Hunt, the creator of HIBP, has posted an analysis of the password dump on his blog.

We recommend that you sign up for the free monitoring option, where you get an email if/when your email address shows up in a newly discovered credential dump. If you are a domain administrator, you can also search for all pwned accounts on your domain.

Kyle Lady

Senior Research and Development Engineer

@kylelady

Kyle is a Senior R&D engineer at Duo, where he harasses everyone for more data to try to satisfy the unquenchable thirst for data that academic research imparted. He has only broken the Internet once.