
Examining Security Science at Black Hat 2017


Daniel Kahneman, in his 2011 book “Thinking, Fast and Slow,” describes two modes of thought: “System 1,” the fast, emotional, instinctual system, and “System 2,” the slow, rational, deliberative system. System 1 is more efficient at quick decisions, but it doesn’t incorporate learned concepts like information security.

In her Black Hat 2017 talk “Ichthyology, Phishing as a Science,” Stripe Security Engineer Carla Burnett (@tetrakazi) uses this model of human cognition to plan, develop, and execute phishing attacks against other users within Stripe. Of course, she’s not out to actually collect credentials, but to train internal users to recognize the clever kinds of phishing that might hit their inboxes. System 2, the logical, observant system, won’t have time to kick in unless a user is already suspicious of the email in question.

To keep the discussion clear, which is essential for scientific work, Burnett establishes a taxonomy of phishing based on the action she wants the user to take: perform an external action, install an exploit, or hand over credentials (the archetype of phishing). By splitting thought processes into the two systems, she’s able to explain surprising results.

For example, Github sends plaintext emails. Burnett sent a phish with the exact same (lack of) styling and saw only a 10% conversion rate. (We borrow the term “conversion” from marketing; here it means the user did whatever the phisher wanted, whether that’s handing over credentials, approving an OAuth application, or something else.) By using HTML emails with design elements from Github, she was able to boost her conversion rate to 50%, even though there aren’t actually any legitimate emails that look like that.
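As a rough illustration of how conversion could be tallied per email variant, here is a minimal Python sketch. The function name, the variant labels, and the recipient counts (20 per variant) are all invented for illustration; the talk's actual sample sizes aren't given here, and only the 10% and 50% rates come from the text above.

```python
from collections import Counter


def conversion_rates(results):
    """Return {variant: fraction converted} from (variant, converted) pairs."""
    sent = Counter()
    converted = Counter()
    for variant, did_convert in results:
        sent[variant] += 1
        if did_convert:
            converted[variant] += 1
    return {variant: converted[variant] / sent[variant] for variant in sent}


# Hypothetical campaign data: 20 recipients per variant.
# These counts are made up to reproduce the reported rates, nothing more.
results = (
    [("plaintext", True)] * 2 + [("plaintext", False)] * 18
    + [("styled_html", True)] * 10 + [("styled_html", False)] * 10
)

rates = conversion_rates(results)
for variant, rate in rates.items():
    print(f"{variant}: {rate:.0%}")
```

Counting any desired action as a conversion, regardless of its type, keeps the metric comparable across the different kinds of phish in the taxonomy.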

The key was tricking System 1 into feeling good about the email and clicking it before System 2 kicked in. Since current SaaS email conventions include sending substantially styled emails, a plaintext email probably confuses System 1 enough that System 2 has time to rationally evaluate the email.

Real Humans, Simulated Attacks

This psychological dynamic also came up implicitly in another Black Hat talk, “Real Humans, Simulated Attacks,” by Dr. Lorrie Cranor (@lorrietweet), Professor of Computer Science at Carnegie Mellon. Dr. Cranor discussed the difficulty of conducting scientifically valid security usability studies. The biggest obstacle in experiment design is that you often can’t have a real adversary to challenge users.

One intuitive type of study has users perform explicit security tasks: for example, telling them that they are going to look at TLS warnings and determine whether it’s safe to continue. The structural problem is that the researcher has then primed users to ignore what System 1 tells them to do when they do get the TLS warning, and they go straight to a System 2 evaluation of the situation.

In an ideal world, you could observe users interacting with security systems in the wild, without their knowing that their security behavior is being monitored. Cranor described a project her group is working on: measuring the rollout of Duo’s software across their campus system. Using our Admin API, they can pull data about the number of enrolled users and which factors and integrations they use, all without notifying users that they’re being studied. Done anonymously, this can also minimize privacy concerns. This type of project is something that Duo Labs is also currently working on, except we’re looking across all customers and industries, not just one educational institution.
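To make the idea of anonymous, aggregate measurement concrete, here is a minimal Python sketch. It assumes user records have already been retrieved and are available as dictionaries; the record shape (`username`, `is_enrolled`, `factors`) and the sample values are hypothetical stand-ins, not the Admin API's actual response format, and this is not a description of how Cranor's group does it.

```python
from collections import Counter


def summarize_enrollment(users):
    """Aggregate enrollment and factor counts, discarding identifiers."""
    enrolled = 0
    factors = Counter()
    for user in users:
        if user.get("is_enrolled"):
            enrolled += 1
        # Count each factor type at most once per user.
        for factor in set(user.get("factors", [])):
            factors[factor] += 1
    # Only totals are returned; usernames never leave this function.
    return {"total": len(users), "enrolled": enrolled, "factors": dict(factors)}


# Hypothetical records, invented for illustration only.
sample_users = [
    {"username": "user-a", "is_enrolled": True, "factors": ["push", "sms"]},
    {"username": "user-b", "is_enrolled": True, "factors": ["push"]},
    {"username": "user-c", "is_enrolled": False, "factors": []},
]

summary = summarize_enrollment(sample_users)
print(summary)
```

Reporting only counts is one simple way to study a rollout without tying behavior back to individuals, though very small groups can still be identifiable from aggregates.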

The middle ground that Cranor’s group pursues is clever: mildly deceive users so that they think the study is for some other purpose. This raises ethical issues; researchers generally have to weigh the benefits and necessity of the lie, and then debrief users afterward about what they were lied to about and why it was necessary. In many research institutions, studies involving human subjects have to be approved by an independent review board to ensure ethical behavior.

An example of this is to prompt the user to buy something on Amazon and then send them a phishing confirmation email. The overall situation is artificial, but the user is primed to think about their retail experience, so the phish hits their System 1 perception. In a scenario that isn’t framed around security, users swat away warnings more readily than when they’re told that their security perceptions are being studied.

Carla Burnett from Stripe found this out in practice when she sent out the Github phish and noted that 50% of users copied and pasted their passwords into the phishing site. That means that 50% of users got their password from a password manager, which is certainly encouraging, but password managers don’t offer a saved password if the domain doesn’t match the original domain, and that mismatch is a key sign of phishing.

These users clicked on their password manager, weren’t alarmed by the lack of a domain match, and went searching for their Github password by hand. This is evidence of a key education failure regarding password managers (or a sign that password managers don’t always provide the right password, so users have been trained not to trust them). Despite the presence of a warning sign (no autocomplete/suggested password for that domain), these users ignored the implications and opened themselves up to exploitation.
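The domain check behind that warning sign can be sketched in a few lines of Python. This is a simplified illustration, not how any particular password manager works: it compares exact hostnames, whereas real products typically apply more nuanced matching rules. The function name and the look-alike URL are hypothetical.

```python
from urllib.parse import urlsplit


def hosts_match(saved_url, page_url):
    """Return True only if both URLs have the same hostname."""
    saved_host = urlsplit(saved_url).hostname
    page_host = urlsplit(page_url).hostname
    return saved_host is not None and saved_host == page_host


# URL stored alongside the saved credential.
saved = "https://github.com/login"

# Legitimate page on the same host: a password would be offered.
real_page = hosts_match(saved, "https://github.com/session")

# Hypothetical look-alike host: nothing is offered, which is the warning sign.
phish_page = hosts_match(saved, "https://github.com.example.net/login")

print(real_page, phish_page)
```

When the check fails, the manager simply stays silent, so the signal only helps users who have learned to treat a missing suggestion as suspicious.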

Lessons for the Security Community

Conducting sound usability studies is a real struggle, and one that we too often treat casually. Asking users what they think about a security system will always get the result of System 2 analytical thinking, not the gut-feeling, what-do-you-do-when-you’re-annoyed-by-security actions.

As a security community, we need to be cognizant of the biases toward thoughtfulness that slip into hypothetical security studies.

There’s tremendous value in “hallway testing,” and people developing security products should definitely have informal and formal hypothetical studies as tools in their metaphorical toolbox. These studies are relatively low cost, particularly hallway testing, which might take only a few minutes.

However, the nature of these methods makes it difficult to assess how System 1 would treat a situation. The holy grail, of course, is observation of real-world behavior of the actual users we’re trying to protect, but that’s not always practical (or ethical!). If it isn’t, we should confirm findings from hypothetical scenarios with studies that don’t prepare users to be particularly security-conscious to make sure we observe users’ System 1 reactions.