Jan 17, 2017

The Weird World of Attribution

By Mark Loveless

It seems like everywhere you go online, you run into stories about hacking and how some nation state is behind it. A year ago, it was China. Now Russia's getting all of the headlines. And while we’d love to tell you it’s a load of bullshit, there’s a grain of truth behind it, which I’d like to take a stab at pointing out. (Full disclosure: I used to do attribution and research surrounding attribution at a previous employer.)

First off, you might wonder, why attribute? If we cover what’s tracked during the attribution process, things start to come into focus.

An Endless Supply of Indicators...

You can collect a lot of items while analyzing attacker data. There are the obvious ones — IP addresses, domains used, and so on — but that’s just the tip of the iceberg. Each of these is a clue, though often a vague one, but together they fit into a larger puzzle.

Let’s take a simple attack scenario: a targeted phishing email. A partial list of indicators includes the subject line, sender, recipient, some or all of the email body including the general topic (e.g., a fake FedEx tracking notification), whether there is an attachment, and the sender’s domain.

Pretty much everything in the headers alone can be used as an indicator: the date (including time and day), whether a compromised domain or a free email service like Hotmail was used, data within the Message-ID field, data within the User-Agent field, and on and on.
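As a rough illustration of how header fields become indicators, here is a minimal Python sketch using only the standard library. The sample message, the `header_indicators` function, and the field names in the returned dictionary are all invented for this example; a real pipeline would track far more fields than this.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical sample message; every value here is invented for illustration.
RAW_EMAIL = """\
From: "FedEx Tracking" <tracking@example.com>
To: victim@example.org
Subject: Your package could not be delivered
Date: Tue, 17 Jan 2017 09:14:03 +0300
Message-ID: <20170117091403.1A2B3C@mail.example.com>
User-Agent: ExampleMailer/1.0

Please see the attached tracking notice.
"""


def header_indicators(raw):
    """Pull a handful of header-derived indicators out of a raw email."""
    msg = message_from_string(raw)
    sent = parsedate_to_datetime(msg["Date"])
    sender_address = parseaddr(msg["From"])[1]
    message_id = msg["Message-ID"].strip("<>")
    return {
        "subject": msg["Subject"],
        "sender_domain": sender_address.rpartition("@")[2],
        "weekday": sent.strftime("%A"),
        "hour_local": sent.hour,  # hour in the sender's stated time zone
        "utc_offset_hours": sent.utcoffset().total_seconds() / 3600,
        "message_id_host": message_id.rpartition("@")[2],
        "user_agent": msg["User-Agent"],
    }


if __name__ == "__main__":
    for name, value in header_indicators(RAW_EMAIL).items():
        print(f"{name}: {value}")
```

None of these values proves anything on its own, and any of them can be forged. The point is only that each header yields something you can store and compare against later incidents.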

What was the attachment, an Office doc or a PDF? Does it exploit an older vulnerability, a recent one, or maybe a zero-day? Actually, zero-days are rather rare, because in most cases older vulns will work (unfortunately).

If there’s a URL in the email, is it a compromised domain or a domain created last week? Who registered it? What email service did they use? Is there a browser vulnerability at the pointy end of that URL? Which browser is impacted? Multi-stage payload?

Speaking of those payloads, security conferences have featured entire presentations on determining the compiler used to build an executable. This analysis can reveal not only the version of a particular compiler, but also the language (English, Chinese or whatever). Timestamps often include time zone markers, so if a sample was compiled during the daylight hours of the UTC+3 time zone (Moscow is in that time zone), a story starts to unfold.
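The time zone reasoning can be sketched in a few lines of Python. This assumes the compile timestamps have already been extracted as Unix epoch seconds in UTC; the `working_hours_share` function, the 09:00 to 18:00 weekday window, and the sample timestamps are all hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta, timezone


def working_hours_share(epoch_seconds, utc_offset_hours, start=9, end=18):
    """Fraction of timestamps that land on a weekday between start and end
    o'clock when viewed from the given UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    hits = 0
    for ts in epoch_seconds:
        local = datetime.fromtimestamp(ts, tz)
        if local.weekday() < 5 and start <= local.hour < end:
            hits += 1
    return hits / len(epoch_seconds)


# Hypothetical compile times (day of January 2017, hour in UTC).
SAMPLES = [
    int(datetime(2017, 1, day, hour, tzinfo=timezone.utc).timestamp())
    for day, hour in [(16, 7), (17, 9), (18, 12), (19, 14)]
]

if __name__ == "__main__":
    for offset in (-5, 0, 3, 8):
        share = working_hours_share(SAMPLES, offset)
        print(f"UTC{offset:+d}: {share:.0%} of samples built during office hours")
```

With these invented samples, every build falls inside office hours at UTC+3 and only one in four does at UTC-5. That is a clue rather than proof, since build timestamps can be altered.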

...But They Could Lie

An astute technologist will point out that an adversary could falsify many of these indicators, making it look like a different group of attackers caused an incident, but there are some caveats to that. It requires that an adversary know all indicators surrounding another attacker to be able to look like them, and most of this data is non-public. Yes, dozens of indicators are released in various threat intelligence feeds and in some of the more high-profile reports that security teams publish, but there are hundreds of indicators to track, not dozens.

Statistically, when you group fresh indicators to classify an incident, you won’t get a 100% match against all past indicators. Domains expire or outgrow their usefulness, popped systems used for command and control get wiped and reloaded, and data changes over time. But usually you’ll get a match of roughly 70-80%, which is a good indication that you’re dealing with a specific repeat customer.

Another nation state actor? The match of indicators dips to the 20-30% range, and a random attack, for example by a worm or script kiddie, will match 5-10% of indicators. (These numbers are meant more for illustration and may not be exact for your organization, but you get the idea.)
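To make the percentages concrete, here is a minimal sketch of the matching idea: treat each actor's history as a set of indicators and measure how much of a fresh incident overlaps with it. The scoring rule (share of fresh indicators already seen for that actor), the function names, the actor labels, and the sample data are all assumptions for illustration; real tooling would likely weight indicators rather than count them equally.

```python
def match_percent(fresh, known):
    """Percentage of the fresh incident's indicators already on file for an actor."""
    if not fresh:
        return 0.0
    return 100.0 * len(fresh & known) / len(fresh)


def rank_actors(fresh, profiles):
    """Return (actor, percent) pairs, best match first."""
    scores = [(actor, match_percent(fresh, known)) for actor, known in profiles.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: indicators are (type, value) pairs.
FRESH = {
    ("sender_domain", "example.com"),
    ("user_agent", "ExampleMailer/1.0"),
    ("attachment_type", "doc"),
    ("compile_utc_offset", "+3"),
    ("c2_domain", "update.example.net"),
}

PROFILES = {
    "Attacker ABC": {
        ("sender_domain", "example.com"),
        ("user_agent", "ExampleMailer/1.0"),
        ("attachment_type", "doc"),
        ("compile_utc_offset", "+3"),
        ("c2_domain", "old.example.net"),
    },
    "Attacker XYZ": {
        ("attachment_type", "doc"),
        ("compile_utc_offset", "+8"),
    },
}

if __name__ == "__main__":
    for actor, percent in rank_actors(FRESH, PROFILES):
        print(f"{actor}: {percent:.0f}% match")
```

Here the command-and-control domain has changed since the last incident, so Attacker ABC matches four of five indicators rather than all of them, which is the kind of drift described above.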

And here’s the important part: even if they’re lying convincingly to the front line analyst handling incident response, it doesn’t matter. At one of my previous employers, there was a saying (picked up from the military): “left of boom.” You put all the indicators for an actor on a timeline, and you can see there’s a “boom” point in the middle where the victim system is compromised. The place where you wanted to detect the adversary was always left of boom.

But if boom happens, you can better plan your incident response. With proper attribution you know what is likely to happen next and can act accordingly, thwarting the adversary’s efforts. Even if an adversary is pretending to be a completely different adversary, the prediction of what happens right of boom stays the same.
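The timeline idea is simple enough to sketch. This assumes each observed indicator carries a timestamp and that the moment of compromise is known; the `split_at_boom` function, the event descriptions, and the times are all made up for illustration.

```python
from datetime import datetime


def split_at_boom(events, boom):
    """Sort (timestamp, description) events and split them around the
    moment of compromise. Events at or after boom count as right of boom."""
    ordered = sorted(events)
    left = [event for event in ordered if event[0] < boom]
    right = [event for event in ordered if event[0] >= boom]
    return left, right


# Hypothetical incident timeline.
BOOM = datetime(2017, 1, 17, 9, 30)
EVENTS = [
    (datetime(2017, 1, 17, 10, 5), "backdoor beacons to command and control"),
    (datetime(2017, 1, 10, 14, 0), "lookalike domain registered"),
    (datetime(2017, 1, 17, 9, 14), "phishing email delivered"),
    (datetime(2017, 1, 17, 9, 30), "attachment opened, host compromised"),
]

if __name__ == "__main__":
    left, right = split_at_boom(EVENTS, BOOM)
    print("Left of boom (where you want to detect):")
    for when, what in left:
        print(f"  {when:%Y-%m-%d %H:%M}  {what}")
    print("Right of boom (what attribution helps you anticipate):")
    for when, what in right:
        print(f"  {when:%Y-%m-%d %H:%M}  {what}")
```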

To further complicate things, I’m speaking in broad strokes meant to evoke more than depict, as each attacker and attack is still to some degree unique, but hopefully you get the idea. A defender who’s tracking indicators on a large scale will see patterns emerge that can be acted upon — even in the middle of an active incident.

So What Does This Mean?

Knowing who your adversary is can be very helpful if your company is attacked, and certainly beneficial to planning defensive strategies. But in the big picture, knowing the nationality of your attacker means nothing.

Attribution, as the term is used in the incident response world, refers to the process described above. Dealing with the nation state bankrolling the attacker organization, however, is handled by politicians, and most front line people care very little about it. Knowing the attacker is Russian- or Chinese-sponsored means nothing when you’re trying to locate every instance of a crafted backdoor on a few dozen infiltrated machines, but knowing what the attacker might do next is vital.

One More Thing

Oh yes, there’s one other category of indicators: the classified kind. Sure, you have your pile of nice indicators that allow you to differentiate Attacker ABC from Attacker XYZ, but the U.S. government does a lot of spy stuff — real spy stuff — and that data is what truly puts a face on Attacker ABC.

When a security company does attribution and links an incident to Fancy Public APT Name, they aren’t releasing the hundreds of thousands of indicators from the closely guarded database they used to statistically prove that the incident lines up more than 80% with other indicators from said group; they release a few dozen. There are enough oddball indicators out there that could lead you to believe a particular nation state was involved, but often, via nudge nudge wink wink, these organizations find out from the U.S. government that yes, Fancy Public APT Name is backed by a specific foreign government.

Conclusions

Basic attribution is easy. Well, if you have that big fat searchable database of indicators you’ve been collecting, it’s easy. It can take minutes, and then you know what adversary you’re dealing with and can plan next steps. In fact, if it took more than a few minutes to determine who the adversary was, it might not be worth the effort. Entire companies are devoted to making products that apply AI, or at least make data analysis easier than sifting through indicators by hand, so you can tell what might happen next, and to getting that process down from minutes to seconds or even milliseconds.

The fact that a government waits weeks before announcing “it was China” or “it was Russia” is either spy-level fact checking or political maneuvering (or both!), and it’s not easy. Don’t confuse this with basic attribution. A lot of people within the infosec community dismiss entire reports simply because they’re looking at a few dozen easily faked indicators and a “Russia did it” label. You can debate the validity of those few dozen indicators, but realize there may be a few hundred that have been withheld for “proprietary reasons” by the security boutique that released what is, at least in part, an advertising and PR report.

So the next time you see these headlines in the press and in reports from security companies, keep some of this in mind. There’s a lot going on behind the scenes.