Making Incident Response Tangible
Every security incident is unique, just like every medical emergency. Regardless of the differences, the goals are very similar: identify the problem, prevent further damage and fix what has been broken. What should differ from one event to another is the speed at which we respond, and that speed should be based on severity, not category.
There is great risk to an organization that throws all available resources at one problem just because that is what's on the burner at that particular time. What we end up sacrificing is proper coverage for other events. Not to mention, having ‘too many hands in the pot’ could lead to missing important steps due to a lack of organization and structure.
To reduce this risk, organizations need to put a greater emphasis on the triage phase of their incident response efforts. This is the key moment when security analysts take the first pieces of available information and use critical thinking skills, intuition and previous experience to judge the severity of the event based on the damage it has caused or is likely to cause, not solely on the category in which it belongs.
9-1-1, What is Your Emergency?
We have all undoubtedly heard this phrase in movies or on television. The calm voice of a 9-1-1 dispatcher who is ready to take whatever information the often panic-stricken person on the other end of the line is able to give them. Are they reporting a car accident? A shooting? A fire? A hangnail on their big toe? Every time the phone rings in an emergency call center, the nature of the call is different, but one thing is certain: someone needs help.
Once the dispatcher receives the information, it is relayed to local emergency medical services (EMS) first responders. Their job, just like that of an information security analyst, is to make an initial assessment of severity to determine the priority level of the call, which means they don’t always go lights and sirens!
There is a very strong parallel between the decisions that EMS workers and analysts make about the priority with which an incident should be handled. And like EMS, when a major breach or incident occurs, it's up to analysts to respond in a way that reduces and prevents further damage when every second counts! We are also first responders. While we may not hold people’s lives in our hands, we are responsible for ensuring that the livelihood of our fellow employees remains intact.
There are several common phases of incident response as it relates to information security. At Duo, we break our incident response process into the following phases: detection and reporting, triage and analysis, containment and mitigation, and follow-up.
Believe it or not, EMS follows a very similar structure when responding to calls, which also starts with detection and reporting. This is followed by EMS workers figuring out exactly what the problem is (triage and analysis) before they can give proper medical care (containment and mitigation). After all of that is complete, there is paperwork to be done (follow-up).
Regardless of whether we are talking about human lives or computer systems, incident response starts with two primary elements, detection and reporting, which are the lifeblood of the most crucial phase of incident response: triage.
Proper detection and reporting are crucial to an effective triage phase. These phases can occur in numerous ways, but ultimately boil down to relying on either tools or people.
Unfortunately, tools and people are not perfect. False positives can occur from a detection and reporting standpoint, just as easily as things can be overlooked. In an emergency situation, panic sets in, causing our judgment and perspective to change, which could alter the information necessary to triage properly.
For an analyst, an important part of triage is being able to identify the functional and informational impact of the event that has occurred. The table below provides a general standard to describe the high, medium and low ranking levels:
| Priority Level | Functional Impact | Informational Impact |
|---|---|---|
| High | All users are unable to perform critical functions | Data was exfiltrated and potentially made publicly available |
| Medium | A subset of users are unable to perform critical functions | Data was changed, deleted or otherwise compromised |
| Low | Users can still perform critical functions | Data was not affected |
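The ranking above can be sketched as a small lookup. This Python example is illustrative only: the key names are hypothetical labels for the table rows, and the rule that the overall priority is the more severe of the two impact rankings is an assumption, since the table does not say how the two columns should be combined.

```python
from enum import IntEnum


class Priority(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical labels for the "Functional Impact" column of the table.
FUNCTIONAL_IMPACT = {
    "all_users_blocked": Priority.HIGH,
    "some_users_blocked": Priority.MEDIUM,
    "no_users_blocked": Priority.LOW,
}

# Hypothetical labels for the "Informational Impact" column of the table.
INFORMATIONAL_IMPACT = {
    "data_exfiltrated": Priority.HIGH,
    "data_modified": Priority.MEDIUM,
    "data_unaffected": Priority.LOW,
}


def triage(functional: str, informational: str) -> Priority:
    """Return an initial priority for an event.

    Assumption: the overall priority is the more severe of the two
    impact rankings. This combining rule is not stated in the article.
    """
    return max(FUNCTIONAL_IMPACT[functional], INFORMATIONAL_IMPACT[informational])
```

Under that assumption, an event in which everyone can still work but data has left the building, `triage("no_users_blocked", "data_exfiltrated")`, is still ranked `Priority.HIGH`.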
The table below shows a side-by-side comparison of EMS and security-related incidents which have been triaged as high, medium and low. Subtle differences between each level show how the priority of an incident can change between incidents of the same category; in this case, a car accident and a phishing campaign.
Known Information Following the Detection and Reporting Phases
| Priority Level | EMS (Car Accident) | Security (Phishing Campaign) |
|---|---|---|
| High | Male, mid-20s, currently unconscious following a car accident | Employee notices hundreds of messages containing an attachment have been sent from their account on their behalf |
| Medium | Male, 26 years old, experiencing dizziness following a car accident | Employee clicked the link within a phishing message and entered their credentials into a fake website |
| Low | Male, 26 years old, involved in a car accident with a broken wrist | Potential phishing message reported without clicking links or opening attachments |
In all three of these examples, severity of the incident was taken into consideration, which helped to determine the priority level.
Triage is the phase that can make the difference between a good and bad outcome because it changes how and when we respond.
The examples in the table show that a high priority level resulted in EMS workers needing to arrive on scene as quickly as possible because the patient’s life was at stake. The analysts in the high priority example also needed to respond as quickly as possible because damage was already being done using the employee’s account.
As we can see from the table, the category of the incident did not determine how the events were responded to. Not every car accident or phishing campaign results in a worst-case, high priority scenario, and the triage phase helps us to see those differences.
Triage is an Initial Assessment
The main goal of the triage phase is to help set the tone for how and when the next phase (analysis) is executed, keeping in mind that the status of an event can change at any moment. Priority levels can increase or decrease during the incident response process depending on what new information is received throughout the investigation. This will most commonly happen during the analysis phase because it is the first opportunity to actually see what the problem is first-hand, rather than relying on tools or people to detect and report.
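One way to picture this is an incident record whose priority can be revised as the investigation proceeds. The sketch below is a hypothetical illustration, not a description of Duo's tooling: the `Incident` class, its fields and the `reprioritize` method are all invented for this example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Priority(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Incident:
    """Hypothetical incident record; not an actual Duo data model."""

    summary: str
    priority: Priority
    # Each entry is (priority, reason), so later review can see why it changed.
    history: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Record the priority assigned during triage as the first entry.
        self.history.append((self.priority, "initial triage"))

    def reprioritize(self, new_priority: Priority, reason: str) -> None:
        """Raise or lower the priority when new information arrives."""
        if new_priority != self.priority:
            self.priority = new_priority
            self.history.append((new_priority, reason))


# A reported phishing message starts as low priority...
incident = Incident("Potential phishing message reported", Priority.LOW)
# ...and is raised once analysis shows credentials were entered.
incident.reprioritize(
    Priority.MEDIUM,
    "analysis: employee entered credentials into a fake website",
)
```

Keeping the reason alongside each change means the triage decision is treated as a starting point that later phases can revise, rather than a fixed label.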
How Duo Does Incident Response
Duo is changing the way traditional incident response (IR) is conducted. Rather than relying on analysts with little experience to triage incoming incidents, we push that responsibility to those with more experience. This is much different from the way traditional IR programs are run.
Often, more experienced analysts are called to action when an incident is larger than someone with less experience can handle. We have flipped this because it helps us to better assign resources during the first few moments after an incident is identified, which is critical.
Another way that Duo is doing IR differently is by employing a Kanban board for tracking incidents. This helps analysts to quickly see where an incident is within the investigation process at any point in time, which provides structure and organization. We can also identify the priority level at a glance based on the color coding. The Kanban board helps us to make sure investigations continue to move forward while tracking all of the pertinent information.
All of this data is used for our monthly retrospective meetings where we talk about our IR process: what went wrong, what went well and where can we improve? We also have an additional column for ‘review.’ This is an opportunity for a more experienced analyst to review the work of their coworkers to ensure nothing was missed.
Our ‘finalize documentation’ column is where we put incidents that have helped to inspire a change to our IR process. The ticket will remain in this column until the changes have been made, at which point, it will be moved to ‘done.’
Pulling It All Together
I decided to compare EMS to security incident response because I wanted to relate incidents back to something that was easier to grasp - human life. Not only is it tangible, it is also something that we can all (hopefully) easily relate to.
We can all imagine ourselves in the shoes of the EMS worker who is going to respond to a car accident. Understanding that a person who is unconscious is in a more severe situation than someone with a broken wrist is relatively simple. We can picture the rate of speed at which we would respond to each of those scenarios. Incident response is not as easy to relate to because you can’t ‘touch’ a breach or compromise that is occurring. We have to rely on the data in front of us to decide whether or not we respond with lights and sirens, just like EMS workers.
Triage is critical because it has the ability to make or break your entire investigation. Organizations that want to improve their IR process should consider making their more experienced analysts responsible for the triage process, which includes assigning resources. Experience in this phase is important because it is more than automatically assigning a category to the incident.