NIST Looking at AI to Calculate Bug Severity

It’s hard work, analyzing a software vulnerability and determining its severity. The government agency responsible for the scoring may soon be getting some help from IBM’s artificial intelligence technology.

According to a Nextgov report, the National Institute of Standards and Technology is using IBM Watson as part of a pilot program to analyze software vulnerabilities and assign a Common Vulnerability Scoring System (CVSS) rating. The goal is to have Watson assign scores to most publicly reported computer vulnerabilities by October 2019, Matthew Scholl, chief of NIST’s computer security division, told Nextgov.

“We started it just to get familiar with AI, so we could get our hands on it, learn about it, kind of put it in a lab and experiment,” Scholl told Nextgov. “As we were doing it with this dataset we said: ‘Hey, this seems to be putting out results the same as our analysts are putting out.’”

Enterprises commonly rely on CVSS, a numeric rating system that indicates a flaw’s severity, to determine which vulnerabilities need to be patched immediately and which ones can be considered lower priority. Scores range from 0.0 to 10.0 and take into account factors such as how complex an attack exploiting the vulnerability would have to be, whether the attack requires user interaction, and the likely impact on the system and its data. Vulnerabilities that can be exploited remotely typically get a higher rating than ones that require physical access to the device, and an information disclosure flaw may not be rated as highly as a privilege escalation bug.
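The article doesn’t say which CVSS version NIST’s analysts use, but the arithmetic behind a rating is public. As a minimal Python sketch, here is the CVSS v3.1 base-score formula for the scope-unchanged case, using the metric weights from the published FIRST specification:

```python
import math

# CVSS v3.1 metric weights (scope unchanged), per the published FIRST spec.
ATTACK_VECTOR = {"network": 0.85, "adjacent": 0.62, "local": 0.55, "physical": 0.20}
ATTACK_COMPLEXITY = {"low": 0.77, "high": 0.44}
PRIVILEGES_REQUIRED = {"none": 0.85, "low": 0.62, "high": 0.27}
USER_INTERACTION = {"none": 0.85, "required": 0.62}
IMPACT = {"high": 0.56, "low": 0.22, "none": 0.0}

def base_score(av, ac, pr, ui, conf, integ, avail):
    """Compute a CVSS v3.1 base score (scope-unchanged case only)."""
    iss = 1 - (1 - IMPACT[conf]) * (1 - IMPACT[integ]) * (1 - IMPACT[avail])
    impact = 6.42 * iss
    exploitability = 8.22 * (ATTACK_VECTOR[av] * ATTACK_COMPLEXITY[ac]
                             * PRIVILEGES_REQUIRED[pr] * USER_INTERACTION[ui])
    if impact <= 0:
        return 0.0
    # The spec rounds *up* to one decimal place; math.ceil approximates
    # the spec's roundup() closely enough for a sketch.
    return math.ceil(min(impact + exploitability, 10.0) * 10) / 10

# A remote, low-complexity, no-auth flaw with full impact scores 9.8;
# the same flaw requiring physical access drops to 6.8.
print(base_score("network", "low", "none", "none", "high", "high", "high"))   # 9.8
print(base_score("physical", "low", "none", "none", "high", "high", "high"))  # 6.8
```

Running the sketch shows how the attack-vector weight alone moves the same full-impact flaw from 9.8 down to 6.8 when physical access is required, which is exactly the remote-versus-local distinction described above.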

These scores are currently determined by NIST analysts, who typically spend 5 to 10 minutes scoring common vulnerabilities and longer on complex or novel ones, Scholl told Nextgov. As part of NIST’s pilot, Watson pored over hundreds of thousands of historical CVSS scores and then applied what it learned to assign scores to new vulnerabilities. Where a new vulnerability resembled common, already-rated flaws (ones with a “long paper trail”), Watson’s score fell within the range human analysts would have assigned, Nextgov reported. Basically, if two human analysts would have assigned a CVSS rating of 7.2 or 7.3 to a well-understood type of vulnerability, Watson’s score fell in that same narrow range.
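Nextgov’s report doesn’t describe how Watson arrives at its numbers, so any implementation detail here is an assumption. As a rough illustration of the general technique of learning from historical scores, a supervised model might pair vulnerability descriptions with analyst-assigned ratings, as in this hypothetical scikit-learn sketch (the training data below is invented):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical training data: past vulnerability descriptions and the
# CVSS scores human analysts assigned to them.
descriptions = [
    "remote code execution via crafted HTTP request, no authentication",
    "local privilege escalation requiring physical access to device",
    "information disclosure of session tokens to authenticated users",
]
scores = [9.8, 6.8, 4.3]

# Turn free-text disclosures into features, then fit a regressor.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge())
model.fit(descriptions, scores)

# Score a newly reported flaw that resembles the historical corpus.
new_vuln = ["unauthenticated remote code execution in web interface"]
print(round(model.predict(new_vuln)[0], 1))
```

A model like this can only interpolate from its training corpus, which is consistent with the finding that Watson did well on flaws with a long paper trail and, as discussed below, poorly on novel ones.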

Because Watson’s scores align so closely with analysts’ scores, putting Watson in the mix may shorten the time it currently takes for vulnerabilities to receive a CVSS score. Over the years, researchers have reported waiting weeks, even months, for the vulnerabilities they found to be added to NIST’s National Vulnerability Database (NVD) with a Common Vulnerabilities and Exposures (CVE) identifier. Relying on humans to analyze each published vulnerability and calculate the CVSS score was workable when only a few hundred vulnerabilities were being submitted each week. Nowadays, companies and researchers are reporting several thousand vulnerabilities each week, and with the boom in the Internet of Things, that number is expected to keep growing.

Delays in publishing CVEs and assigning CVSS scores give attackers a window of opportunity to craft their attacks. Threat intelligence provider Recorded Future found that a majority of vulnerabilities published to the NVD had already been discussed in criminal forums, mailing lists, and other venues where attackers exchange information about vulnerabilities and exploits.


“Adversaries aren’t waiting for NVD release and preliminary CVSS scores to plan their attacks. The race typically starts with the first security publication of a vulnerability,” Recorded Future’s chief data scientist Bill Ladd wrote at the time.

There is another benefit to adding artificial intelligence to the process: fewer mistakes. “A human may misread or miss that it is a ‘remote’ exploit and score it ‘local,’ or misclick in an interface that does the scoring. A machine, in theory, would not do that,” said Brian Martin, vice president of vulnerability intelligence at Risk Based Security. The AI is less likely than humans to make mistakes if the scoring methodology has been implemented correctly, but that isn’t an easy assumption to make. Making sure the AI is learning correctly is a challenge in its own right.

“Disclosures are radically different in format and detail, and some disclosures contain inaccurate information,” Martin said. Human analysts can identify those issues and adapt the scoring accordingly, but that kind of on-the-fly adjustment is harder for machines.

In the pilot program, Watson had difficulty scoring complex or highly novel vulnerabilities. That makes sense: Watson learns from existing material, so it has fewer references to consult for newer or uncommon flaws. In the NIST pilot, Watson provided a confidence level alongside each CVSS score, and Scholl told Nextgov that when that confidence dipped below the high-90-percent range, a human analyst would review and edit the score.
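The report gives only the rough shape of that hand-off, so the following is a sketch, not NIST’s actual workflow: the 0.90 cutoff is an assumption standing in for the “high-90-percent” figure, and the function name is hypothetical.

```python
# Sketch of the triage rule Scholl describes: machine-assigned scores are
# accepted only above a confidence threshold, otherwise they are queued
# for human review. The 0.90 cutoff is an assumed stand-in for the
# "high-90-percent" figure from the Nextgov report.
CONFIDENCE_THRESHOLD = 0.90

def triage(vuln_id: str, predicted_score: float, confidence: float) -> str:
    if confidence >= CONFIDENCE_THRESHOLD:
        return f"{vuln_id}: auto-assigned CVSS {predicted_score} (confidence {confidence:.0%})"
    return f"{vuln_id}: CVSS {predicted_score} flagged for analyst review (confidence {confidence:.0%})"

print(triage("CVE-2018-0001", 7.2, 0.97))  # accepted automatically
print(triage("CVE-2018-0002", 8.1, 0.62))  # routed to a human analyst
```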

CVSS isn’t always a good indicator for determining risk—Heartbleed had an initial CVSS rating of 5, despite the fact that security experts considered the flaw in OpenSSL one of the worst vulnerabilities they’d ever seen—and using AI won’t automatically change that. Other factors "need to be considered in order to establish a true technical risk score," NopSec said after analyzing 65,000 vulnerabilities in NVD over a 20-year period.

IBM has been touting AI as an assistant for humans unable to keep up with the volume of data. Critics who believe that CVSS scores are inaccurately calculated may not see an improvement with AI-assigned scores, but freeing up analysts to focus on unusual—and most likely, more serious—vulnerabilities can help close the timing gap between when vulnerabilities are found and when enterprises get information about them.