Software companies and chip makers are working on firmware updates and software fixes to address Spectre, Meltdown, and other side-channel attacks against processors, but we can’t patch our way to better hardware security. Chip security depends on incremental tweaks made over the next four years.
In the latest round of chip processor Whack-a-Mole, Google Project Zero and Microsoft Security Response Center disclosed details of a new attack using speculative execution to expose data stored on processors through a side channel. Speculative execution refers to the way processors try to guess what actions programs would take, and to preemptively execute instructions while waiting for slower tasks to complete. Called “Speculative Store Bypass,” this issue (Variant 4) is a derivative of the side-channel methods Spectre (Variants 1 and 2) and Meltdown (Variant 3) and can let attackers extract secrets such as passwords from protected kernel or application memory. The researchers also disclosed another method, the Rogue System Register Read (CVE-2018-3640) or Variant 3a, which allows normal programs to access system parameters (such as hardware status flags) which should be restricted to the operating system kernel, drivers, and hypervisors.
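A toy model can make the Variant 4 idea concrete. The sketch below is an illustration only, not how any real processor is implemented: `ToyCPU` and all of its methods are hypothetical names invented for this example. It assumes a store whose address is slow to resolve, a later load that runs ahead and reads the stale value, and a dependent access that leaves a trace in the cache even after the wrong guess is thrown away.

```python
# Toy model only: real processors do this in hardware, and none of these
# names correspond to a real API.

class ToyCPU:
    def __init__(self):
        self.memory = {}           # address -> value (architectural state)
        self.cache = set()         # lines touched; survives discarded speculation
        self.pending_store = None  # (address, value) whose address is unresolved

    def store_slow(self, address, value):
        """Queue a store whose address takes a while to compute."""
        self.pending_store = (address, value)

    def load_speculative(self, address):
        """Guess that no pending store aliases this load and read memory early."""
        return self.memory.get(address, 0)

    def touch(self, line):
        """A dependent memory access; it pulls a line into the cache."""
        self.cache.add(line)

    def retire(self, address, speculative_value):
        """Apply the pending store, then check whether the early load was wrong."""
        if self.pending_store is not None:
            addr, value = self.pending_store
            self.memory[addr] = value
            self.pending_store = None
        correct = self.memory.get(address, 0)
        return correct, correct != speculative_value


def demo():
    cpu = ToyCPU()
    cpu.memory[0x10] = 42               # stale secret left at this address
    cpu.store_slow(0x10, 0)             # the program overwrites it, slowly
    guess = cpu.load_speculative(0x10)  # the load runs ahead and sees 42
    cpu.touch(guess)                    # dependent access marks the cache
    value, squashed = cpu.retire(0x10, guess)
    return value, squashed, cpu.cache
```

In this model the program only ever sees the correct value, 0, and the wrong guess is discarded, yet the cache still records that line 42 was touched. That leftover footprint is what a side channel reads.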
There have not been any public reports of malware exploiting Meltdown (CVE-2017-5754) or Spectre (CVE-2017-5753, CVE-2017-5715) in chips, and experts speculated the likelihood of malware targeting Variant 4 (CVE-2018-3639) was also very low. That kind of malware would require local access, and if the attacker has access, there are scores of other methods that would be far easier to execute. Simple social engineering techniques are at least as effective as complicated, hard-to-target side-channel attacks in getting users to run untrusted code. It doesn’t mean criminal and government attackers with local or physical access to equipment won’t ever take advantage of these issues, but there are other more likely attack vectors that should be addressed before worrying about these.
“These present themselves as rather exotic attack vectors,” said Tod Beardsley, research director at security company Rapid7. The conditions necessary to exploit these issues were “rather steep and complex,” he said.
Updates for Intel, AMD, and ARM chips are currently being tested and validated by operating system makers and OEM system manufacturers and will soon be released into production BIOS and software updates. However, Intel and AMD have said the updates will be turned off by default and users will have to decide whether to actually apply the patch. It also helps that Variant 4 has already been mitigated in most web browsers, the most likely vector for attack, by the previously released updates for Variant 1 (Spectre).
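Because the mitigations may ship disabled, administrators may want to check what a given machine reports. The following is a minimal sketch, assuming a Linux system whose kernel exposes the `/sys/devices/system/cpu/vulnerabilities` directory; older kernels and other operating systems will not have these files. The function name `read_mitigation_status` is a hypothetical helper, not part of any vendor tool.

```python
from pathlib import Path

# Directory where recent Linux kernels report per-issue mitigation status.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")

# File names for Variants 3, 1, 2, and 4 on kernels that report them.
DEFAULT_NAMES = ("meltdown", "spectre_v1", "spectre_v2", "spec_store_bypass")


def read_mitigation_status(names=DEFAULT_NAMES, base=VULN_DIR):
    """Return {issue name: status string}, or "not reported" if unavailable."""
    status = {}
    for name in names:
        entry = Path(base) / name
        try:
            status[name] = entry.read_text().strip()
        except OSError:
            # Missing file, unsupported kernel, or a non-Linux system.
            status[name] = "not reported"
    return status


if __name__ == "__main__":
    for issue, state in read_mitigation_status().items():
        print(f"{issue}: {state}")
```

The exact wording of each status line varies by kernel version and hardware, so the sketch prints it as-is rather than trying to interpret it.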
After Meltdown and Spectre, many researchers warned that increased scrutiny of side channels meant more attacks would be found, so these variants were expected. Intel even expanded its bug bounty program to specifically include side-channel attacks.
“Given the complexity and ubiquity of side-channel attacks enabled by speculative execution, I doubt these will be the last variants that will be announced,” Beardsley said.
These chip-level security flaws aren’t mistakes in the sense that the designers did something wrong or introduced an error. Rather, they reflect how using out-of-order execution to boost speed and performance affected existing security and memory protection mechanisms. To permanently fix these design-level issues, chip manufacturers have to go back to the drawing board and create designs that have better security protections and address the uncovered issues. The current firmware and software updates are just mitigations until the comprehensive hardware fix is available.
Starting over with a fresh design and adding improved security protections from the start sounds tempting, but it doesn’t make engineering sense. There are some projects (RISC-V comes to mind) attempting to change the architecture, but they are all in their infancy. Starting over would also be expensive, since changes can’t be tested without rebuilding, laying out, and manufacturing the chip. For a company like Intel, testing would cost several million dollars and take two to three months for each iteration.
“Flat out, that’s [a complete rewrite] not gonna happen,” said Joe FitzPatrick, a hardware security researcher and trainer.
Instead, we are looking at approximately a four-year cycle of new chip development.
Just as a software project matures over time with bug fixes, refactored code, and overall improvements, the modern chip is the result of 20-plus years of architectural work. There have been times when developers decided to abandon the codebase and start fresh, and while the resulting code may not have issues that plagued the previous version, that maturity—the hardening and refinements—is also gone. Similarly, if the new chips are based on a brand-new design, all the improvements and shared knowledge from the last two decades about boosting performance using caching and branch prediction are lost.
“It always results in a step backwards until you can build up the experience and maturity of that product,” FitzPatrick said.
For example, caching, a data-sensitive performance enhancement, and the timing side channel are intertwined. If there is a cache, there is a timing side channel. It’s possible to make it more difficult to exploit the timing side channel, but removing the side channel requires removing the cache. It’s possible to remove the enhancements and optimizations, but the performance hit and reduced capability would be too high a price to pay for most.
“If you turn off all your caches, which does remove that side-channel, you’re going back 20 years,” FitzPatrick said. “It’s more effective to find these tiny little workarounds, to basically fine tune this performance enhancement.”
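The link between caching and timing described above can be shown with a small simulation. This is a sketch under stated assumptions rather than a working exploit: the latencies `HIT_COST` and `MISS_COST` are made-up cycle counts, and `ToyCache`, `NoCache`, `victim`, and `probe` are hypothetical names used only for illustration.

```python
HIT_COST = 1     # assumed cost of a cached access, in made-up cycles
MISS_COST = 100  # assumed cost of an uncached access


class ToyCache:
    """Remembers which lines were touched; cached lines are fast."""

    def __init__(self):
        self.lines = set()

    def access(self, line):
        if line in self.lines:
            return HIT_COST
        self.lines.add(line)
        return MISS_COST


class NoCache:
    """Caching turned off: every access is equally slow."""

    def access(self, line):
        return MISS_COST


def victim(memory, secret):
    # The victim's access pattern depends on its secret value.
    memory.access(secret)


def probe(memory, candidates):
    # The attacker times one access per candidate line.
    return {line: memory.access(line) for line in candidates}


def demo(memory, secret=7):
    victim(memory, secret)
    timings = probe(memory, range(16))
    fastest = min(timings, key=timings.get)
    leaks = len(set(timings.values())) > 1  # any timing difference at all?
    return fastest, leaks
```

With `ToyCache`, the one fast access singles out the secret line. With `NoCache`, every timing is identical and nothing can be inferred, but every access also pays the full cost, which is the trade-off FitzPatrick describes.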
The software update reduced performance on some servers by 20 percent. While high, the loss is temporary, as chip vendors will likely tweak future updates to regain what was lost. Chip makers will also modify the designs with a hardware equivalent of the software patch as a “quick fix.” Chips built on this redesign will make up some of the lost performance because hardware can be faster than software.
Intel has promised new updates in the next few months, a year after the chip giant first heard about the flaw. ARM told The Register it would be providing detailed blueprints for Cortex-A72, Cortex-A73, and Cortex-A75 cores that are resistant to Variant 2 to system-on-chip designers by July.
These fixes will be incorporated into future chip designs currently in development, along with changes in the architecture to improve performance and additional enhancements to ensure the issue has been fixed. Over the next four years, there will be different iterations of the designs, each with better security fixes and performance, until the gap is closed.
“My guess is it'll take four years to totally regain the performance back,” FitzPatrick said.
Those new chips in 2021 will not be considered the “Meltdown Fix” because the fixes were already delivered, first via software update and second by chips with a quick fix.
Just a few years ago, asked to pick between a high-performing chip with security trade-offs and one with reduced performance and more security, no one would have willingly taken the performance hit. That was one of the surprising things about the response to Spectre and Meltdown, as developers accepted a software update which reduced processor performance by as much as 20 percent in order to address a hardware security issue.
Cloud providers were the key, as their server farms were among the most affected by the performance hit, but they also have to think about security. Even though they might not want to lose 20 percent of their capacity, they’re not willing to be exposed to an issue which could completely undermine the security of their entire operation.
“I actually thought it was going to take blood, or explosions, or fire in order to get people to take a performance hit for security,” FitzPatrick said.