Facebook has open-sourced a static code analyzer for finding and fixing flaws in Python code.
The Python Static Analyzer, or Pysa, scans Python source code and analyzes how data flows through the application to identify security vulnerabilities, Facebook said. Many attacks rely on finding a way to get user input to reach parts of the codebase in unexpected ways, or to return a result that was not intended. Security and privacy flaws are frequently described as situations where data flows into a place it shouldn’t.
The internally-developed tool detected 44 percent of all security bugs in Instagram’s server-side Python code in the first half of 2020, Facebook said. Pysa detected 330 unique issues in proposed code changes, of which 15 percent (49) were categorized as “significant issues,” and 40 percent (131) were “real but had mitigating circumstances that made them less severe,” Facebook’s Graham Bleaney and Sinan Cepel said.
Pysa looks for connections between sources, or where important data originates, and sinks, or where data from the source should not be able to end up. If Pysa uncovers a path where a source eventually connects to a sink, the tool flags it as an issue. Common kinds of sources are places where user-controlled data enters the application. Sinks can include APIs that execute code, or access the file system.
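The source-and-sink idea can be illustrated with a toy reachability check. The sketch below is not Pysa’s algorithm or API; the function names and the flow graph are hypothetical, and it only shows what it means for a source to “eventually connect” to a sink.

```python
from collections import deque

# Hypothetical data-flow graph: each key passes its data on to the listed functions.
DATA_FLOWS = {
    "read_request_param": ["build_greeting", "build_query"],
    "build_greeting": ["render_template"],
    "build_query": ["run_sql"],
    "load_config": ["render_template"],
}

SOURCES = {"read_request_param"}   # where user-controlled data enters
SINKS = {"run_sql", "write_file"}  # where that data should never arrive


def find_flows(flows, sources, sinks):
    """Return every source-to-sink path found by breadth-first search."""
    found = []
    for source in sorted(sources):
        queue = deque([[source]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                found.append(path)
                continue
            for nxt in flows.get(node, []):
                if nxt not in path:  # skip nodes already on this path (cycles)
                    queue.append(path + [nxt])
    return found


if __name__ == "__main__":
    for path in find_flows(DATA_FLOWS, SOURCES, SINKS):
        print(" -> ".join(path))
```

In this made-up graph, only the path from the request parameter through the query builder to the SQL call would be reported; the path that ends in template rendering is ignored because rendering is not listed as a sink.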
Focus on Data Flows
Pysa can check that internal frameworks designed to prevent access to or exposure of user data are implemented properly. Pysa can also detect cross-site scripting and SQL injection flaws. For example, code used to load a user’s profile photo would not be an issue because the data it receives is restricted. However, if there were a way to go from user-controlled input to a SQL query, then Pysa would flag the issue.
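As a rough illustration of the kind of flow described here, the sketch below uses Python’s standard sqlite3 module with a made-up users table. It is not taken from Facebook’s code; it simply contrasts a query built from user-controlled text with one that passes the same value as a bound parameter.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a small, made-up users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def lookup_unsafe(conn, name):
    # User-controlled `name` is pasted into the SQL text: a source-to-sink flow.
    query = f"SELECT email FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def lookup_safe(conn, name):
    # The value is passed as a bound parameter, so it is never parsed as SQL.
    return conn.execute(
        "SELECT email FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "nobody' OR '1'='1"
    print(len(lookup_unsafe(conn, payload)))  # every row matches the injected condition
    print(len(lookup_safe(conn, payload)))    # no user has this literal name
```

The first function is the pattern a data-flow analyzer is meant to catch: input that arrives from outside ends up as part of the SQL statement itself.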
Pysa’s focus on data flows also limits the kinds of security issues it can find. Not all security or privacy issues are related to data flows. Pysa won’t be able to ensure that an authorization check was performed before launching a privileged operation, for example. The fact that Python is a dynamic language also makes static analysis difficult: code can be dynamically imported at virtually any point, and the analyzer cannot know ahead of time what that code will do.
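A small example of that dynamism, using the standard importlib module: the module and function that get called are chosen by strings that exist only at runtime. The helper name is hypothetical, and this is a general illustration of why such code is hard to analyze statically, not a statement about exactly which cases Pysa handles.

```python
import importlib


def load_handler(module_name, function_name):
    """Import a module and look up a function using names known only at runtime."""
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


if __name__ == "__main__":
    # In a real application these strings might come from a config file or a request.
    handler = load_handler("json", "dumps")
    print(handler({"user": "alice"}))
```

A tool reading only the source sees a call to `load_handler` with two strings; which code actually runs, and where its data goes, depends on values it may not have.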
“The dynamism of Python means there are endless pathological examples of flows of data that Pysa cannot detect,” Facebook said.
A static code analyzer capable of scanning something as large as Instagram’s codebase has to be fast. If it takes too long to scan, the tool will be less likely to be used, because waiting for the analysis may mean missing code release windows or delaying when code is shipped. Pysa is capable of going over millions of lines of code in anywhere from 30 minutes to several hours, Facebook said. A manual code review could take weeks or months.
That design decision meant having to “trade off performance for precision and accuracy,” Facebook said. On top of being fast, Pysa has to find security vulnerabilities, so it is designed to “avoid false negatives and catch as many issues as possible.” False negatives are cases where a tool fails to detect a real security issue. That meant accepting a high rate of false positives, where the tool reports a security issue that isn’t really there. Facebook said nearly half of the results Pysa initially returned in scanning Instagram code were false positives.
To reduce the number of false positives to verify, Facebook introduced sanitizers and other features in Pysa to filter the results after analysis is done.
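The idea of a sanitizer can be sketched as a filter over reported flows. The example below is a simplification with hypothetical function names; Pysa declares sanitizers in its own configuration, which is not shown here.

```python
# Hypothetical function names; a real tool would take these from its configuration.
SANITIZERS = {"escape_html", "to_int"}

REPORTED_FLOWS = [
    ["read_request_param", "build_query", "run_sql"],
    ["read_request_param", "to_int", "build_query", "run_sql"],
    ["read_comment", "escape_html", "render_page"],
]


def drop_sanitized(flows, sanitizers):
    """Keep only the flows that never pass through a sanitizer."""
    return [flow for flow in flows if not sanitizers.intersection(flow)]


if __name__ == "__main__":
    for flow in drop_sanitized(REPORTED_FLOWS, SANITIZERS):
        print(" -> ".join(flow))
```

Flows that pass through a function known to neutralize the data are dropped, leaving fewer results for a security engineer to verify by hand.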
“Even with Pysa’s bias to avoid false negatives and our willingness to accept a good number of false positives, we still managed to cap false positives at 150 (45 percent) of the reported issues,” Facebook said.
Pysa is extendable, as it can be used with different Python frameworks and libraries. Facebook uses Python frameworks Django and Tornado, but Pysa can support other frameworks with “a few lines of configuration” to tell the tool where data enters the server.
"Because we use open source Python server frameworks such as Django and Tornado for our own products, Pysa can start finding security issues in projects using these frameworks from the first run," Facebook said.
Facebook earlier built Zoncolan, a static analysis tool capable of finding “thousands of potential security issues” in more than 100 million lines of code. Pysa uses the same algorithms to perform static analysis, and shares code with Zoncolan.
“Overall, we are happy with the trade-offs we’ve made with Pysa to help security engineers scale, but there is always room to improve. We built Pysa for continuous improvement, thanks to a close collaboration between security engineers and software engineers,” Facebook said.