Navigating the Internet Scan Map

This is Part 2 of a three-part series on Internet cartography. You can read Part 1 here and Part 3 here.

Google plans to stop trusting Symantec certificates. When Chrome 66 hits Beta users on March 15, the browser will not recognize Symantec as a trusted certificate authority. If websites have Symantec certificates issued before June 2016, they will lose users because Chrome will block access to those sites. Those websites need to switch to certificates issued by a different certificate authority.

“With our last certificate scan at [port] 443, is this gonna be a really, really bad, terrible thing?” asks Bob Rudis, chief security data scientist at security company Rapid7.

Scanning tools like Censys and Shodan gather information about what kind of devices are on the Internet by probing public IP addresses and collecting any and all details. Answers to what will likely happen with Symantec certificates are somewhere in that repository of information. Duo Labs has been working with Censys to look at the results of the certificate scans and found that as of this time, approximately six percent of the certificates that were issued by Symantec roots have not expired yet. That translates to about 1.2 million hostnames that may be potentially impacted.

Simply put, Chrome 66 landing in the Beta channel isn't going to be terrible for most of the Internet, but affected site owners aren't going to be very happy when a chunk of their users can no longer reach their websites. The sites will continue to work under other browsers, creating an inconsistent experience. If the certificate is for servers that contain scripts, images, and other resources and not the main homepage, then users will still be able to access the page, but have a broken experience. There are still quite a few site operators who aren't thinking about the certificates because the expiration date is so far down the road, but they need to be thinking about it now.
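As a rough illustration of the rule described above, here is a minimal Python sketch that flags a certificate as affected if it chains to a Symantec brand, was issued before June 2016, and has not yet expired. The function name, the brand list, and matching on the issuer organization string are all assumptions made for this example; a real check would compare against the actual root certificates rather than names.

```python
from datetime import date

# Assumption: brand names operated under the Symantec CA umbrella, matched
# against the issuer organization string. A real check would compare against
# the root certificates themselves instead of matching on names.
SYMANTEC_BRANDS = ("symantec", "geotrust", "thawte", "rapidssl", "verisign")

# Cutoff as described in the article: certificates issued before June 2016.
CUTOFF = date(2016, 6, 1)


def affected_by_chrome_66(issuer_org, issued_on, expires_on, today):
    """Return True if a certificate is still valid but would be distrusted."""
    issuer = issuer_org.lower()
    from_symantec = any(brand in issuer for brand in SYMANTEC_BRANDS)
    still_valid = issued_on <= today <= expires_on
    return from_symantec and still_valid and issued_on < CUTOFF


# Hypothetical certificate: issued in 2015, still valid on March 15, 2018.
print(affected_by_chrome_66("GeoTrust Inc.", date(2015, 5, 1),
                            date(2018, 5, 1), date(2018, 3, 15)))
```

Running a check like this across scan results is one way to turn a raw certificate scan into a list of sites that need attention.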

Whether or not a map is valuable depends entirely on how well someone can use it to navigate from one place to another. Same goes for Internet scans. The scanning tools pull together different types of information, such as the kind of device and how it is configured, but the resulting map—the scan data—is valuable only if people can use it to answer important questions.

"We're trying to build this little model of the Internet, and make it as rich and as accurate as possible, so that you can think of a question and immediately see what will respond," says J. Alex Halderman, chief scientist at Censys.

Take, for example, the question of how much of the web has moved from HTTP to the more secure HTTPS. Censys has the data to show adoption go up as new sites deploy HTTPS, or if there are certain types of websites that are lagging in making the move. The figures tend to be different than the ones reported by web browsers because the browsers are reporting the percentage of connections being made to HTTPS sites. When the vast majority of connections on the web go to sites like Google, Facebook, and Twitter, and those sites have supported HTTPS for a while, the browser numbers tend to be a little skewed.
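The gap between the two figures comes down to what is being counted. The short Python sketch below uses made-up sites and connection counts (all of them hypothetical) to show how counting hosts, as a scan does, and counting connections, as a browser does, can give very different HTTPS adoption numbers for the same set of sites.

```python
# Hypothetical sample: (site, supports_https, connections per 100 page loads).
# The numbers are invented purely to illustrate the two ways of counting.
sites = [
    ("big-search.example", True, 50),
    ("big-social.example", True, 30),
    ("small-shop.example", False, 8),
    ("small-blog.example", False, 7),
    ("small-forum.example", False, 5),
]


def share_of_hosts(sites):
    """Scan-style figure: every host counts once."""
    return sum(1 for _, https, _ in sites if https) / len(sites)


def share_of_connections(sites):
    """Browser-style figure: every host is weighted by its traffic."""
    total = sum(count for _, _, count in sites)
    return sum(count for _, https, count in sites if https) / total


print(f"HTTPS share of hosts:       {share_of_hosts(sites):.0%}")
print(f"HTTPS share of connections: {share_of_connections(sites):.0%}")
```

In this toy data, two popular HTTPS sites carry most of the traffic, so the connection-weighted figure is twice the host-based one.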

"Scanning allows us to see all of these websites, to see all of these hosts, and to understand what that long tail of less popular services look like," says Zakir Durumeric, CTO and co-founder of Censys.

Who is Reading the Map?

Enterprises are trying to understand what their own networks look like. One team may be using Amazon EC2 and one department may not have communicated with a central IT team for several months. Employees may be bringing in their own printer, camera, or other devices. By looking at the scan data to identify all of the devices on the network, enterprise defenders can understand what they are responsible for and how they are vulnerable.

Industrial control systems frequently pop up in this kind of data because the devices are connected directly to the Internet. Government groups, or even manufacturers themselves, can use the information collected by the search engines to find these devices and lock them down to prevent attack. There are plenty of groups who want to make sure critical infrastructure remains off the Internet, and can use Censys—or another Internet scanning tool—to find any of those devices and secure them correctly.

CERT (Computer Emergency Response Team) groups in different countries are typically responsible for protecting the critical infrastructure within their boundaries, distributing information about software and hardware vulnerabilities, and protecting large companies from attacks. It's hard enough for organizations to have a clear picture of what their networks look like; it's even harder for these external groups to understand the infrastructure. When Heartbleed came out, it took a lot of people by surprise and security teams scrambled to assess their exposure as well as to fix the problems.

"We partnered with CERT groups to say, 'What data do you need in order to help people?' Because we have this data from these scans, it's a question of how do you get this data to the right people and empower them to make these changes as quickly as possible," Durumeric says.

Services that offer historical context in the scan data help us understand how the Internet is evolving. More companies are hosting through cloud providers rather than leasing out their own servers, and the dearth of IPv4 addresses means many service providers in other countries have changed how they connect new devices to the Internet, says HD Moore, the security expert behind Project Sonar.

"You can actually map a lot of [historical] trends, both on the consumer level in terms of Internet and technology adoptions, as well as how hands-on corporations are these days operating their own infrastructure," Moore says.

Back in 2012, Moore used Sonar to measure the global exposure of UPnP-enabled network devices and found that over 80 million unique IP addresses responded to UPnP discovery requests from the Internet. Since UDP amplification attacks can lead to large distributed denial-of-service (DDoS) attacks, Shadowserver has been running Internet-wide scans on a handful of UDP services to identify servers that could potentially be abused. Shadowserver currently has the best source of information on how the use of UDP services, particularly UPnP, has evolved over the years, Moore says.

Sometimes the scan data is just a first step and requires additional fine-tuning to be truly useful. Project Sonar, now managed by Rapid7, has a long list of ports and protocols that it scans regularly. Last spring, during the height of the WannaCry ransomware infection, Rapid7 researchers modified the existing lightweight scanner for the SMB protocol to collect more detailed information about operating system versions and patch levels, what types of machines were affected, and other relevant pieces of information. Since then, researchers have been monitoring how many servers are still exposing SMB.

"We will tweak and tune and adjust and refine our scans or create new ones if we wanted to respond to a particular activity," says Bob Rudis, chief security data scientist at Rapid7.

Last December, researchers created a scanner to detect the ROBOT (Return of Bleichenbacher's Oracle Threat) vulnerability, a 19-year-old weakness in the RSA encryption standard that could let attackers force the web server to cough up the secret key needed to decrypt encrypted communications.

"ROBOT was breaking SSL like it was 1998 again," Rudis says. "[We] refined the scan so that we could have a much better understanding of the overall protocol."

Censys is one of the many search engines available that collect information about every single device on the Internet. J. Alex Halderman and Zakir Durumeric walk us through how Censys gathers data and why Internet scanning is good for security.

Need All Scan Data

Even just a few years ago, there was a lot of skepticism about Internet-wide scans, and port scanning in particular. Organizations didn't like having their logs filled up with alerts from the scanners and the intrusion detection systems going off because of the connection attempts. Even now, Rudis says some education is still necessary to explain that scanning is actually helpful. Many organizations still ask to opt out of the scanning altogether. When that request comes in, Sonar honors the opt-out and doesn't scan that network. Some scanners ignore opt-out requests and go ahead and collect the information anyway.

Being ethical means reduced visibility. When looking at the amount of information these platforms already have, it is easy to forget that there's so much of the Internet they can't see. Sonar, for example, has about 50 million IP addresses it doesn't know anything about because of the opt-out list. Other services have their own blocklists.

Different scanners collect data at different times; some have blocklists while others don't; everyone collects different types of information. Every scan is different, so there is no standard way to combine the data. Even the same scanner will rarely have two scans be exactly the same. Even if the scans are run one day apart, there may be differences because sometimes systems are down, and sometimes systems are up. We get different results on any given day for a lot of different things.

"Everyone's got a different view of the Internet. You really need all the scans," Rudis says. "You actually need all of them to get a really good complete picture of the Internet because we all miss something."

Researchers often grab data from multiple sources in order to try to get as comprehensive a view as possible.
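One simple way to picture combining sources is a union of the hosts each scanner reports, keeping track of who saw what. The Python sketch below is only an illustration under that assumption: the scanner names and addresses are hypothetical, and real scan data sets differ in format and fields, so they need far more normalization than this.

```python
def merge_scans(*scans):
    """Union several scan results, recording which sources saw each host."""
    merged = {}
    for source, hosts in scans:
        for host in hosts:
            merged.setdefault(host, set()).add(source)
    return merged


# Hypothetical results from two scanners (documentation-range addresses).
scan_a = ("scanner_a", {"192.0.2.10", "192.0.2.11"})
scan_b = ("scanner_b", {"192.0.2.11", "198.51.100.7"})

combined = merge_scans(scan_a, scan_b)

# Hosts that only one scanner reported: exactly what a single source would miss.
seen_by_one = {host for host, sources in combined.items() if len(sources) == 1}

print(len(combined), "hosts in the combined view")
print(sorted(seen_by_one), "were seen by only one scanner")
```

Each scanner here misses one host the other found, which is the point Rudis makes about needing every view.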

Even with limited visibility and differences in what scanners collect, Moore was upbeat about how more and more organizations are beginning to use the scan data. Doing the Internet scan can be time-consuming and challenging, so researchers are happy to focus on the research itself and not on the technical aspects of getting the data.

"More eyes, more people, more acceptance of the technology," Moore says. "I think we're all going the right direction."

How Will Scanning Evolve?

The future of Internet mapping relies on the project evolving to adjust to new requirements and needs. The first challenge is to figure out how to keep scanning devices as more and more devices connect to the Internet using IPv6 addresses and not IPv4. Currently, all the scanning services focus only on the IPv4 address space because it is possible to examine every single IP address.

Durumeric says it is "a physical impossibility" to scan every single possible IPv6 address because of the larger address space. Researchers have been studying how IPv6 addresses are allocated, looking for patterns in how the addresses are assigned and used. If researchers can find a pattern, then they can figure out a way to scan the addresses. "We're attempting to develop more algorithmic approaches where we can come up with the most intelligent IP addresses that we should target in scans," Durumeric says. "It's so tempting to scan all of them."
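Some back-of-the-envelope arithmetic shows why an exhaustive IPv6 scan is out of reach. The Python sketch below assumes a rate of one million probes per second, a figure picked purely for illustration and not one quoted by any of the scanning projects.

```python
IPV4_ADDRESSES = 2 ** 32    # size of the 32-bit IPv4 address space
IPV6_ADDRESSES = 2 ** 128   # size of the 128-bit IPv6 address space

# Assumption for illustration only: one million probes sent per second.
PROBES_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365 * 24 * 3600


def scan_seconds(addresses, rate=PROBES_PER_SECOND):
    """Time to send one probe to every address at the given rate."""
    return addresses / rate


ipv4_hours = scan_seconds(IPV4_ADDRESSES) / 3600
ipv6_years = scan_seconds(IPV6_ADDRESSES) / SECONDS_PER_YEAR

print(f"IPv4: about {ipv4_hours:.1f} hours")
print(f"IPv6: about {ipv6_years:.1e} years")
```

At that assumed rate, the whole IPv4 space takes a little over an hour, while the IPv6 space would take on the order of 10^25 years, which is why researchers look for allocation patterns instead.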

Researchers are also looking at ways to improve scanning time. For many enterprises, these scanning tools provide the most accurate and up-to-date information about their own networks. It is faster to look up their assets in the scan data than to try to re-generate their own network maps and asset inventory lists.

"One of the goals of Censys long term is to make that [platform] even faster so that when administrators are trying to look at their own network and they're trying to fix a problem they can start to look at this data in real time instead of relying on a scan from a day ago or from earlier that day," Durumeric says.

Rapid7 is looking at ways to speed up Sonar, as well. "If something hits the news and people need to know what's wrong, people need to know what's vulnerable, or people need to know what's exposed, we want to have the ability to scan that immediately," says Rudis. Back in 2014, it took Sonar almost a day to scan and return data; the turnaround time is now down to three hours.

Both Censys and Sonar are really targeted for use by security researchers. Rapid7 uses Sonar internally for research and to improve its own product portfolio. While there are ways for security teams to use the platforms to get information about specific assets associated with their networks, the interfaces aren't really designed for enterprise defense.

"You have to do your own legwork to make a lot of use out of it," Moore says.

Censys is attempting to change that, to make it easier for organizations to use. Censys was spun out of the University of Michigan last fall into its own company, and the team is working on a commercial version of the tool. Enterprises will be able to use Censys to secure their own organizations by examining their network attack surface, spotting new vulnerabilities, and importing the information into their own security tools. All of the data will be available for commercial use, which will allow developers to build additional products utilizing Censys data. This will make security more data-driven, as new security solutions will be validated by the underlying data.

"We're taking Censys out of the lab, it's finally time for it to graduate," Halderman says.

Who's Scanning Who?

It turned out to be harder than expected to create a definitive list of who is poking the Internet and looking for information about devices. Censys and Shodan are well-known, but there are plenty of other services that regularly scan the IPv4 space to find devices. Enter Grey Noise, whose mission is to count the scanners. While scanners scour the Internet looking for things, Grey Noise eavesdrops on everyone (researchers, defenders, and malicious actors) doing the scanning. Grey Noise tracks ports, protocols, and even what services are being used.

“They [Grey Noise] do a really good job, they’re not hokey, they do a really solid research on that site to be able to figure that out. It’s really helpful information, too,” Rudis says.

To learn what Grey Noise sees, check back tomorrow for Part 3. Part 1 is here.
