
Mapping the Internet, Who’s Who? (Part Three)


This is part 3 of a three-part series on Internet cartography. You can read Part 1 here and Part 2 here.

For network administrators and IT teams, random pings, scanning requests, and inexplicable inbound traffic are a normal part of administering a modern network. The challenge, therefore, is balancing two things: ignoring the scans because they are part and parcel of being on the Internet, and tracking the scans continuously because they could be the precursor to an attack. It helps to know who is scanning the Internet in the first place.

"Lots of people are scanning the internet and no one is watching the watchers," says Andrew Morris, creator of Grey Noise.

Grey Noise is a system of honeypots and scripts that listens for and collects Internet-wide scan traffic. It collects all of the expected background traffic that any Internet-connected machine sees, and has nodes listening "in every corner" of the Internet. Some of the scanners are benign, some are downright malicious, and others are just head-scratchingly weird. It's like going to a party and eavesdropping on every single conversation to understand what people are talking about, instead of joining the conversations.

"We [Sonar, Shodan, Censys, etc.] will scan the Internet looking for things. They're [Grey Noise] listening to everybody that's scanning the Internet for stuff and they're basically seeing who's scanning for what, both researchers and malicious people looking for stuff," says Bob Rudis, chief security data scientist at Rapid7.

Scanners form the "expected background noise" of the Internet, says Morris. There are three main types of actors and activity: known benign mass scanners such as Shodan, Censys, and Sonar; malicious mass scanners such as Trojans, worms, and botnets; and unknown mass scanners, which are everything else. Most scanning activity falls into this unknown category. Of the millions of IP addresses scanning the Internet, 27 are associated with Shodan, 334 with Censys, 56 with Sonar, 145 with NetCraft, 228 with Shadowserver, and 253 with BinaryEdge. In comparison, Grey Noise has tracked 249,000 IP addresses associated with the Mirai botnet, 92,000 associated with SSH worms, and 590,000 compromised residential routers attacking other people. As for the unknown bucket, it isn't always clear what those scanners are doing.

"Someone is scanning TCP port 24972? Why? What is that? What runs on it?" Morris asks, adding that he is still working on uncovering some of those answers.

Any scanner can find out how many computers are running a specific version of software. It is much harder to figure out how many computers are scanning for a specific version of software, but that is exactly the question to ask in order to find malware and attackers. Someone with an exploit isn't going to lob it indiscriminately against every device; the first step would be to find vulnerable systems. Letting defenders ask the same question helps them proactively address the issues before the attackers get in.

Grey Noise collects raw scan data from between 50 and 100 cloud servers in many different regions belonging to many different cloud providers, including Digital Ocean, Amazon Web Services, Microsoft Azure, and Google Cloud. As infrastructure goes, it's a modest operation, costing Morris about $400 a month, but it gives Grey Noise visibility across large sections of the Internet. Grey Noise has aggressive iptables logs that record every packet hitting every port on every protocol, passive operating system fingerprinting (p0f) logs, and custom microservices that collect information for specific protocols such as HTTP, SSH, and Telnet. Grey Noise is also experimenting with ways to collect information for other protocols, including RDP, NTP, SMTP, and DNS.
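To make the iptables logging concrete, here is a minimal Python sketch of how a log of that kind could be tallied by destination port. It is not Grey Noise's code: the function names are hypothetical, and the sample lines are made up, trimmed to a few of the KEY=value fields that an iptables LOG line typically carries, and use documentation-reserved IP addresses.

```python
import re
from collections import Counter

# iptables LOG lines carry KEY=value pairs such as SRC=, DST=, PROTO=, DPT=.
FIELD_RE = re.compile(r"\b([A-Z]+)=(\S*)")


def parse_iptables_line(line):
    """Return the KEY=value fields of one log line as a dict."""
    return dict(FIELD_RE.findall(line))


def top_scanned_ports(lines, n=3):
    """Count packets per (protocol, destination port) and return the top n."""
    counts = Counter()
    for line in lines:
        fields = parse_iptables_line(line)
        # Skip lines without a numeric destination port (for example ICMP).
        if "PROTO" in fields and fields.get("DPT", "").isdigit():
            counts[(fields["PROTO"], int(fields["DPT"]))] += 1
    return counts.most_common(n)


# Made-up, simplified sample lines (assumption: real lines have more fields).
SAMPLE_LINES = [
    "kernel: IN=eth0 OUT= SRC=198.51.100.7 DST=203.0.113.5 LEN=60 TTL=52 PROTO=TCP SPT=44321 DPT=23 SYN",
    "kernel: IN=eth0 OUT= SRC=198.51.100.9 DST=203.0.113.5 LEN=60 TTL=49 PROTO=TCP SPT=51812 DPT=23 SYN",
    "kernel: IN=eth0 OUT= SRC=192.0.2.44 DST=203.0.113.5 LEN=40 TTL=113 PROTO=TCP SPT=6000 DPT=445 SYN",
]

if __name__ == "__main__":
    for (proto, port), hits in top_scanned_ports(SAMPLE_LINES):
        print(f"{proto}/{port}: {hits} packets")
```

Counting by destination port is the simplest way to see which services scanners are looking for; the same parsed fields could just as easily be grouped by source address.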

Every operating system creates packets slightly differently. Morris likened it to speaking with a foreign accent: while the packets are perfectly valid, the little nuances reveal the origin machine's operating system. There are other elements that can give clues to the origin machine's identity, such as the browser's User-Agent and the system's hard-coded TCP parameters. In its current iteration, Grey Noise is collecting 1 million to 2 million iptables events; 700,000 to 1 million SSH logins; 1 million to 10 million telnet login attempts; and 10,000 to 100,000 HTTP requests, every day. Grey Noise processes 338 messages per second from all its nodes.
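One of the simplest "accents" is the IP time-to-live (TTL) value. The Python sketch below is a deliberately crude illustration, not how p0f or Grey Noise works: it assumes the commonly cited default initial TTLs (64 for Linux and other Unix-like systems, 128 for Windows, 255 for some network devices) and rounds the observed TTL up to the nearest of them. Real fingerprinting tools combine many more signals, and the defaults can be changed by an administrator.

```python
# Commonly cited default initial TTLs (assumption: defaults were not changed).
COMMON_INITIAL_TTLS = (64, 128, 255)

FAMILY_BY_INITIAL_TTL = {
    64: "Linux/Unix-like",
    128: "Windows",
    255: "network device or other",
}


def guess_initial_ttl(observed_ttl):
    """Round an observed TTL up to the nearest common initial TTL."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    for initial in COMMON_INITIAL_TTLS:
        # Each router hop decrements the TTL, so the sender started at or above it.
        if observed_ttl <= initial:
            return initial


def guess_os_family(observed_ttl):
    """Map an observed TTL to a rough operating system family."""
    return FAMILY_BY_INITIAL_TTL[guess_initial_ttl(observed_ttl)]


if __name__ == "__main__":
    for ttl in (52, 113, 240):
        print(ttl, "->", guess_os_family(ttl))
```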

"I don't know why people are looking for the things that they're looking for, but I know where they're coming from, and I know the ports that they're looking for, sometimes the traffic they're sending over some of these services. Sometimes I can even fingerprint what operating system they're running, or what kind of scanning tool they're running," Morris says.

A quick look at the web server logs (HTTP data from ports 80, 8080, and 8888 and HTTPS data from ports 443 and 8443) showed multiple code injection attempts, activity from a scanner for the Ethereum cryptocurrency, requests for phpMyAdmin (the web dashboard for MySQL and MariaDB databases), and searches for WordPress login pages and router administrator interfaces. There are "loads" of people still blasting SMB traffic (which the WannaCry ransomware abused) on port 445. There are lots of spoofed user agents and bots pretending to be search engines.

"Is the browser hitting this HTTP server really running Safari on a Linux kernel 3.1 box?" Morris asks.
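The mismatch Morris describes can be checked mechanically once a request's User-Agent string and a passive fingerprint of the sending host sit side by side. The following Python sketch is one hypothetical way to do it. The function names and the three family labels are assumptions made for illustration, and the fingerprinted family is taken as a plain string supplied by some other tool.

```python
def claimed_os_family(user_agent):
    """Return the kernel family a User-Agent string claims, or None if unclear."""
    ua = user_agent.lower()
    # Order matters: Android user agents also contain "linux".
    if "android" in ua or "linux" in ua:
        return "linux"
    if "windows" in ua:
        return "windows"
    if "iphone" in ua or "ipad" in ua or "mac os x" in ua or "macintosh" in ua:
        return "darwin"
    return None


def looks_spoofed(user_agent, fingerprinted_family):
    """True when the claimed family contradicts the passive fingerprint."""
    claimed = claimed_os_family(user_agent)
    if claimed is None or fingerprinted_family is None:
        return False  # Not enough information to call it a mismatch.
    return claimed != fingerprinted_family.lower()


SAFARI_ON_MAC = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/603.3.8 "
    "(KHTML, like Gecko) Version/10.1.2 Safari/603.3.8"
)

if __name__ == "__main__":
    # A "Safari on a Mac" request arriving from a host fingerprinted as Linux.
    print(looks_spoofed(SAFARI_ON_MAC, "linux"))
```

A mismatch is only a hint: proxies and NAT gateways can make an honest browser look inconsistent, so a check like this is better suited to ranking traffic than to blocking it.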

The usual suspects—bots attempting to log in via SSH and Telnet using default credentials, and attempts to brute-force proxies—are always there, but there is always "a lot of weirdness and a lot of activity," Morris says, noting that several of the ports with high activity are those he had never heard of before (such as ports 53413, 9000, and 11994).

"I track a bunch of malware, and that's useful," Morris says. "There is threat intelligence value, but there is not going to really be any APTs here."

Eventually the Grey Noise API will be used by "security researchers, internet-cartographers, threat intelligence vendors, and other cyber security vendors," says Morris. Researchers would be able to identify upticks and downticks in certain types of scanning, and possibly predict or track vulnerabilities. Researchers don't have to create their own scripts to collect the raw data if they tap into the API.

"From a research perspective, there is an absolutely ludicrous amount of use cases," Morris says.

Security companies tend to focus on the definitely bad actors and activity, but Grey Noise is among the select few that have enough data to tell whether a scanner is targeting a specific network or everyone across the Internet. Enterprise defenders can use Grey Noise data to filter out the benign noise from the rest of the scan traffic. By integrating the data with security information and event management (SIEM) software, defenders can filter out all the omnidirectional traffic (indiscriminate scans affecting everyone) and focus on the targeted traffic.
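As a rough illustration of that filtering step, the Python sketch below splits a batch of alerts into targeted and background traffic using a set of source addresses known to scan everyone. Nothing here reflects an actual Grey Noise or SIEM interface: the alert dictionaries, the `src_ip` key, and the idea of a locally held set of mass-scanner addresses are all assumptions, and the addresses are documentation-reserved examples.

```python
def split_alerts(alerts, mass_scanner_ips):
    """Separate alerts from known indiscriminate scanners from the rest."""
    targeted, background = [], []
    for alert in alerts:
        if alert["src_ip"] in mass_scanner_ips:
            background.append(alert)  # Seen scanning everyone: likely noise.
        else:
            targeted.append(alert)    # Not a known mass scanner: review first.
    return targeted, background


# Hypothetical data for illustration only.
MASS_SCANNER_IPS = {"198.51.100.7", "198.51.100.9"}
ALERTS = [
    {"src_ip": "198.51.100.7", "dst_port": 23},
    {"src_ip": "192.0.2.44", "dst_port": 445},
    {"src_ip": "198.51.100.9", "dst_port": 22},
]

if __name__ == "__main__":
    targeted, background = split_alerts(ALERTS, MASS_SCANNER_IPS)
    print(f"{len(targeted)} targeted, {len(background)} background")
```

Keeping the background alerts in a separate list, rather than discarding them, leaves room to revisit them if a scanner that looked indiscriminate turns out to matter.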

Since Grey Noise also sees a lot of malware and botnet activity, defenders can use the data to figure out whether any of the malicious activity is originating from their own networks, which would be a sign that their machines are compromised. When Grey Noise observes an IP address scanning for a given TCP port, a secondary script checks whether the source IP address also has that TCP port open. If the answer is yes, it's a strong indicator of yet another worm. The logs also showed a high amount of scanning activity from conference and VoIP systems. Criminals often route calls overseas through compromised VoIP servers, which is another thing defenders can track down and clean up within their networks.
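The secondary check can be pictured with a short Python sketch. This is an assumption about how such a script might look, not Morris's implementation: it makes a plain TCP connection attempt back to the scanning address on the same port it was scanning for, and the function names are hypothetical. The probe is passed in as a parameter so the logic can be exercised without touching the network.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all of these as not open.
        return False


def likely_worm(source_ip, scanned_port, probe=port_is_open):
    """A scanner that exposes the port it is scanning for may itself be infected."""
    return probe(source_ip, scanned_port)


if __name__ == "__main__":
    # Stand-in probe so the example runs offline: pretend only port 23 is open.
    def fake_probe(host, port):
        return port == 23

    print(likely_worm("198.51.100.7", 23, probe=fake_probe))
    print(likely_worm("198.51.100.7", 445, probe=fake_probe))
```

The reasoning behind the heuristic is that a self-propagating worm usually spreads through the same service it infected, so a host that both scans for a port and listens on it fits that pattern.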

Morris estimates Grey Noise has visibility into "probably 10 percent of people" scanning the Internet, but there is a lot more that no one knows anything about.

"If anybody else on the internet has those questions where they ask, 'I wonder if anyone is looking for this? Or I wonder if anyone is scanning for this?' I've already done all of the leg work. You can just ask my data and I have the answers for you," Morris said.