About the project
The work builds on the paper Seven Years in the Life of Hypergiants' Off-Nets [1], which analyzes how large Internet platforms deploy infrastructure inside networks that they do not directly operate. These deployments, often referred to as off-nets, place servers inside third-party networks such as Internet service providers (ISPs) in order to reduce latency, improve routing efficiency, and bring services physically closer to end users.
Rather than relying on the datasets used in the original study, this project focuses on reproducing the measurement approach using newly collected data. The goal was to build a small Internet measurement pipeline capable of identifying infrastructure belonging to major platforms and determining whether that infrastructure is deployed within external networks.
Methodology
The measurement pipeline was designed to identify infrastructure belonging to large platforms that is deployed outside their own Autonomous Systems (ASes).
Internet-wide scan
The first step was a scan of the entire IPv4 address space to identify hosts exposing HTTPS services on port 443. This scan was performed using ZMap, a tool designed for high-speed Internet-wide network scans. The scan ran from an Azure virtual machine for roughly two days. An opt-out webpage describing the research and providing contact information was hosted to follow responsible scanning practices. In total, about 14.8 million IP addresses were found with port 443 open.
Identifying candidate infrastructure
AS numbers of the target companies were obtained from PeeringDB. These AS numbers were mapped to their announced IP prefixes using public BGP prefix-to-AS datasets such as RouteViews. This made it possible to determine which IP ranges belong to the companies themselves.
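The AS-to-prefix mapping can be sketched as follows. This is only an illustration, not the project's actual code: it assumes a whitespace-separated prefix-to-AS file with three columns (network, prefix length, origin AS), as in the CAIDA RouteViews pfx2as files. The function name, the sample AS numbers, and the sample prefixes are hypothetical.

```python
import ipaddress
import re
from collections import defaultdict


def prefixes_for_asns(lines, target_asns):
    """Collect announced prefixes per target AS from prefix-to-AS lines.

    Assumed line format: "<network> <length> <origin>", where <origin> may
    list several ASes separated by "_" (multi-origin) or "," (AS set).
    """
    result = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        network, length, origin = parts
        origins = {int(a) for a in re.split(r"[_,]", origin) if a.isdigit()}
        for asn in origins & target_asns:
            result[asn].append(ipaddress.ip_network(f"{network}/{length}"))
    return dict(result)


# Made-up sample using documentation prefixes and reserved AS numbers.
sample = [
    "192.0.2.0\t24\t64496",
    "198.51.100.0\t24\t64497",
    "203.0.113.0\t24\t64496_64498",
]
own = prefixes_for_asns(sample, {64496})
print(own)
```

A prefix announced by several origins is counted for every target AS that appears in the origin field, which errs on the side of treating more address space as the company's own.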
Transport Layer Security (TLS) handshake information from servers within these ranges was then collected using ZGrab2. This produced TLS certificate metadata such as subject fields and DNS identifiers. The resulting datasets varied widely in size: the Microsoft dataset contained roughly 130,000 entries, whereas the Facebook dataset contained only 200.
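Extracting certificate metadata from the scan output might look like the sketch below. It assumes one JSON object per line with the certificate nested under data.tls.result.handshake_log.server_certificates.certificate.parsed, which is how ZGrab2's TLS module typically lays out its results; the exact field paths should be checked against the ZGrab2 version in use. The function name and the sample record are hypothetical.

```python
import json


def extract_cert_metadata(json_line):
    """Return selected certificate fields from one scan record, or None."""
    record = json.loads(json_line)
    tls = record.get("data", {}).get("tls", {})
    if tls.get("status") != "success":
        return None  # handshake failed or no TLS data present
    parsed = (
        tls.get("result", {})
        .get("handshake_log", {})
        .get("server_certificates", {})
        .get("certificate", {})
        .get("parsed", {})
    )
    if not parsed:
        return None
    subject = parsed.get("subject", {})
    san = parsed.get("extensions", {}).get("subject_alt_name", {})
    return {
        "ip": record.get("ip"),
        "common_name": subject.get("common_name", []),
        "organization": subject.get("organization", []),
        "dns_names": san.get("dns_names", []),
        "not_after": parsed.get("validity", {}).get("end"),
    }


# Made-up record following the assumed layout.
sample_line = json.dumps({
    "ip": "192.0.2.10",
    "data": {"tls": {"status": "success", "result": {"handshake_log": {
        "server_certificates": {"certificate": {"parsed": {
            "subject": {"common_name": ["*.cdn.example.com"],
                        "organization": ["Example Corp"]},
            "extensions": {"subject_alt_name": {
                "dns_names": ["*.cdn.example.com", "example.com"]}},
            "validity": {"end": "2026-01-01T00:00:00Z"},
        }}}}}}},
})
meta = extract_cert_metadata(sample_line)
print(meta)
```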
Fingerprint extraction
Infrastructure located inside a company's own network does not necessarily belong to that company, since networks may host third-party services. To remove this noise, a small, locally hosted filtering pipeline was built around Google's Gemma 3 12B model. It analyzed the certificate metadata and classified which entries likely belonged to the target company.
Validation was performed by manually filtering the Facebook dataset and comparing the results with the model’s output. The automated filtering produced slightly cleaner results and was therefore used for the larger datasets. This step produced TLS fingerprints describing infrastructure belonging to each platform.
Detecting off-net infrastructure
These fingerprints were used to search the Internet-wide scan results for matching servers. Hosts matching the extracted TLS patterns but located outside the companies’ own networks were classified as potential off-net deployments. This resulted in roughly 12,000 candidate IP addresses.
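The matching step can be illustrated with a short sketch. It assumes, for simplicity, that a fingerprint is a set of DNS name suffixes taken from certificates and that each scanned host is a record with an IP address and its certificate DNS names. The real fingerprints may combine more fields, and all names and sample values here are hypothetical.

```python
import ipaddress


def matches_fingerprint(dns_names, suffixes):
    """True if any certificate DNS name equals or is a subdomain of a suffix."""
    for name in dns_names:
        name = name.lower()
        if name.startswith("*."):
            name = name[2:]  # treat wildcard names like their base domain
        if any(name == s or name.endswith("." + s) for s in suffixes):
            return True
    return False


def is_off_net(ip, own_prefixes):
    """True if the address lies outside all of the company's own prefixes."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in own_prefixes)


def find_off_nets(hosts, suffixes, own_prefixes):
    """Return IPs that match the fingerprint but sit in foreign networks."""
    return [
        h["ip"]
        for h in hosts
        if matches_fingerprint(h["dns_names"], suffixes)
        and is_off_net(h["ip"], own_prefixes)
    ]


# Made-up sample data.
own_prefixes = [ipaddress.ip_network("192.0.2.0/24")]
suffixes = {"example.com"}
hosts = [
    {"ip": "192.0.2.10", "dns_names": ["*.cdn.example.com"]},       # on-net
    {"ip": "198.51.100.7", "dns_names": ["edge.cdn.example.com"]},  # off-net
    {"ip": "203.0.113.5", "dns_names": ["notexample.com"]},         # no match
]
candidates = find_off_nets(hosts, suffixes, own_prefixes)
print(candidates)
```

The linear scan over prefixes is adequate for a sketch. With millions of scanned hosts, a radix tree or a sorted interval lookup would be a more practical choice.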
Validation and mapping
Additional validation was performed by issuing HTTP HEAD requests to the detected servers and inspecting response headers. Some services expose identifying headers that help confirm the infrastructure operator, allowing additional false positives to be removed.
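A header check of this kind could be sketched as below. The header name and value in the example are placeholders, since the identifying headers differ per operator and are not listed here. The helper that issues the HEAD request disables certificate verification because servers are contacted by IP address rather than hostname; that is an assumption of this sketch, not a documented detail of the project.

```python
import http.client
import ssl


def fetch_head_headers(ip, timeout=5.0):
    """Send an HTTPS HEAD request to an IP and return (name, value) pairs."""
    context = ssl.create_default_context()
    context.check_hostname = False      # no hostname to check for a bare IP
    context.verify_mode = ssl.CERT_NONE
    conn = http.client.HTTPSConnection(ip, 443, timeout=timeout, context=context)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().getheaders()
    finally:
        conn.close()


def confirms_operator(headers, patterns):
    """True if any expected header is present and contains its substring.

    `patterns` maps lower-case header names to lower-case substrings; an
    empty substring means the mere presence of the header is sufficient.
    """
    lowered = {name.lower(): value.lower() for name, value in headers}
    return any(
        name in lowered and needle in lowered[name]
        for name, needle in patterns.items()
    )


# Placeholder pattern and canned responses (no network access needed).
patterns = {"server": "examplecdn"}
print(confirms_operator([("Server", "ExampleCDN/1.2")], patterns))
print(confirms_operator([("Server", "nginx")], patterns))
```

In use, the result of fetch_head_headers(ip) would be passed to confirms_operator, and candidates failing the check would be dropped as likely false positives.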
The remaining IP addresses were mapped to their originating network and country codes, producing the final dataset used for the analysis and visualizations in the results section.
Results
Infrastructure distribution

This figure summarizes how the detected infrastructure is distributed across networks.
The left panel shows, for each company, the number of networks (ASes) in which infrastructure was detected together with the total number of IP addresses identified. The right panel shows the average number of IP addresses deployed per network.
These metrics highlight differences in deployment strategies. Some platforms appear in many networks with relatively small deployments per network, while others deploy larger clusters of servers within fewer networks.
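The three metrics in the figure can be derived from the final dataset with a few lines of code. The sketch below assumes the dataset is a list of (company, AS number, IP address) tuples; the company labels and values are made up.

```python
from collections import defaultdict


def deployment_stats(records):
    """Compute per-company network count, IP count, and IPs per network."""
    ips_by_company_as = defaultdict(lambda: defaultdict(set))
    for company, asn, ip in records:
        ips_by_company_as[company][asn].add(ip)  # sets drop duplicate IPs

    stats = {}
    for company, per_as in ips_by_company_as.items():
        total_ips = sum(len(ips) for ips in per_as.values())
        stats[company] = {
            "networks": len(per_as),
            "ips": total_ips,
            "ips_per_network": total_ips / len(per_as),
        }
    return stats


# Made-up sample: company A is spread thin, company B is concentrated.
records = [
    ("A", 64496, "198.51.100.1"),
    ("A", 64497, "198.51.100.2"),
    ("A", 64498, "198.51.100.3"),
    ("B", 64496, "203.0.113.1"),
    ("B", 64496, "203.0.113.2"),
    ("B", 64496, "203.0.113.3"),
    ("B", 64496, "203.0.113.4"),
]
stats = deployment_stats(records)
print(stats)
```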
TLS certificate characteristics

TLS certificate metadata was used as one of the signals for identifying infrastructure belonging to specific platforms. In this context, certificate validity refers only to the certificate not being expired at the time of measurement.
Most detected certificates were still within their validity period. The notable exception is the Microsoft dataset, where only approximately 86.8% of certificates were unexpired. This likely reflects the larger and more heterogeneous nature of Microsoft's infrastructure, where additional hosted services or legacy deployments may appear within the same networks.
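The validity check described above reduces to comparing each certificate's notAfter timestamp with the measurement time. The sketch below assumes timestamps in the form "YYYY-MM-DDTHH:MM:SSZ" (UTC); the function names, the measurement date, and the sample values are made up and do not reproduce the reported figures.

```python
from datetime import datetime, timezone


def parse_ts(ts):
    """Parse a UTC timestamp such as '2026-01-01T00:00:00Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def unexpired_share(not_after_values, measured_at):
    """Percentage of certificates whose notAfter is at or after measured_at."""
    if not not_after_values:
        return 0.0
    valid = sum(1 for ts in not_after_values if parse_ts(ts) >= measured_at)
    return 100.0 * valid / len(not_after_values)


# Made-up sample: three of four certificates are still valid.
measured_at = datetime(2025, 1, 1, tzinfo=timezone.utc)
not_after = [
    "2025-06-01T00:00:00Z",
    "2024-12-01T00:00:00Z",  # expired before the measurement
    "2026-01-01T00:00:00Z",
    "2025-03-15T12:00:00Z",
]
share = unexpired_share(not_after, measured_at)
print(share)
```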
Estimated Internet reach

To estimate the potential reach of the detected infrastructure, networks were mapped to population estimates using the APNIC ASPOP dataset. This dataset provides an estimate of how many Internet users are associated with each network.
The left panel shows the estimated number of reachable Internet users in absolute terms, while the right panel shows the percentage of the global Internet population that could potentially be reached through these deployments. The estimate uses recent figures of approximately 6 billion Internet users worldwide.
Although this does not represent the exact number of users served by these servers, it provides an indication of the potential scale of infrastructure placed inside external networks.
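The reach estimate amounts to summing the per-network user estimates over the networks with detected infrastructure and dividing by the global total. The sketch below assumes the ASPOP data has already been loaded into a dictionary from AS number to estimated users. The 6 billion figure comes from the text, while the AS numbers and user counts are made up.

```python
GLOBAL_USERS = 6_000_000_000  # approximate worldwide Internet users


def estimated_reach(asns, users_by_asn, global_users=GLOBAL_USERS):
    """Return (absolute users, percentage of global users) for a set of ASes."""
    # Count each network once, even if many IPs were detected in it;
    # networks missing from the population data contribute zero.
    reached = sum(users_by_asn.get(asn, 0) for asn in set(asns))
    return reached, 100.0 * reached / global_users


# Made-up population estimates.
users_by_asn = {64496: 30_000_000, 64497: 90_000_000, 64498: 5_000_000}
detected_asns = [64496, 64497, 64496, 64499]
reach_abs, reach_pct = estimated_reach(detected_asns, users_by_asn)
print(reach_abs, reach_pct)
```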
Facebook deployment example

To illustrate the geographical impact of off-net deployments, the figure above shows the estimated Facebook reach per country. Values indicate the percentage of Internet users within each country that could potentially be served directly through detected off-net infrastructure.
Higher percentages suggest regions where infrastructure is placed closer to access networks, allowing a larger share of users to be served locally.
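A per-country percentage of this kind could be computed as in the sketch below. It assumes each population entry carries an AS number, a country code, and a user estimate, and that a country's total is the sum over all of its listed networks. The country codes "XA" and "XB" and all values are placeholders.

```python
from collections import defaultdict


def reach_by_country(entries, reached_asns):
    """Percentage of each country's users inside networks with off-nets."""
    total = defaultdict(int)
    reached = defaultdict(int)
    for asn, country, users in entries:
        total[country] += users
        if asn in reached_asns:
            reached[country] += users
    return {
        country: 100.0 * reached[country] / total[country]
        for country in total
        if total[country] > 0
    }


# Made-up entries: (AS number, country code, estimated users).
entries = [
    (64496, "XA", 6_000_000),
    (64497, "XA", 2_000_000),
    (64498, "XB", 10_000_000),
]
per_country = reach_by_country(entries, {64496})
print(per_country)
```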
References
[1] Petros Gigis, Matt Calder, Lefteris Manassakis, George Nomikos, Vasileios Kotronis, Xenofontas Dimitropoulos, Ethan Katz-Bassett, and Georgios Smaragdakis. 2021. Seven years in the life of Hypergiants' off-nets. In Proceedings of the 2021 ACM SIGCOMM 2021 Conference (SIGCOMM '21). Association for Computing Machinery, New York, NY, USA, 516–533. https://doi.org/10.1145/3452296.3472928
This research was conducted as part of the Cloud Networking (201400177) course.
Keywords: Research, Internet Measurement, Hypergiants
Supervisor(s): dr.ir. Raffaele Sommese