INTERNET TOPOLOGY MINING: FROM BIG DATA TO NETWORK SCIENCE
Author: Canbaz, Abdullah M.
Advisor: Gunes, Mehmet H.
Computer Science and Engineering
Data has become one of the most valuable resources in today’s world, where we have a greater digital presence. Large volumes of data are generated through various platforms including the web, social networks, mobile devices, scientific instruments, infrastructure sensors, and many other IoT devices. A challenge for researchers is to mine valuable, relevant information from big data efficiently and in a timely manner. The Internet is the largest man-made complex system, yet its underlying network has not been characterized precisely. Internet topology is shaped by tens of thousands of network providers optimizing local communication efficiency without a central authority. Numerous methods and platforms have been developed to accurately measure and analyze Internet topology data.
In this dissertation, we perform a comprehensive analysis of existing Internet topology data sets, develop and deploy our measurement platform to obtain detailed topologies of Autonomous System (AS) networks, and analyze the collected data to understand path stability and topological characteristics of backbone networks. Our results indicate that the use of multiple data sets from different vantage points is important for building a comprehensive picture of the Internet topology, as each data set makes a unique contribution to the visibility of a network. After analyzing earlier measurement data sets, we implement an Internet topology mapping and analysis system that collects detailed measurements from a set of measurement nodes on top of the large-scale topology data shared in public repositories. Our design utilizes big data collection and processing approaches for mapping the Internet’s underlying topology in order to better understand network characteristics, and it discovers more than thirteen times as many links as all other data repositories combined.
Analyzing the collected network data, we observe that most ASes have star-like topologies in which high-degree hubs connect low-degree routers, whereas tier-1 ASes often have a power-law degree distribution in a small-world network topology; that there are persistent routing anomalies and loops in end-to-end communication over the Internet; and that network paths within individual ASes are mostly non-shortest paths, indicating load distribution by Internet Service Providers (ISPs).
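The non-shortest-path observation can be illustrated with a minimal sketch, which is not the dissertation's own tooling. It assumes an AS's router-level topology is available as an undirected link list and that an observed path is a sequence of router identifiers; the names `shortest_hops`, `is_shortest`, `links`, and `observed` and the toy data are hypothetical. The check compares the hop count of an observed path with the breadth-first-search shortest hop count between its endpoints.

```python
from collections import deque


def shortest_hops(adj, src, dst):
    """Return the BFS hop count from src to dst, or None if unreachable."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def is_shortest(adj, path):
    """True if the observed path uses as few hops as any path in the graph."""
    best = shortest_hops(adj, path[0], path[-1])
    return best is not None and len(path) - 1 == best


# Hypothetical toy topology: a four-router ring (illustrative data only).
links = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
adj = {}
for u, v in links:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Hypothetical observed path that goes the long way around the ring.
observed = ["a", "b", "c", "d"]
print(is_shortest(adj, observed))
```

In this toy graph a direct link exists between `a` and `d`, so the three-hop observed path is flagged as non-shortest. Hop count is only one possible metric; a weighted comparison would need link costs that this sketch does not assume.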