Network Security Monitoring and Analysis based on Big Data Technologies
Network flow data provides valuable information to understand the network status and to be aware of network security threats. However, handling the large amount of data collected from the network and providing real-time information remain big challenges. <italic>Big Data</italic> technologies provide new approaches to collecting, storing, measuring in real time, and analyzing large amounts of data. This dissertation aims to provide a system for network security monitoring and analysis based on <italic>Big Data</italic> technologies. First, I present an extensive survey of network flow applications that covers past research perspectives, methodologies, and a discussion of challenges and future work. Then, I present the system design of the network security monitoring and analysis platform based on <italic>Big Data</italic> technologies. Components of this system include <italic>Flume</italic> and <italic>Kafka</italic> for real-time distributed data collection, <italic>Storm</italic> for real-time distributed stream processing, <italic>Cassandra</italic> for <italic>NoSQL</italic> data storage, data processing, and user interfaces. The system supports real-time continuous network monitoring, interactive visualization, network measurement, and advanced network modeling to classify host roles based on host behaviors and to identify a particular user among other users.

It is critical to continuously monitor the network status and network security threats in real time, but it is a challenge to process this large amount of data in real time. I demonstrate how the <italic>Big Data</italic> security system designed in this dissertation supports such features. For instance, querying 24 hours of a network host's traffic took 56 milliseconds round-trip. Another use of network flow data is to measure the contents and usage of the network. I demonstrate how this <italic>Big Data</italic> system provides an understanding of the usage of anonymity technologies on the campus Internet. Then I present the methods and results of classification and identification of network objects based on the <italic>Big Data</italic> system designed in this dissertation. <italic>Decision Tree</italic> and <italic>On-Line Support Vector Machine</italic> are used to model host role behaviors and user behaviors. I report very high accuracy of host role classification and user identification.
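
The abstract does not describe the storage schema behind the 24-hour host query. As a rough illustration only, the sketch below assumes a wide-row layout of the kind commonly used with Cassandra: flows are partitioned by (host, day) and kept sorted by timestamp within each partition, so a 24-hour query for one host touches at most two partitions and is answered by two range lookups. The class `FlowStore`, its methods, and the sample addresses are hypothetical names for this sketch, and the in-memory dictionary merely stands in for the database.

```python
from bisect import bisect_left, insort
from collections import defaultdict
from datetime import datetime, timedelta


class FlowStore:
    """In-memory stand-in for a table partitioned by (host, day).

    Hypothetical illustration; not the schema used in the dissertation.
    """

    def __init__(self):
        # (host, date) -> list of (timestamp, byte_count), sorted by timestamp
        self._partitions = defaultdict(list)

    def insert(self, host, ts, nbytes):
        # Keep each partition ordered by timestamp, like a clustering column.
        insort(self._partitions[(host, ts.date())], (ts, nbytes))

    def query(self, host, start, end):
        """Return all (timestamp, byte_count) rows with start <= ts < end."""
        rows_out = []
        day = start.date()
        while day <= end.date():
            rows = self._partitions.get((host, day), [])
            # A 1-tuple sorts before any 2-tuple sharing its first element,
            # so these lookups find the first row at or after each bound.
            lo = bisect_left(rows, (start,))
            hi = bisect_left(rows, (end,))
            rows_out.extend(rows[lo:hi])
            day += timedelta(days=1)
        return rows_out


if __name__ == "__main__":
    store = FlowStore()
    now = datetime(2015, 3, 1, 12, 0, 0)
    store.insert("10.0.0.5", now - timedelta(hours=30), 100)  # outside window
    store.insert("10.0.0.5", now - timedelta(hours=23), 200)
    store.insert("10.0.0.5", now - timedelta(hours=1), 300)
    store.insert("10.0.0.9", now - timedelta(hours=1), 999)   # different host
    last_day = store.query("10.0.0.5", now - timedelta(hours=24), now)
    print(sum(nbytes for _, nbytes in last_day))
```

Partitioning by host and day keeps each lookup bounded regardless of how much traffic other hosts generate, which is one plausible reason such a query can complete in tens of milliseconds.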