Web-Based Automated System for Cyber Analytics
Cyber attacks have remained frequent in recent years despite significant increases in security measures. This research addresses the following question: Can cyber breaches be analyzed automatically using structured data that is collected and organized by applying natural language processing techniques to web-scraped news articles? In this thesis, we introduce a Web-based Automated System for Cyber Analytics (WASCA), which seeks to answer the aforementioned research question. WASCA automatically collects publicly available news articles, uses Natural Language Processing (NLP) techniques to identify cybersecurity-related information in them, and compiles that information into a structured database. The main contribution of WASCA is its end-to-end automated process, from the scraping of news articles to the analysis of cybersecurity-related information. WASCA is flexible and can dynamically update its repository as new articles become available. In addition to the WASCA tool, this thesis produced a large repository of cybersecurity-related articles along with their structured data. The structured data compiled from these articles gathers information about cyber breaches in one place and supports analytics on cybersecurity trends and the impacts of breaches.
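The abstract does not say how WASCA is implemented, so the following is only a minimal sketch of the general idea of turning article text into a structured record and keeping a repository up to date. Everything in it is an assumption made for illustration: the keyword list `ATTACK_TERMS`, the functions `extract_record` and `store_record`, the `breaches` table, and the use of keyword and regular-expression matching as a simple stand-in for the NLP techniques the thesis actually uses. Scraping is omitted; the article text is assumed to be already downloaded.

```python
import re
import sqlite3

# Hypothetical keyword list; a stand-in for real NLP-based classification.
ATTACK_TERMS = ("ransomware", "phishing", "data breach", "malware", "ddos")


def extract_record(title, text):
    """Extract a few structured fields from raw article text (illustrative only)."""
    lowered = text.lower()
    # First listed attack term that appears in the article, or None.
    attack = next((term for term in ATTACK_TERMS if term in lowered), None)
    # Match counts such as "1,200,000 records" or "500 accounts".
    match = re.search(r"(\d[\d,]*)\s+(?:records|accounts|customers|users)", lowered)
    affected = int(match.group(1).replace(",", "")) if match else None
    return {"title": title, "attack_type": attack, "records_affected": affected}


def store_record(conn, record):
    """Insert or update one record so the repository can be refreshed over time."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS breaches ("
        "title TEXT PRIMARY KEY, attack_type TEXT, records_affected INTEGER)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO breaches "
        "VALUES (:title, :attack_type, :records_affected)",
        record,
    )
    conn.commit()
```

Keying the hypothetical table on the article title means that re-processing an article updates its row instead of duplicating it, which is one simple way to support the dynamic updates described above.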