Distilling Public Data from Multiple Sources for Cybersecurity Applications
Author: Schnebly, James D
The amount of data produced every day is growing rapidly. This growth opens the door to new knowledge, but it also creates new opportunities for malicious users to cause cyber breaches. This thesis analyzes public data to gain insight for cybersecurity applications. First, a machine learning model is trained on public Twitter account data to identify bot accounts, which can help reduce the spread of fake news and the activity of malicious users. Second, a survey of text summarization techniques is presented to identify the best method for summarizing public data in the cybersecurity domain. Finally, a web application is built as a public tool that lets users summarize input text of their choosing with a variety of algorithms. The contribution of this thesis is thus twofold: a model capable of identifying Twitter bots with high accuracy, and a web application for summarizing cybersecurity information from public data.