Absolute Frequency Data for Statistical Computing: A Comparison With Sample-Based Approaches and Guidelines for Improving Software Implementation
Author: Robards, Amy Elisabeth
Data sets comprising a large number of discrete data points with many repeated values can be expressed in absolute frequency form, which represents the data more compactly by listing the number of occurrences of each unique value in the full data set. This form can significantly reduce the space required to store the data and can speed up calculations on it. The purpose of this thesis is to assess the decrease in computing time and the increase in storage efficiency gained by performing statistical computations on absolute frequency data. The results quantify the reduction in storage requirements relative to the sample form of the data and illustrate the potentially large speed-up in statistical computations. They suggest that statistical software should take advantage of these benefits and accept absolute frequency data as input. Using R as a case study, the thesis summarizes current capabilities and provides guidelines for adapting functions to better accommodate absolute frequency data.
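To make the idea of absolute frequency form concrete, the following is a minimal sketch, not code from the thesis. It is written in Python rather than R purely for illustration, and the function names (`to_frequency_form`, `freq_mean`, `freq_variance`) and the example data are hypothetical. It collapses a sample into (value, count) pairs and computes the mean and the unbiased sample variance from those pairs alone.

```python
from collections import Counter
from statistics import mean, variance
import math


def to_frequency_form(sample):
    """Collapse a sample into (unique value, count) pairs sorted by value."""
    return sorted(Counter(sample).items())


def freq_mean(freq):
    """Mean computed from (value, count) pairs."""
    n = sum(count for _, count in freq)
    return sum(value * count for value, count in freq) / n


def freq_variance(freq):
    """Unbiased sample variance computed from (value, count) pairs."""
    n = sum(count for _, count in freq)
    m = freq_mean(freq)
    return sum(count * (value - m) ** 2 for value, count in freq) / (n - 1)


# Hypothetical data: 10,000 observations but only three distinct values.
sample = [3, 1, 2, 3, 3, 2, 1, 3, 2, 3] * 1000
freq = to_frequency_form(sample)

print(len(sample), "observations stored as", len(freq), "pairs:", freq)
print("means agree:", math.isclose(freq_mean(freq), mean(sample)))
print("variances agree:", math.isclose(freq_variance(freq), variance(sample)))
```

In this sketch the frequency-based functions loop over the three unique values instead of all 10,000 observations, which is the kind of saving in storage and computation the abstract describes. The actual gain depends on how many distinct values the data contain.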