The use of random matrices as a tool for dimension reduction in high-dimensional problems
Modern science regularly collects large amounts of high-dimensional data. Examples of such data are abundant in the biomedical sciences, the geographical sciences, and many other fields. However, researchers face substantial challenges because some of the gathered data are too high-dimensional to analyze. These data share the common characteristic "small n, large p", where p >> n. In this thesis, we mainly discuss linear projections, especially principal component analysis (PCA) and random projections, as a way to cope with high-dimensional data. We simulate data in two settings: linear regression and survival analysis. We first reduce the dimension using either PCA or random projections, and then use the "reduced" data to predict survival or, in the case of linear regression, the dependent variable. In addition, we combine random projection with PCA, as proposed by Nguyen (2010), and compare this approach to PCA. We conduct several simulations to compare the performance of the different linear projections. Our results show that, in the regression setting, multiple random matrices generally perform better than PCA in terms of bias and mean squared error. For survival data analyzed with the Cox proportional hazards model, however, PCA generally outperforms multiple random matrices.
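The two-step procedure described above (reduce the dimension, then fit a regression on the reduced data) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the thesis's actual simulation design: the sample sizes, the Gaussian random matrix, the sparse coefficient vector, and the helper names `pca_reduce`, `random_projection_reduce`, and `ols_fit_predict` are all hypothetical. It also uses a single random matrix, whereas the thesis works with multiple random matrices.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the column-centered data onto its top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def random_projection_reduce(X, k, rng):
    """Project the column-centered data with a (p, k) Gaussian random matrix."""
    Xc = X - X.mean(axis=0)
    R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)
    return Xc @ R


def ols_fit_predict(Z, y):
    """Least-squares fit of y on Z with an intercept; returns fitted values."""
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef


# Illustrative "small n, large p" setting (values are assumptions).
rng = np.random.default_rng(0)
n, p, k = 50, 500, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0  # only the first five predictors matter in this toy example
y = X @ beta + rng.normal(size=n)

# Step 1: reduce the dimension from p to k.
Z_pca = pca_reduce(X, k)
Z_rp = random_projection_reduce(X, k, rng)

# Step 2: regress y on the reduced data and compute the in-sample MSE.
mse_pca = np.mean((y - ols_fit_predict(Z_pca, y)) ** 2)
mse_rp = np.mean((y - ols_fit_predict(Z_rp, y)) ** 2)
print(f"in-sample MSE, PCA: {mse_pca:.3f}")
print(f"in-sample MSE, random projection: {mse_rp:.3f}")
```

The in-sample errors printed here depend on the seed and on the toy design, so they say nothing about the comparison reported in the thesis; the sketch only shows the mechanics of the two reductions.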