Environment for Large Data Processing and Visualization Using MongoDB
Computer Science and Engineering
Data is a treasure for both scientists and business people. Scientists can discover significant rules and theories hidden in data, and business people can identify potential customers and opportunities for improvement by exploring it. Uncovering this value requires effective and efficient tools. Good traditional tools and systems for data management, processing, and visualization exist, but most of them lose their effectiveness as data sizes grow: they respond more slowly or stop working altogether under increasingly large data volumes. We are now in the digital era, and data is generated at an amazingly high speed. Remarkably, 90% of the data in the whole world has been generated in the last two years. To manage, process, and visualize large data, we need new tools and systems.

In this thesis, we introduce the ELDP&V system, which is designed to manage, process, and visualize large data. To manage data, we use MongoDB to record file paths and store the files themselves in a filesystem. With this method, we do not need to create a different schema for each kind of file. The system offers users basic data processing methods, such as frequency distribution histograms, and also provides scientific models for data processing. ELDP&V can visualize data with line charts, bar charts, and 2D maps, and most of the visualization results are interactive. We implemented ELDP&V with some new ideas and design workflows, which led to higher performance. For example, in one test the system visualized 13,455,368 records in 50 seconds. When we tested the same dataset with some traditional web-based tools, they stopped working or produced errors. When ELDP&V visualizes a time-based file, it displays the chosen items based on timestamps, which means that the loading time is not affected by the file size.

The thesis presents background on big data, outlines ELDP&V’s goals and characteristics, describes the proposed system’s design, and demonstrates its prototype in action. It also contains a comparison with related work and provides several pointers to directions for future work.
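
The idea of recording file paths in MongoDB while keeping the files themselves in a filesystem can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `register_file` and `open_registered`, the document fields, and the `InMemoryCollection` stand-in are all assumptions made for illustration. The stand-in mimics only the `insert_one` and `find_one` methods of a PyMongo collection, so that the example runs without a database server; a real PyMongo collection could be passed in its place.

```python
import os
import tempfile


class InMemoryCollection:
    """Hypothetical stand-in for a PyMongo collection (insert_one/find_one only)."""

    def __init__(self):
        self._docs = []

    def insert_one(self, document):
        self._docs.append(dict(document))

    def find_one(self, query):
        # Return the first document whose fields equal every key in the query.
        for doc in self._docs:
            if all(doc.get(key) == value for key, value in query.items()):
                return doc
        return None


def register_file(catalog, name, path):
    """Record where a data file lives; the file's contents stay on disk."""
    catalog.insert_one({
        "name": name,                          # assumed field names
        "path": os.path.abspath(path),
        "size_bytes": os.path.getsize(path),
    })


def open_registered(catalog, name):
    """Look up the stored path and open the file from the filesystem."""
    doc = catalog.find_one({"name": name})
    if doc is None:
        raise KeyError(name)
    return open(doc["path"], "r", encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        csv_path = os.path.join(folder, "readings.csv")
        with open(csv_path, "w", encoding="utf-8") as handle:
            handle.write("timestamp,value\n1,2.5\n")
        catalog = InMemoryCollection()
        register_file(catalog, "readings", csv_path)
        with open_registered(catalog, "readings") as handle:
            print(handle.readline().strip())
```

Because the catalog stores only a name, a path, and a size, files with entirely different column layouts can be registered without defining a schema for each one, which is the benefit the abstract attributes to this design.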