Framework for Large Data Processing under Constrained Resources
Computer Science and Engineering
Data processing uncovers, transforms, and classifies the information contained in data. Data-intensive research topics, such as environmental parameter prediction and sensor data imputation, require abundant computing power. To process big data efficiently, a server cluster is used in most cases. On one hand, a more powerful cluster performs better; on the other hand, it requires a larger budget. Balancing this tradeoff is one challenge. Another is improving communication between the nodes of a server cluster: nodes usually communicate over a network, where data transfer is slow.

In this thesis, we propose a data processing framework that can provide stable service with a limited budget. "Stable" service means that the average waiting time and queue length do not change drastically. The key to the framework's control strategy is to introduce the concepts of budget and local server computing power into the M/M/1/1/inf/inf queue model. To tackle the data communication challenge, data is compressed before transfer and decompressed when it arrives at its destination. An improved compression algorithm is proposed for this data transfer workflow; it leverages multiple GPUs and, to the best of our knowledge, is much faster than most other algorithms. Three data processing services that rely on the proposed framework are also presented in detail to illustrate and demonstrate the capabilities of our solution.
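To make the two stability quantities named above concrete, the following is a minimal sketch that computes the mean queue length and mean waiting time for a textbook single-server queue with Poisson arrivals, exponential service times, and unlimited waiting room. It is an illustration only: the function name `mm1_metrics` and its parameters are hypothetical, and the budget and local-server extensions proposed in the thesis are not reproduced here.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state metrics of a textbook M/M/1 queue (assumed model).

    arrival_rate: mean number of jobs arriving per unit time (lambda).
    service_rate: mean number of jobs one server completes per unit time (mu).
    """
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        # With lambda >= mu the queue grows without bound: no steady state.
        raise ValueError("unstable queue: arrival_rate must be below service_rate")

    rho = arrival_rate / service_rate  # server utilization
    return {
        "utilization": rho,
        # Mean number of jobs waiting (excluding the job in service).
        "mean_queue_length": rho ** 2 / (1 - rho),
        # Mean time a job waits before its service starts.
        "mean_waiting_time": rho / (service_rate - arrival_rate),
    }


if __name__ == "__main__":
    # Example: 2 jobs arrive per second, the server completes 4 per second.
    print(mm1_metrics(2.0, 4.0))
```

In this simplified setting, both quantities grow sharply as the arrival rate approaches the service rate, which is why a control strategy that adds or removes computing power can keep them from changing drastically.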