Real Time Throughput Estimation and Optimization
Author: PEHLIVAN, Bahadir Ali
Computer Science and Engineering
Obtaining optimal data transfer performance is of utmost importance to today’s data-intensive distributed applications and wide-area data replication services, which must meet stringent networking demands. Tuning application-layer protocol parameters such as pipelining, parallelism, and concurrency can significantly improve utilization of the available network bandwidth as well as end-to-end data transfer performance. However, determining the best settings for these parameters is a challenging problem, as network conditions can vary greatly between sites and over time. Poor protocol tuning can cause either under- or over-utilization of network resources and thus degrade transfer performance.
Real-time throughput estimation and transfer optimization approaches offer a promising solution, as they can discover the optimal transfer configuration at run time without requiring upfront work or making assumptions about the underlying system architecture.
In this thesis, we use a real-time approach for efficient transfer optimization, offering a heuristic solution that explores the search space quickly and reduces the overhead of evaluating various configuration settings. In the first work, we developed a real-time tuning method that optimizes the number of concurrent transfers to yield high throughput while keeping system overhead minimal. Real-time tuning runs a series of sample transfers with different concurrency values to identify the value beyond which the throughput increase becomes negligible. Compared with non-real-time methods, it adapts much more flexibly to different conditions. We compared it with a fixed-size sampling method whose sample transfer size is decided by a model trained on previous transfer data. That method requires, on average, 20% of the data to be sampled, which is too costly when transferring large data.
We also compared real-time tuning with a fixed-time sampling method and observed that the fixed-time method cannot adapt to unexpected changes in network conditions. To further improve the benefit of real-time tuning solutions, in our second work we examined sample transfers to minimize search time without degrading accuracy. We evaluated several time-series and regression models for quickly determining throughput convergence time. Results gathered in various networks with a rich set of transfer configurations indicate that, in most cases, an autoregressive model can accurately estimate sample transfer throughput in less than 5 seconds, which is up to a 4x improvement over the state-of-the-art solution. We also found that, while the most common transfer applications report instantaneous throughput at most once per second, decreasing the reporting interval to the order of hundreds of milliseconds is key to further reducing execution time.
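The concurrency search from the first work can be illustrated with a minimal sketch. It is not the thesis implementation: the doubling schedule, the 5% gain threshold, the concurrency cap of 32, and the names `tune_concurrency` and `synthetic_link` are all assumptions made for illustration. The abstract only states that sample transfers are run at different concurrency values until the throughput increase becomes negligible.

```python
from typing import Callable, List, Tuple


def tune_concurrency(
    sample_throughput: Callable[[int], float],
    max_concurrency: int = 32,   # assumed upper bound, not from the thesis
    min_gain: float = 0.05,      # assumed "negligible" threshold (5%)
) -> Tuple[int, List[Tuple[int, float]]]:
    """Raise concurrency until the relative throughput gain is negligible.

    sample_throughput(cc) is expected to run one sample transfer with cc
    concurrent transfers and return the observed throughput.
    Returns the chosen concurrency and the (concurrency, throughput) history.
    """
    best_cc = 1
    best_tput = sample_throughput(best_cc)
    history = [(best_cc, best_tput)]

    cc = 2
    while cc <= max_concurrency:
        tput = sample_throughput(cc)
        history.append((cc, tput))
        # Relative improvement over the best setting seen so far.
        if best_tput > 0:
            gain = (tput - best_tput) / best_tput
        else:
            gain = float("inf")
        if gain < min_gain:
            break  # further concurrency no longer pays off
        best_cc, best_tput = cc, tput
        cc *= 2  # doubling schedule is an assumption of this sketch
    return best_cc, history


def synthetic_link(cc: int) -> float:
    """Hypothetical stand-in for a sample transfer.

    Each concurrent transfer adds 1 Gbps until a 10 Gbps link saturates.
    """
    return float(min(cc, 10))
```

Calling `tune_concurrency(synthetic_link)` samples concurrency values 1, 2, 4, 8, 16, and 32 and returns 16, the last value that still improved throughput by at least the threshold. In a real deployment, `sample_throughput` would wrap an actual sample transfer, so each call carries the sampling cost that the second work aims to reduce.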