If you have any problems related to the accessibility of any content (or if you want to request that a specific publication be accessible), please contact us at email@example.com.
Towards Autonomous and Efficient Machine Learning Systems
AltmetricsView Usage Statistics
Computation-intensive machine learning (ML) applications are becoming some of the most popular workloads running atop cloud infrastructure. While training ML applications, practitioners face the challenge of tuning various system-level parameters, such as the number of training nodes, communication topology during training, instance type, and the number of serving nodes, to meet the SLO requirements for bursty workload during the inference. Similarly, efficient resource utilization is another key challenge in cloud computing. This dissertation proposes high-performing and efficient ML systems to speed up training time and inference tasks while enabling automated and robust system management.To train an ML model in a distributed fashion we focus on strategies to mitigate the resource provisioning overhead and improve the training speed without impacting the model accuracy. More specifically, a system for autonomic and adaptive scheduling is built atop serverless computing that dynamically optimizes deployment and resource scaling for ML training tasks for cost-effectiveness and fast training. Similarly, a dynamic client selection framework is developed to address the stragglers problem caused by resource heterogeneity, data quality, and data quantity in a privacy-preserving Federated Learning (FL) environment without impacting the model accuracy.For serving bursty ML workloads we focus on developing highly scalable and adaptive strategies to serve the dynamically changing workload in a cost-effective manner in an autonomic fashion. We develop a framework that optimizes batching parameters on the fly using a lightweight profiler and an analytical model. We also devise strategies for serving ML workloads of varying sizes, leading to non-deterministic service time in a cost-effective manner. More specifically, we develop an SLO-aware framework that first analyzes the request size variations and workload variation to estimate the number of serving functions and intelligently route requests to multiple serving functions. Finally, resource utilization of burstable instances is optimized to benefit the cloud provider and end-user through a careful orchestration of resources (i.e., CPU, network, and I/O) using an analytical model and lightweight profiling, while complying with a user-defined SLO.