SLO-aware IAC Automation Framework for Dynamic Cloud Deployment.

Boëtius, Koen

View/Open

KoenBoetius_Thesis_Finalized_6694934.pdf (2.303Mb)

Publication date

2025

Author

Boëtius, Koen

Summary

According to O’Reilly’s 2021 report [1], over 90% of companies worldwide use cloud computing, highlighting its critical role in the IT industry. To help developers manage cloud infrastructure, the paradigm of Infrastructure as Code (IaC) has emerged [2], allowing infrastructure to be defined and maintained through code. Designing infrastructure, however, requires extensive expertise, and in most business scenarios it must adhere to constraints known as Service Level Objectives (SLOs). These SLOs impose limits on Service Level Indicators (SLIs) such as CPU usage, memory consumption, and uptime. This thesis explores two frameworks aimed at automating IaC creation while meeting defined SLOs, leveraging Large Language Models (LLMs) and statistical prediction methods. The first framework uses manually defined SLOs to guide the LLM in adjusting CPU and memory allocations, with the goal of achieving target performance while predicting potential SLO violations. The second framework uses statistical methods to derive SLOs from observed metrics, which are then used to iteratively refine the IaC through an LLM in pursuit of desired performance levels. Both frameworks were evaluated against a baseline. In no case did adjusting the infrastructure for SLO compliance result in performance matching that of the baseline. The first framework, which relies on manual SLO definitions, experienced several SLO violations after three LLM adjustments. Its best performance reached only 22% of the target throughput (131 requests per second (RPS) vs. 600 RPS). Conversely, the second framework, based on metric-driven SLOs, achieved up to 79% of the target throughput (476 RPS vs. 600 RPS) without violating any SLOs after three LLM-guided code adjustments. However, this improvement came at the cost of increased average response times and a significant rise in failed requests. Additionally, prompt design was observed to greatly affect the quality of the IaC output.
When specific SLOs are provided for individual services, the LLM tends to overemphasize those services while neglecting others. What initially appears to be helpful information can quickly overwhelm the LLM and degrade output quality. Nevertheless, the findings suggest that with the insights gained, both frameworks can be further refined to yield improved results.

URI

https://studenttheses.uu.nl/handle/20.500.12932/49955

Collections

Theses