Optimize Your Azure Costs with Python Automation: Start/Stop Effortlessly
Cloud infrastructure expenses can quickly consume your budget without providing any additional benefits. To ensure optimal return on investment, businesses aim to utilize available resources effectively and minimize waste and unnecessary spending.
Introduction
Azure provides a variety of services, each with different cost structures, making it important to understand how to optimize costs and ensure efficient usage of resources. By following best practices and utilizing cost-saving strategies, organizations can significantly reduce their overall cloud expenses and improve their bottom line.
Let's first explore various methods and optimization techniques for reducing cloud expenses:
- Track the utilization of every component and scale up or down as necessary
- Employ the most suitable resources for development workloads, such as spot virtual machines
- Eliminate unused resources
- Turn off inactive resources during weekends/nights
Of these methods, deallocating idle resources during nights and weekends can result in cost savings of up to 60%, and it can be automated with Python.
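To see where a figure of this size can come from, here is a rough back-of-the-envelope sketch. The schedule (12 hours a day, Monday to Friday) is only an assumed example, and the function name is made up for illustration. It counts compute hours only, so the real saving on a bill will be lower once storage and other always-on charges are included.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def compute_hours_saved(hours_per_day, days_per_week):
    """Fraction of weekly compute hours avoided when a resource runs only
    `hours_per_day` hours on `days_per_week` days (hypothetical helper)."""
    if not 0 <= hours_per_day <= 24 or not 0 <= days_per_week <= 7:
        raise ValueError("schedule must fit within one week")
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK


# Assumed schedule: 12 hours a day, Monday to Friday (60 of 168 hours).
print(f"{compute_hours_saved(12, 5):.1%}")  # prints 64.3%
```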
Get Azure service account details
To get programmatic access to Azure resources, we need to create a service principal. The Azure documentation covers the details.
Save costs on Azure Virtual Machines (Deallocate Azure VMs)
Azure virtual machines are widely used to host applications. Let's look at the different states a virtual machine can be in:
- Running: The VM is fully operational and active. You are charged for both compute and storage.
- Stopped (Deallocated): The VM is not running, but its storage resources are still reserved. You are charged for storage but not for compute.
- Stopped (Allocated): The VM has been shut down, for example from within the guest operating system, but it has not been deallocated. Its compute resources remain reserved, so you are charged for both storage and compute.
- Deleted: The VM and its associated resources have been permanently deleted and cannot be recovered.
So, to save on costs, we need to stop and deallocate the virtual machine. This can be done manually from the Azure portal or through the Azure CLI (az).
You can use a Python function to stop/start an Azure virtual machine. It takes the following arguments:
- resource_group - RG name for the virtual machine
- vmname - Name of the VM
- action - Action to be performed on VM (start/stop)
The code requires the following environment variables to be set: AZURE_SUBSCRIPTION_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID. These are the values we obtained when creating the service principal.
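A minimal sketch of such a function is shown here. It is not the original script: the function names manage_vm and get_compute_client and the optional compute_client argument are assumptions made for illustration, and it assumes the azure-identity and azure-mgmt-compute packages are installed. The SDK imports are deferred so that the start/stop logic can be tried with a stub client.

```python
import os


def get_compute_client():
    """Build a ComputeManagementClient from the service principal variables."""
    # Deferred imports: only needed when talking to Azure for real.
    from azure.identity import ClientSecretCredential
    from azure.mgmt.compute import ComputeManagementClient

    credential = ClientSecretCredential(
        tenant_id=os.environ["AZURE_TENANT_ID"],
        client_id=os.environ["AZURE_CLIENT_ID"],
        client_secret=os.environ["AZURE_CLIENT_SECRET"],
    )
    return ComputeManagementClient(credential, os.environ["AZURE_SUBSCRIPTION_ID"])


def manage_vm(resource_group, vmname, action, compute_client=None):
    """Start or stop (deallocate) a VM and wait for the operation to finish."""
    if action not in ("start", "stop"):
        raise ValueError(f"Unsupported action: {action!r}")
    if compute_client is None:
        compute_client = get_compute_client()

    vms = compute_client.virtual_machines
    if action == "start":
        poller = vms.begin_start(resource_group, vmname)
    else:
        # Deallocate rather than power off, so compute charges stop.
        poller = vms.begin_deallocate(resource_group, vmname)

    poller.result()  # block until Azure reports the operation as complete
    return action
```

With the environment variables set, a call such as manage_vm("DEV_RG", "dev-vm-api", "stop") would deallocate the VM; the resource names here are only placeholders.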
Points to Note:
- If you shut down (deallocate) and restart your virtual machine, it is possible that your public and private IP addresses will change if they are not set to be static
- You will still incur charges for any storage attached to the virtual machine.
Save costs on Azure Kubernetes Service (Stop AKS)
The Azure Kubernetes Service is commonly utilized for container-based applications, and you can halt the AKS cluster when it's not in use. You can use the following Python function to start/stop the AKS cluster.
You need to pass the following parameters to the function:
- resource_group - RG name for the AKS cluster
- cluster_name - Name of the AKS cluster
- action - Action to be performed on the cluster (start/stop)
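A minimal sketch of an AKS start/stop function taking these parameters could look as follows. As with the VM example, this is not the original script: manage_aks, get_aks_client, and the optional aks_client argument are names assumed for illustration, and the azure-identity and azure-mgmt-containerservice packages are assumed to be installed.

```python
import os


def get_aks_client():
    """Build a ContainerServiceClient from the service principal variables."""
    # Deferred imports: only needed when talking to Azure for real.
    from azure.identity import ClientSecretCredential
    from azure.mgmt.containerservice import ContainerServiceClient

    credential = ClientSecretCredential(
        tenant_id=os.environ["AZURE_TENANT_ID"],
        client_id=os.environ["AZURE_CLIENT_ID"],
        client_secret=os.environ["AZURE_CLIENT_SECRET"],
    )
    return ContainerServiceClient(credential, os.environ["AZURE_SUBSCRIPTION_ID"])


def manage_aks(resource_group, cluster_name, action, aks_client=None):
    """Start or stop an AKS cluster and wait for the operation to finish."""
    if action not in ("start", "stop"):
        raise ValueError(f"Unsupported action: {action!r}")
    if aks_client is None:
        aks_client = get_aks_client()

    clusters = aks_client.managed_clusters
    if action == "start":
        poller = clusters.begin_start(resource_group, cluster_name)
    else:
        poller = clusters.begin_stop(resource_group, cluster_name)

    poller.result()  # block until Azure reports the operation as complete
    return action
```

Passing a client explicitly is optional. It mainly makes it possible to reuse one client across several clusters or to substitute a stub when trying the logic out.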
Points to Note:
- The Kubernetes API server endpoint may change when you stop and start the AKS cluster
- You should not repeatedly stop and start the AKS cluster, as it may result in errors (keep a gap of at least 2 hours)
Extending Automation for multiple resources
This script can be expanded by incorporating Python libraries such as argparse, pyyaml, json, and logging, making it more versatile in managing multiple Azure resources. I personally prefer using YAML to pass variables to scripts. You can refer to the following YAML example for passing multiple Azure resources.
dev:
  rg: DEV_RG
  vm:
    - dev-vm-api
    - dev-vm-frontend
uat:
  rg: UAT_RG
  aks:
    - UAT-AKS
  vm:
    - uat-jenkins
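One possible way to walk a configuration of this shape is sketched here. The function names load_config and iter_resources are hypothetical, and the layout (an rg key plus optional vm and aks lists per environment) is assumed from the example above. Loading the file assumes PyYAML is installed.

```python
def load_config(path):
    """Parse the YAML file into a plain dictionary (requires PyYAML)."""
    import yaml  # deferred so iter_resources can be used without PyYAML

    with open(path) as handle:
        return yaml.safe_load(handle)


def iter_resources(config):
    """Yield (environment, resource_group, kind, name) for each VM and AKS cluster."""
    for env_name, env in config.items():
        resource_group = env["rg"]
        for kind in ("vm", "aks"):
            # A missing or empty key simply means no resources of that kind.
            for name in env.get(kind) or []:
                yield env_name, resource_group, kind, name


# The same structure as the YAML example, written as a dictionary.
example = {
    "dev": {"rg": "DEV_RG", "vm": ["dev-vm-api", "dev-vm-frontend"]},
    "uat": {"rg": "UAT_RG", "aks": ["UAT-AKS"], "vm": ["uat-jenkins"]},
}
for entry in iter_resources(example):
    print(entry)
```

Each yielded tuple carries enough information to call the matching start/stop function for that resource type.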
Find the complete Python automation on GitHub, with usage instructions in the Readme.md file.
Conclusion
This solution can optimize the utilization of Azure resources by scheduling a Python script to stop the resources during non-working hours (nights and weekends) and bring them back up during working hours. Additionally, by integrating the script with Jenkins, the environment owner can start and stop the Azure infrastructure with ease.
This automation can bring cost savings of up to 60% on Azure infrastructure.
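As a small illustration of the scheduling idea, a scheduled job could decide which action to take from the current time. The working hours used here (08:00 to 20:00, Monday to Friday) and the function name desired_action are assumptions for this sketch, not part of the original automation.

```python
from datetime import datetime, time


def desired_action(now, start=time(8, 0), end=time(20, 0)):
    """Return "start" during working hours on weekdays, otherwise "stop"."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    in_working_hours = start <= now.time() < end
    return "start" if is_weekday and in_working_hours else "stop"


# The scheduler (cron, Jenkins, or similar) would pass the current time.
print(desired_action(datetime.now()))
```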