Optimize Your Azure Costs with Python Automation: Start/Stop Effortlessly

Idle cloud infrastructure can quickly consume your budget without delivering any value. To get the best return on investment, businesses aim to use available resources effectively and to minimize waste and unnecessary spending.



Introduction

Azure provides a variety of services, each with different cost structures, making it important to understand how to optimize costs and ensure efficient usage of resources. By following best practices and utilizing cost-saving strategies, organizations can significantly reduce their overall cloud expenses and improve their bottom line.

Let's first explore various methods and optimization techniques for reducing cloud expenses:

  • Track the utilization of every component and adjust it (scale up/scale down) as necessary
  • Use the most cost-effective resources for development workloads, such as Spot virtual machines
  • Eliminate unused resources
  • Turn off inactive resources during nights and weekends

Of these methods, deallocating idle resources during nights and weekends can cut compute costs by up to 60%, and it is straightforward to automate with Python.
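Where does a figure like 60% come from? The arithmetic is simple: a week has 168 hours, and a resource that only runs during working hours is billed for compute for a fraction of them. The sketch below assumes, for illustration only, a 12-hour window (e.g. 08:00-20:00) on five working days; the function name is hypothetical, and the result covers compute only, since storage keeps being billed.

```python
# Estimate the share of weekly compute hours avoided by an on/off schedule.
# Assumption (illustrative): resources run only during a fixed daily window on
# working days and are deallocated the rest of the time. Storage is not included.

HOURS_PER_WEEK = 7 * 24  # 168


def weekly_savings_fraction(working_days: int, hours_per_day: float) -> float:
    """Fraction of weekly compute hours saved by deallocating outside the window."""
    if not 0 <= working_days <= 7 or not 0 <= hours_per_day <= 24:
        raise ValueError("working_days must be 0-7 and hours_per_day 0-24")
    running_hours = working_days * hours_per_day
    return 1 - running_hours / HOURS_PER_WEEK


if __name__ == "__main__":
    # 12 hours a day, Monday to Friday: 60 running hours out of 168
    print(f"{weekly_savings_fraction(5, 12):.0%}")  # prints 64%
```

A longer working window lowers the saving (a 14-hour window on weekdays gives about 58%), which is why the figure is an upper bound rather than a guarantee.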

Deallocating idle resources is suitable for Dev/Staging workloads, not for Production.

Get Azure service account details

To get programmatic access to Azure resources, you need to create a service principal; see the Azure documentation for the details.


Save costs on Azure Virtual Machines (Deallocate Azure VMs)

Azure virtual machines are widely used to host applications. Let's first understand the different states of a virtual machine:

  1. Running: The VM is fully operational; you are charged for compute and storage.
  2. Stopped (Deallocated): The VM is not running and its compute resources are released, but its disks are retained; you are charged for storage, not for compute.
  3. Stopped (Generalized): The VM has been shut down and deallocated, and its disks have been generalized (typically to capture an image). You are charged for storage but not for compute.
  4. Deleted: The VM and its associated resources have been permanently deleted and cannot be recovered.
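Before stopping or starting a VM, a script can check its current power state and skip VMs that are already where they should be. Azure reports power states in the VM's instance view as status codes such as `PowerState/running`, `PowerState/stopped` or `PowerState/deallocated`; with the SDK you can collect them as `[s.code for s in compute_client.virtual_machines.instance_view(resource_group, vmname).statuses]`. The helper names below are hypothetical, and only the steady states are classified; transitional states (starting, stopping, deallocating) return `None`.

```python
from typing import Iterable, Optional

# Whether compute is billed in each steady power state.
COMPUTE_BILLED = {
    "running": True,       # VM is up: compute and storage are billed
    "stopped": True,       # OS shut down but VM still allocated: compute still billed
    "deallocated": False,  # compute released: only storage (disks) is billed
}


def power_state(status_codes: Iterable[str]) -> Optional[str]:
    """Extract the power state (e.g. 'deallocated') from instance-view status codes."""
    for code in status_codes:
        if code.startswith("PowerState/"):
            return code.split("/", 1)[1]
    return None


def compute_billed(status_codes: Iterable[str]) -> Optional[bool]:
    """True/False if compute is/isn't billed; None if the state is unknown or transitional."""
    state = power_state(status_codes)
    if state is None:
        return None
    return COMPUTE_BILLED.get(state)


if __name__ == "__main__":
    print(compute_billed(["ProvisioningState/succeeded", "PowerState/stopped"]))
```

Note the `stopped` entry: a VM that is only powered off still costs compute, which is exactly the case a cost-saving script should treat as "not yet done".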

To save costs, you need to stop and deallocate the virtual machine. You can do this manually from the Azure portal or with the Azure CLI (az vm deallocate).

Running the 'shutdown' command from a terminal inside the VM does not deallocate it: the VM ends up in the Stopped (allocated) state, and you continue to be charged for compute.

You can use the Python function below to start or stop an Azure virtual machine. Pass it the following arguments:

  • resource_group - Resource group of the virtual machine
  • vmname - Name of the VM
  • action - Action to perform on the VM (start/stop)
import logging
import os

from azure.identity import ClientSecretCredential
from azure.mgmt.compute import ComputeManagementClient

# Start or stop (deallocate) an Azure VM. Returns True on success, False otherwise.
def startStopVM(resource_group, vmname, action):
    # Service principal credentials are read from environment variables
    credentials = ClientSecretCredential(
        client_id=os.environ['AZURE_CLIENT_ID'],
        client_secret=os.environ['AZURE_CLIENT_SECRET'],
        tenant_id=os.environ['AZURE_TENANT_ID']
    )
    subscription_id = os.environ['AZURE_SUBSCRIPTION_ID']
    try:
        compute_client = ComputeManagementClient(credentials, subscription_id)
        if action.lower() == 'start':
            logging.info("Starting Virtual Machine: %s", vmname)
            async_vm_start = compute_client.virtual_machines.begin_start(resource_group, vmname)
            async_vm_start.wait()  # block until the long-running operation finishes
            return True
        elif action.lower() == 'stop':
            # begin_deallocate releases compute; begin_power_off would keep billing it
            logging.info("Stopping (deallocating) Virtual Machine: %s", vmname)
            async_vm_stop = compute_client.virtual_machines.begin_deallocate(resource_group, vmname)
            async_vm_stop.wait()
            return True
        else:
            logging.error("Unknown action '%s' for VM %s (expected start/stop)", action, vmname)
    except Exception:
        # logging.exception already records the traceback
        logging.exception("Exception occurred with VM %s", vmname)
    return False

Python Code to Start/Stop Azure VM

The code requires the following environment variables to be set: AZURE_SUBSCRIPTION_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET and AZURE_TENANT_ID. Their values come from the service principal you created earlier.
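A missing variable otherwise surfaces as a bare `KeyError` deep inside the start/stop call. A small pre-flight check, sketched below with a hypothetical `load_azure_settings` helper, reports every missing or empty variable at once before any Azure call is made.

```python
import os
from typing import Dict, Mapping, Optional

# The four variables the start/stop functions read from the environment
REQUIRED_VARS = (
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "AZURE_TENANT_ID",
)


def load_azure_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the service principal settings, failing fast if any are missing or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Azure environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Accepting an optional mapping instead of always reading `os.environ` keeps the check easy to test.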

Points to Note:

  • If you deallocate and restart a virtual machine, its public and private IP addresses may change unless they are configured as static
  • You will still incur charges for the storage (disks) attached to the virtual machine.

Save costs on Azure Kubernetes Service (Stop AKS)

Azure Kubernetes Service (AKS) is commonly used to run container-based applications, and you can stop an AKS cluster when it's not in use. You can use the following Python function to start or stop an AKS cluster.

Pass the following parameters to the function:

  • resource_group - Resource group of the AKS cluster
  • cluster_name - Name of the AKS cluster
  • action - Action to perform on the cluster (start/stop)
import logging
import os

from azure.identity import ClientSecretCredential
from azure.mgmt.containerservice import ContainerServiceClient

# Start or stop an AKS cluster. Returns True on success, False otherwise.
def startStopAks(resource_group, cluster_name, action):
    # Service principal credentials are read from environment variables
    credentials = ClientSecretCredential(
        client_id=os.environ['AZURE_CLIENT_ID'],
        client_secret=os.environ['AZURE_CLIENT_SECRET'],
        tenant_id=os.environ['AZURE_TENANT_ID']
    )
    subscription_id = os.environ['AZURE_SUBSCRIPTION_ID']
    try:
        containerservice_client = ContainerServiceClient(
            credential=credentials,
            subscription_id=subscription_id
        )
        if action.lower() == 'stop':
            logging.info("Stopping AKS Cluster: %s", cluster_name)
            async_aks_stop = containerservice_client.managed_clusters.begin_stop(resource_group, cluster_name)
            async_aks_stop.wait()  # block until the long-running operation finishes
            return True
        elif action.lower() == 'start':
            logging.info("Starting AKS Cluster: %s", cluster_name)
            async_aks_start = containerservice_client.managed_clusters.begin_start(resource_group, cluster_name)
            async_aks_start.wait()
            return True
        else:
            logging.error("Unknown action '%s' for AKS Cluster %s (expected start/stop)", action, cluster_name)
    except Exception:
        # logging.exception already records the traceback
        logging.exception("Exception occurred with AKS Cluster %s", cluster_name)
    return False

Python Code to Start/Stop AKS Cluster
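A scheduled job will often run against a cluster that is already in the requested state, so it helps to check first. The sketch below assumes that `managed_clusters.get(resource_group, cluster_name)` returns a cluster object whose `power_state.code` is `"Running"` or `"Stopped"`; the helper name `aks_needs_action` is hypothetical. The client is passed in so the logic can be tested without Azure.

```python
from typing import Any

# Power-state code each action is expected to produce (assumed values)
TARGET_STATE = {"start": "Running", "stop": "Stopped"}


def aks_needs_action(containerservice_client: Any, resource_group: str,
                     cluster_name: str, action: str) -> bool:
    """Return True if the cluster is not already in the state `action` would produce."""
    target = TARGET_STATE.get(action.lower())
    if target is None:
        raise ValueError(f"Unknown action: {action!r}")
    cluster = containerservice_client.managed_clusters.get(resource_group, cluster_name)
    # If power_state is unavailable, treat the state as unknown and act anyway
    code = getattr(getattr(cluster, "power_state", None), "code", None)
    return code != target
```

With this check, startStopAks is only called when `aks_needs_action(...)` returns True, which also avoids needless stop/start cycles.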

Points to Note:

  • The API server IP address may change when you stop and start an AKS cluster
  • You should not stop and start an AKS cluster repeatedly, as it may result in errors (keep at least a 2-hour gap between operations)
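To enforce the 2-hour gap suggested above, the script can remember when it last acted on each cluster. The sketch below keeps those timestamps in a small JSON file; the file location and the helper names `cooldown_remaining` and `record_action` are hypothetical, and the file must persist between runs (for example on a shared Jenkins workspace) for the guard to work.

```python
import json
import time
from pathlib import Path
from typing import Optional

COOLDOWN_SECONDS = 2 * 60 * 60  # minimum gap between start/stop operations


def cooldown_remaining(state_file: Path, cluster_name: str,
                       now: Optional[float] = None) -> float:
    """Seconds left before `cluster_name` may be started/stopped again (0 if allowed)."""
    now = time.time() if now is None else now
    if not state_file.exists():
        return 0.0
    last = json.loads(state_file.read_text()).get(cluster_name)
    if last is None:
        return 0.0
    return max(0.0, COOLDOWN_SECONDS - (now - last))


def record_action(state_file: Path, cluster_name: str, now: Optional[float] = None) -> None:
    """Store the time `cluster_name` was last started/stopped."""
    now = time.time() if now is None else now
    data = json.loads(state_file.read_text()) if state_file.exists() else {}
    data[cluster_name] = now
    state_file.write_text(json.dumps(data))
```

Call `record_action` only after startStopAks returns True, so a failed attempt does not block the next retry.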

Extending Automation for multiple resources

The script can be extended with Python libraries such as argparse, pyyaml, json and logging to manage multiple Azure resources. I personally prefer using YAML to pass variables to scripts. The following YAML example lists Azure resources for several environments:

dev:
  rg: DEV_RG
  vm:
    - dev-vm-api
    - dev-vm-frontend

uat:
  rg: UAT_RG
  aks:
    - UAT-AKS
  vm:
    - uat-jenkins

Find the complete Python automation on GitHub, with usage instructions in the Readme.md file.

Conclusion

This solution optimizes the use of Azure resources: schedule the Python script to stop resources during non-working hours (nights and weekends) and start them again during working hours. By integrating the script with Jenkins, environment owners can also start and stop their Azure infrastructure on demand.
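A scheduler (cron or a timed Jenkins job) only needs to decide which action applies at the current time. The sketch below assumes a hypothetical working window of Monday to Friday, 08:00-20:00 local time; adjust the constants to your team's hours.

```python
from datetime import datetime, time

# Hypothetical working window: Monday-Friday, 08:00-20:00 local time
WORK_START = time(8, 0)
WORK_END = time(20, 0)


def desired_action(now: datetime) -> str:
    """Return 'start' inside working hours and 'stop' outside them (nights, weekends)."""
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    in_window = WORK_START <= now.time() < WORK_END
    return "start" if is_weekday and in_window else "stop"
```

The result of `desired_action(datetime.now())` can be passed as the `action` argument to startStopVM or startStopAks. Because the decision depends only on the clock, running the job every hour is safe as long as already-stopped resources are skipped.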

This automation can reduce the compute costs of non-production Azure infrastructure by up to 60%.
