Installing Apache Airflow on Kubernetes - Comprehensive Guide

Deploying Apache Airflow on Kubernetes can streamline your data workflows and enhance scalability. In this guide, we focus on installing Airflow with the Official Helm Chart by Apache. We’ll also provide a brief comparison with the User Community Helm Chart to help you choose the best option for your needs.


Introduction

Apache Airflow is a powerful platform used to programmatically author, schedule, and monitor workflows. Running Airflow on Kubernetes brings significant benefits, such as:

  • Scalability: Easily scale your workflow components.
  • Resilience: Kubernetes manages container restarts and failure recovery.
  • Flexibility: Integrate with other cloud-native services and automate deployment.

By leveraging Helm charts, you simplify the complex process of deployment while ensuring consistency and ease of maintenance.

Understanding Helm Charts for Airflow

Helm charts serve as packages that help you define, install, and upgrade complex Kubernetes applications. In the context of Airflow, you might come across multiple charts:

  • Airflow Official Helm Chart: The chart maintained by the Apache Airflow project.
  • Airflow Community Helm Chart: An open-source, community-maintained Helm chart for Airflow.
  • Airflow Bitnami Helm Chart: An alternative chart provided by Bitnami that offers its own set of configurations.

Comparison: Official vs. Community Helm Charts

Before installing Airflow, it’s useful to understand the differences between the two primary Helm charts:

Official Helm Chart by Apache

This is the official Helm chart, created and maintained by the Apache Airflow project.

    • Hosted on GitHub in the chart directory of the apache/airflow repository.
    • Maintained by the Apache Airflow team, ensuring alignment with upstream releases and documentation.
    • Offers standard configurations and the latest features straight from the Apache community.
    • Well-documented on the official Airflow docs.

User Community Helm Chart

This is the community-driven Helm chart, created and maintained by the Airflow community.

    • Available at airflow-helm/charts.
    • Driven by community contributions, which may offer additional customizations.
    • Often includes extended functionalities or custom presets tailored by community users.
    • May lag slightly behind official releases or differ in default settings.

For users who want the most up-to-date and officially supported features, the Official Apache Helm Chart is the recommended choice. However, if you need specialized customizations or want to experiment with community-driven extensions, the User Community Helm Chart might be worth exploring.

Step-by-Step Installation Guide Using the Official Helm Chart

This section details the complete process to install Apache Airflow on your Kubernetes cluster using the official Helm chart.

Prerequisites

  • Kubernetes Cluster: Ensure you have a running Kubernetes cluster.
  • Helm Installed: Install Helm on your local machine. Follow the Helm installation guide if needed.
  • kubectl Configured: Your kubectl should be configured to interact with your Kubernetes cluster.

Step 1: Add the Apache Airflow Helm Repository

First, add the official Apache Airflow Helm repository:

helm repo add apache-airflow https://airflow.apache.org
helm repo update

These commands register the repository and fetch the latest chart index, preparing your environment for installation.

Step 2: Configure Your Deployment

Before installation, create a configuration file (e.g., values.yaml) to customize your deployment:

  • Define Resources: Set CPU, memory, and replica settings.
  • Configure Connections: Specify your database and message broker configurations.
  • Enable/Disable Features: Customize the scheduler, webserver, and worker configurations.

A sample snippet might look like:

executor: KubernetesExecutor

# Keep the keys below secret; do not commit them to version control.
fernetKey: <your generated Fernet key>
webserverSecretKey: <your generated webserver secret key>

dags:
  persistence:
    enabled: true

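# Number of Celery workers; only used with Celery-based executors
# (KubernetesExecutor starts a pod per task instead).
workers:
  replicas: 2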

webserver:
  service:
    type: LoadBalancer

You can find more information on executors in the Airflow documentation.

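fernetKey: You can generate a Fernet key online at https://fernetkeygen.com/. Because this key encrypts the connection credentials stored in Airflow's metadata database, generating it locally is the safer option.

webserverSecretKey: You can generate a webserver secret key with the following command, which calls Python's secrets module:
python3 -c 'import secrets; print(secrets.token_hex(16))'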
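
Both values can also be generated locally, so that no secret passes through a website. The following is a minimal sketch using only the Python standard library. It assumes that a Fernet key is 32 random bytes encoded as URL-safe base64, which is the format produced by the cryptography package's Fernet.generate_key(). The function names are illustrative and not part of Airflow or the chart.

```python
import base64
import os
import secrets


def generate_fernet_key() -> str:
    """Return 32 random bytes encoded as URL-safe base64 (the Fernet key format)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def generate_webserver_secret_key() -> str:
    """Return a 32-character hex string, same as the one-liner above."""
    return secrets.token_hex(16)


if __name__ == "__main__":
    # Print both values in the form used by values.yaml.
    print(f"fernetKey: {generate_fernet_key()}")
    print(f"webserverSecretKey: {generate_webserver_secret_key()}")
```

Paste the two printed lines into values.yaml in place of the placeholders, and store the file somewhere that is not committed to version control.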

Step 3: Install Apache Airflow

Install Apache Airflow with the helm install command, passing the values.yaml file created in the previous step:

helm install airflow apache-airflow/airflow -f values.yaml

Step 4: Verify and Post-Installation Checks

After installation, check the status of your pods:

kubectl get pods -l release=airflow
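
If you prefer a scripted check over reading the table by eye, the same query can be run with -o json and inspected programmatically. The sketch below assumes kubectl is on your PATH, the release is named airflow, and it lives in your current namespace. The function names are hypothetical helpers, not part of any Airflow or Kubernetes tooling.

```python
import json
import subprocess


def pods_not_ready(pod_list: dict) -> list[str]:
    """Return names of pods that are neither Succeeded nor fully ready."""
    not_ready = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Completed job pods (for example database migrations) are fine.
        if status.get("phase") == "Succeeded":
            continue
        containers = status.get("containerStatuses", [])
        running = status.get("phase") == "Running"
        if not (running and containers and all(c.get("ready") for c in containers)):
            not_ready.append(pod["metadata"]["name"])
    return not_ready


def fetch_pods(release: str = "airflow") -> dict:
    """Run the same label query as above, asking kubectl for JSON output."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-l", f"release={release}", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling pods_not_ready(fetch_pods()) returns an empty list once every pod is either running with all containers ready or has completed successfully.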

Access the Airflow web interface (usually via a LoadBalancer IP or port-forward):

kubectl port-forward svc/airflow-webserver 8080:8080

Open your browser and navigate to http://localhost:8080 to confirm that the webserver is up and running.
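
Beyond loading the page in a browser, the Airflow 2.x webserver also serves a JSON health document at /health that reports the status of components such as the metadata database and the scheduler. The following sketch assumes the port-forward above is still running; the function names are illustrative, and the set of components in the response can differ between Airflow versions.

```python
import json
import urllib.request


def unhealthy_components(health: dict) -> list[str]:
    """Return component names whose reported status is not "healthy".

    Components that report no status (None) are skipped, on the assumption
    that they are simply not deployed.
    """
    return [
        name
        for name, info in health.items()
        if isinstance(info, dict)
        and info.get("status") is not None
        and info["status"] != "healthy"
    ]


def fetch_health(base_url: str = "http://localhost:8080") -> dict:
    """Fetch and decode the JSON document served at <base_url>/health."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=10) as response:
        return json.load(response)
```

Calling unhealthy_components(fetch_health()) returns an empty list when every reported component is healthy.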

Conclusion

Using the Official Helm Chart for Apache Airflow on Kubernetes simplifies the deployment process while ensuring you benefit from the latest features and official support. Whether you’re a beginner or an expert, this guide covers all necessary steps, from adding the repository to customizing and verifying your deployment.

Frequently Asked Questions (FAQs)

What is the difference between the Official and Community Airflow Helm Charts?

The Official Helm Chart is maintained by the Apache Airflow team and tracks upstream releases. The User Community Helm Chart often includes customizations contributed by users, which can offer extended features but may lag behind official updates.

How do I customize my Apache Airflow deployment using Helm?

Customize your deployment by creating or modifying a values.yaml file. You can adjust resource allocations, enable or disable certain components, and set configurations for the executor, database, and more.

What are the prerequisites for installing Airflow on Kubernetes?

You need a running Kubernetes cluster, Helm installed on your machine, and proper configuration of kubectl to interact with your cluster. Ensure you have the necessary access permissions and a configured environment before starting the installation.

Where can I find more detailed documentation on the Official Helm Chart for Airflow?

The detailed documentation for the Official Helm Chart is available on the Apache Airflow documentation site and the GitHub repository.