Installing Apache Airflow with Docker and Caddy on Oracle ARM (Ubuntu 24.04)

Running data pipelines often requires a robust orchestrator like Apache Airflow. Setting it up on modern ARM-based cloud infrastructure, such as Oracle Cloud’s Ampere A1 instances, can be efficient and cost-effective. This guide walks you through installing Apache Airflow using Docker Compose on an Ubuntu 24.04 ARM VM, using Caddy as a reverse proxy for easy subdomain management and SSL, and leveraging a dedicated block volume for persistent storage.

We’ll configure Airflow to run on its own subdomain (e.g., airflow.yourdomain.com) while keeping other services (like a WordPress blog on the main domain) separate.

Target Setup:

  • Server: Oracle Cloud ARM VM (Ampere A1)
  • OS: Ubuntu 24.04 LTS (ARM64/aarch64)
  • Orchestrator: Apache Airflow (via Docker Compose)
  • Reverse Proxy: Caddy (installed natively, handling SSL via Cloudflare DNS or standard ACME)
  • Storage: OS on boot volume, Airflow data (DAGs, logs, Postgres DB) on a separate Block Volume mounted at /mnt/blockvolume.
  • Access: Airflow UI accessible via https://airflow.yourdomain.com

Prerequisites

Before you begin, ensure you have the following set up:

  1. Oracle Cloud ARM VM: An Ampere A1 instance running Ubuntu 24.04 LTS.
  2. SSH Access: You can connect to your VM as a user with sudo privileges.
  3. Block Volume: A block volume attached to your VM and mounted read/write (e.g., at /mnt/blockvolume). We’ll use this path throughout the guide; adjust if yours is different.
  4. Docker & Docker Compose: Installed and running correctly on the ARM VM. Follow the official Docker installation instructions for Ubuntu ARM64.
  5. Caddy v2: Installed natively (not via Docker) and configured to serve your main domain (e.g., yourdomain.com) with SSL, potentially using Cloudflare integration.
  6. DNS Record: An A or AAAA record for airflow.yourdomain.com pointing to your server’s public IP address (e.g., configured in Cloudflare).
  7. (Optional) Cloudflare: Configured for your domain (yourdomain.com).


Table of Contents

  • Step 1: Prepare Directory Structure on Block Volume
  • Step 2: Create the Docker Compose File
  • Step 3: Initialize the Airflow Database
  • Step 4: Configure Caddy Reverse Proxy
  • Step 5: Start Airflow Services
  • Step 6: Verify Installation
  • Step 7: Security & Next Steps
  • Conclusion

Step 1: Prepare Directory Structure on Block Volume

We need dedicated directories for Airflow’s persistent data on the block volume.

# Define base path (adjust if your volume mount point is different)
AIRFLOW_BASE_DIR="/mnt/blockvolume/airflow"

# Create directories
sudo mkdir -p ${AIRFLOW_BASE_DIR}/{dags,logs,plugins,config,postgres-data}

# Set initial ownership (adjust if your primary user isn't the default)
# Note: We will hardcode the UID/GID in docker-compose later for consistency
sudo chown -R $(id -u):$(id -g) ${AIRFLOW_BASE_DIR}

# Set appropriate permissions (allow group write for Docker flexibility)
sudo chmod -R 775 ${AIRFLOW_BASE_DIR}
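
Make a note of your user’s UID and GID now; Step 2 hardcodes them in the compose file’s user: directive and in the init service. A quick check, assuming the directories created above:

# UID/GID to plug into the docker-compose `user:` directive (Step 2)
id -u   # e.g. 1001
id -g

# Confirm the new directories exist and are owned by that UID
ls -ln /mnt/blockvolume/airflow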

Step 2: Create the Docker Compose File

Airflow provides an official docker-compose.yaml file. We’ll download it and customize it for our LocalExecutor setup (simpler for single-node), ARM compatibility, block volume storage, and hardcoded configuration for consistency.

  1. Navigate to the config directory:
cd /mnt/blockvolume/airflow/config

2. Download the official file as a starting point (check the Airflow docs for the latest recommended version):

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml'

3. Edit the Docker Compose File: Open the downloaded docker-compose.yaml with a text editor (sudo nano docker-compose.yaml). Replace the entire contents with the following configuration. This version uses LocalExecutor, points volumes to /mnt/blockvolume, removes Celery/Redis, binds the webserver to localhost, and hardcodes configuration values.

(Security Note: Hardcoding secrets like passwords and keys directly in this file is convenient for consistency but less secure than using a separate .env file, especially if this file might be shared or version-controlled. Ensure this file remains private on your server.)

# /mnt/blockvolume/airflow/config/docker-compose.yaml
# --- Airflow with LocalExecutor, data on block volume ---
# --- ALL Config values HARDCODED ---
---
x-airflow-common:
  &airflow-common
  # Official multi-arch image should work on ARM64. Update tag as needed.
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.10.5}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: LocalExecutor
    # Use a strong, unique password for Postgres below and here:
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:YOUR_SECURE_POSTGRES_PASSWORD@postgres/airflow
    # Generate unique keys using commands below and replace placeholders:
    # Fernet Key: python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
    # Secret Key: openssl rand -hex 32
    AIRFLOW__CORE__FERNET_KEY: 'YOUR_GENERATED_FERNET_KEY_HERE' # Hardcoded Fernet Key
    AIRFLOW__WEBSERVER__SECRET_KEY: 'YOUR_GENERATED_WEBSERVER_SECRET_KEY_HERE' # Hardcoded Webserver Secret Key
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'true' # Set to false to disable example DAGs
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session'
    AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true'
    _PIP_ADDITIONAL_REQUIREMENTS: '' # Add python packages here if needed e.g. 'apache-airflow-providers-...'
  volumes:
    # Point to directories on the block volume
    - /mnt/blockvolume/airflow/dags:/opt/airflow/dags
    - /mnt/blockvolume/airflow/logs:/opt/airflow/logs
    - /mnt/blockvolume/airflow/plugins:/opt/airflow/plugins
  # Use the UID of your host user that owns the block volume directories
  # Replace '1001' with your actual UID from `id -u` command on host
  user: "1001:0" # Hardcoded UID:GID
  depends_on:
    &airflow-common-depends-on
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:13 # Ensure compatibility with Airflow version
    environment:
      POSTGRES_USER: airflow
      # Use the SAME strong password as in the SQL_ALCHEMY_CONN above
      POSTGRES_PASSWORD: YOUR_SECURE_POSTGRES_PASSWORD
      POSTGRES_DB: airflow
    volumes:
      # Persist Postgres data on the block volume
      - /mnt/blockvolume/airflow/postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 10s
      retries: 5
      start_period: 5s
    restart: always

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      # Expose ONLY on localhost for Caddy proxy
      - "127.0.0.1:8080:8080"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8974/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  # Initialization service (runs once)
  airflow-init:
    <<: *airflow-common
    entrypoint: /bin/bash
    command:
      - -c
      - |
        echo "Running init as User: $(id -u):$(id -g)"
        mkdir -p /opt/airflow/logs /opt/airflow/dags /opt/airflow/plugins
        # Ensure ownership matches the user directive in x-airflow-common
        # Replace '1001' with your actual host UID if different
        chown -R "1001:0" /opt/airflow/{logs,dags,plugins}
        echo "Attempting to check DB connection..."
        airflow db check || { echo "Database connection check failed."; exit 1; }
        echo "DB connection check successful."
        echo "Initializing the database..."
        airflow db init || { echo "Database initialization failed."; exit 1; }
        echo "Database initialization successful."
        echo "Creating admin user..."
        # Use '$$' to escape $ for Compose, so shell sees the variable
        airflow users create --role Admin \
           --username $${_AIRFLOW_WWW_USER_USERNAME} \
           --password $${_AIRFLOW_WWW_USER_PASSWORD} \
           --firstname Airflow --lastname Admin --email admin@example.com || \
           echo "Admin user already exists or creation failed."
        echo "Initialization complete."
    environment:
      <<: *airflow-common-env
      # Use a strong, unique password for the initial admin UI login
      _AIRFLOW_WWW_USER_USERNAME: 'admin' # Or choose a different admin username
      _AIRFLOW_WWW_USER_PASSWORD: 'YOUR_SECURE_ADMIN_UI_PASSWORD' # Hardcoded Admin Password
      _PIP_ADDITIONAL_REQUIREMENTS: '' # Keep empty for init
    user: "0:0" # Run init as root for permissions/DB setup
    volumes:
      # Mount volumes for ownership checks/setting during init
      - /mnt/blockvolume/airflow/dags:/opt/airflow/dags
      - /mnt/blockvolume/airflow/logs:/opt/airflow/logs
      - /mnt/blockvolume/airflow/plugins:/opt/airflow/plugins

  4. Generate Keys/Passwords and Update Placeholders: Before saving the docker-compose.yaml file, you need to generate secure values and replace the corresponding placeholders within the file:
    • YOUR_SECURE_POSTGRES_PASSWORD: Create a strong, unique password for the PostgreSQL database user (airflow). Use a password manager or a command like openssl rand -base64 32. Replace both occurrences in the file.
    • YOUR_SECURE_ADMIN_UI_PASSWORD: Create a separate, strong password for the initial Airflow admin user login. Replace the placeholder in the airflow-init service’s environment section.
    • YOUR_GENERATED_FERNET_KEY_HERE: The Fernet key encrypts sensitive data (such as connection passwords) stored in Airflow’s database. Generate it on your server:
      • Install prerequisites: sudo apt update && sudo apt install -y python3-cryptography (Ubuntu 24.04 blocks system-wide pip installs, so use the distro package or a virtual environment)
      • Generate key: python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
      • Copy the output (a long string) and paste it into the AIRFLOW__CORE__FERNET_KEY value in the compose file.
    • YOUR_GENERATED_WEBSERVER_SECRET_KEY_HERE: This key secures user sessions in the Airflow UI. Generate it on your server:
      • Generate key: openssl rand -hex 32
      • Copy the output (a 64-character hexadecimal string) and paste it into the AIRFLOW__WEBSERVER__SECRET_KEY value in the compose file.
    • User ID (user: "1001:0"): Confirm that 1001 matches the output of id -u for your primary user on the host machine (from Step 1). Adjust it here and in the airflow-init chown command if necessary.
  5. Save the file after replacing all placeholders with your generated values.
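
As mentioned in the security note above, you can later move these secrets out of docker-compose.yaml into a .env file, which Docker Compose loads automatically from the same directory. A minimal sketch, using hypothetical variable names:

# /mnt/blockvolume/airflow/config/.env (chmod 600; never commit this file)
POSTGRES_PASSWORD=YOUR_SECURE_POSTGRES_PASSWORD
AIRFLOW_FERNET_KEY=YOUR_GENERATED_FERNET_KEY_HERE
AIRFLOW_WEBSERVER_SECRET_KEY=YOUR_GENERATED_WEBSERVER_SECRET_KEY_HERE

# Then reference the variables in docker-compose.yaml instead of literal values, e.g.:
#   AIRFLOW__CORE__FERNET_KEY: ${AIRFLOW_FERNET_KEY}
#   POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}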

Step 3: Initialize the Airflow Database

This one-time command runs the airflow-init service defined above to set up the database schema and create the initial admin user.

# Ensure you are in the config directory
cd /mnt/blockvolume/airflow/config

# Run the initialization
docker compose run --rm airflow-init

Watch the output for success messages regarding the database connection check, database migrations, and admin user creation.
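
If you want to confirm the initialization took effect, a couple of optional checks (these assume the Postgres container, started as a dependency, is still running):

# Run immediately after the init command: it should have exited with code 0
echo $?

# The database container should still be up and healthy
docker compose ps postgres

# Optionally confirm the admin account exists
docker compose run --rm --no-deps airflow-webserver airflow users list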

Step 4: Configure Caddy Reverse Proxy

Edit your main Caddy configuration file (usually /etc/caddy/Caddyfile) and add a block for your Airflow subdomain.

sudo nano /etc/caddy/Caddyfile

Add this block alongside your existing site blocks:

# /etc/caddy/Caddyfile

# (Your existing site blocks, e.g., for yourdomain.com)
# yourdomain.com {
#   ...
# }

# Add Airflow configuration
airflow.yourdomain.com {
    # Proxy requests to the Airflow webserver container on localhost:8080
    reverse_proxy 127.0.0.1:8080 {
        # Caddy already sets X-Forwarded-For/-Proto on proxied requests; these
        # lines make the forwarded headers explicit using Caddyfile placeholders
        header_up Host {host}
        header_up X-Real-IP {remote_host}
        header_up X-Forwarded-For {remote_host}
        header_up X-Forwarded-Proto {scheme}
    }

    # Enable compression
    encode zstd gzip

    # Caddy automatically handles HTTPS certificates
}

# (Other global options or site blocks)

Save the Caddyfile and reload Caddy to apply the changes:

sudo systemctl reload caddy
sudo systemctl status caddy # Check for errors
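
If the reload reports problems, you can validate the Caddyfile directly; and once Airflow is running (Step 5), its health endpoint should answer through the proxy:

# Validate the Caddyfile syntax without reloading
caddy validate --config /etc/caddy/Caddyfile

# After Step 5, this should return HTTP 200 via Caddy’s HTTPS
curl -I https://airflow.yourdomain.com/health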

Step 5: Start Airflow Services

Now, start the main Airflow webserver and scheduler containers.

# Ensure you are in the config directory
cd /mnt/blockvolume/airflow/config

# Start services in detached mode
docker compose up -d
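
On a small instance the webserver can take a minute or two to become healthy. You can watch the startup from the same directory:

# Show service status and health
docker compose ps

# Follow the webserver logs while it starts (Ctrl+C to stop following)
docker compose logs -f --tail=50 airflow-webserver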

Step 6: Verify Installation

  1. Check Containers: Run docker ps to see if config-postgres-1, config-airflow-scheduler-1, and config-airflow-webserver-1 (or similar names) are Up and healthy.
  2. Check Logs (if needed): docker logs config-airflow-webserver-1 or docker logs config-airflow-scheduler-1.
  3. Access UI: Open your browser and navigate to https://airflow.yourdomain.com.
  4. Login: Use the admin username (admin by default, or whatever you set) and the YOUR_SECURE_ADMIN_UI_PASSWORD value you configured in docker-compose.yaml.

If you see the Airflow dashboard, congratulations!

Step 7: Security & Next Steps

  • Admin Password: Although you set an initial password in the compose file, consider changing it via the Airflow UI (Profile -> User Info -> Reset Password), especially if you later move secrets into a .env file or another secrets backend.
  • Fernet Key: The key ensures connection passwords stored in Airflow are encrypted. Keep it safe.
  • Secrets Management: For production, explore more robust secrets management backends (like HashiCorp Vault) instead of hardcoding or using .env files.
  • DAGs: Place your DAG .py files in /mnt/blockvolume/airflow/dags on the host server; the scheduler will pick them up automatically (a minimal test DAG is sketched after this list).
  • Monitoring: Consider tools like Netdata or Prometheus/Grafana to monitor your VM and Airflow container resources.
  • Upgrades: Consult the official Airflow documentation for upgrade procedures, which typically involve updating the image tag in docker-compose.yaml, running database migrations, and restarting services (a rough upgrade sketch also follows this list).
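
To confirm the dags folder is wired up end to end, you can drop in a minimal smoke-test DAG such as the sketch below (hypothetical file and DAG names):

# Create a throwaway test DAG on the host; the scheduler reads it from the mounted folder
cat > /mnt/blockvolume/airflow/dags/smoke_test.py <<'EOF'
# Minimal DAG to verify that files in the mounted dags folder are detected
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="smoke_test",
    start_date=datetime(2024, 1, 1),
    schedule=None,      # trigger manually from the UI
    catchup=False,
) as dag:
    BashOperator(task_id="hello", bash_command="echo 'Airflow on ARM is alive'")
EOF

It should appear in the UI within a minute or so (paused, per DAGS_ARE_PAUSED_AT_CREATION); unpause it and trigger it manually to run.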
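
Similarly, for the upgrade bullet above, one common pattern with this compose layout looks roughly like the following sketch (always read the Airflow release notes first):

cd /mnt/blockvolume/airflow/config

# Stop the running services (all data persists on the block volume)
docker compose down

# Edit docker-compose.yaml and bump the apache/airflow image tag, then pull it
docker compose pull

# Re-run the init service, which applies database migrations for the new version
docker compose run --rm airflow-init

# Start everything back up on the new image
docker compose up -d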

Conclusion

You now have a working Apache Airflow instance running securely on its own subdomain on your Oracle Cloud ARM server. By leveraging Docker Compose, Caddy, and a dedicated block volume, you have a scalable and maintainable setup ready for your data pipelines. Happy DAG running!
