Running data pipelines often requires a robust orchestrator like Apache Airflow. Setting it up on modern ARM-based cloud infrastructure, such as Oracle Cloud’s Ampere A1 instances, can be efficient and cost-effective. This guide walks you through installing Apache Airflow using Docker Compose on an Ubuntu 24.04 ARM VM, using Caddy as a reverse proxy for easy subdomain management and SSL, and leveraging a dedicated block volume for persistent storage.
We’ll configure Airflow to run on its own subdomain (e.g., airflow.yourdomain.com) while keeping other services (like a WordPress blog on the main domain) separate.
Target Setup:
- Server: Oracle Cloud ARM VM (Ampere A1)
- OS: Ubuntu 24.04 LTS (ARM64/aarch64)
- Orchestrator: Apache Airflow (via Docker Compose)
- Reverse Proxy: Caddy (installed natively, handling SSL via Cloudflare DNS or standard ACME)
- Storage: OS on boot volume, Airflow data (DAGs, logs, Postgres DB) on a separate Block Volume mounted at /mnt/blockvolume.
- Access: Airflow UI accessible via https://airflow.yourdomain.com
Prerequisites
Before you begin, ensure you have the following set up:
- Oracle Cloud ARM VM: An Ampere A1 instance running Ubuntu 24.04 LTS.
- SSH Access: You can connect to your VM as a user with sudo privileges.
- Block Volume: A block volume attached to your VM and mounted read/write (e.g., at /mnt/blockvolume). We’ll use this path throughout the guide; adjust if yours is different.
- Docker & Docker Compose: Installed and running correctly on the ARM VM. Follow the official Docker installation instructions for Ubuntu ARM64.
- Caddy v2: Installed natively (not via Docker) and configured to serve your main domain (e.g., yourdomain.com) with SSL, potentially using Cloudflare integration.
- DNS Record: An A or AAAA record for airflow.yourdomain.com pointing to your server’s public IP address (e.g., configured in Cloudflare).
- (Optional) Cloudflare: Configured for your domain (yourdomain.com).
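Before proceeding, a quick sanity check can save debugging time later. The commands below assume the mount point and subdomain used throughout this guide; adjust them to match your environment (dig is provided by the dnsutils package on Ubuntu).
# Confirm the ARM64 architecture (should print aarch64)
uname -m
# Confirm Docker and the Compose plugin are installed
docker --version
docker compose version
# Confirm the block volume is mounted
findmnt /mnt/blockvolume
# Confirm the subdomain resolves to this server's public IP
dig +short airflow.yourdomain.com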
Table of Contents
- Step 1: Prepare Directory Structure on Block Volume
- Step 2: Create the Docker Compose File
- Step 3: Initialize the Airflow Database
- Step 4: Configure Caddy Reverse Proxy
- Step 5: Start Airflow Services
- Step 6: Verify Installation
- Step 7: Security & Next Steps
- Conclusion
Step 1: Prepare Directory Structure on Block Volume
We need dedicated directories for Airflow’s persistent data on the block volume.
# Define base path (adjust if your volume mount point is different)
AIRFLOW_BASE_DIR="/mnt/blockvolume/airflow"
# Create directories
sudo mkdir -p ${AIRFLOW_BASE_DIR}/{dags,logs,plugins,config,postgres-data}
# Set initial ownership (adjust if your primary user isn't the default)
# Note: We will hardcode the UID/GID in docker-compose later for consistency
sudo chown -R $(id -u):$(id -g) ${AIRFLOW_BASE_DIR}
# Set appropriate permissions (allow group write for Docker flexibility)
sudo chmod -R 775 ${AIRFLOW_BASE_DIR}
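Before moving on, it is worth confirming the ownership of these directories and noting your UID, since the Docker Compose file in the next step hardcodes it (1001 in this guide):
# Note your UID and GID; the compose file below hardcodes these values
id -u && id -g
# Verify the directory tree and its ownership
ls -ld ${AIRFLOW_BASE_DIR} ${AIRFLOW_BASE_DIR}/*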
Step 2: Create the Docker Compose File
Airflow provides an official docker-compose.yaml file. We’ll download it and customize it to use the LocalExecutor (simpler for a single-node deployment), keep ARM compatibility, store data on the block volume, and hardcode configuration values for consistency.
1. Navigate to the config directory:
cd /mnt/blockvolume/airflow/config
2. Download the official file (check Airflow docs for the latest recommended version):
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml'
3. Edit the Docker Compose File: Open the downloaded docker-compose.yaml with a text editor (sudo nano docker-compose.yaml). Replace the entire contents with the following configuration. This version uses LocalExecutor, points volumes to /mnt/blockvolume, removes Celery/Redis, binds the webserver to localhost, and hardcodes configuration values.
(Security Note: Hardcoding secrets like passwords and keys directly in this file is convenient for consistency but less secure than using a separate .env file, especially if this file might be shared or version-controlled. Ensure this file remains private on your server.)
# /mnt/blockvolume/airflow/config/docker-compose.yaml
# --- Airflow with LocalExecutor, data on block volume ---
# --- ALL Config values HARDCODED ---
---
x-airflow-common:
  &airflow-common
  # Official multi-arch image should work on ARM64. Update tag as needed.
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.10.5}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: LocalExecutor
    # Use a strong, unique password for Postgres below and here:
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:YOUR_SECURE_POSTGRES_PASSWORD@postgres/airflow
    # Generate unique keys using the commands below and replace the placeholders:
    # Fernet Key: python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
    # Secret Key: openssl rand -hex 32
    AIRFLOW__CORE__FERNET_KEY: 'YOUR_GENERATED_FERNET_KEY_HERE' # Hardcoded Fernet Key
    AIRFLOW__WEBSERVER__SECRET_KEY: 'YOUR_GENERATED_WEBSERVER_SECRET_KEY_HERE' # Hardcoded Webserver Secret Key
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'true' # Set to 'false' to disable example DAGs
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session'
    AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true'
    _PIP_ADDITIONAL_REQUIREMENTS: '' # Add Python packages here if needed, e.g. 'apache-airflow-providers-...'
  volumes:
    # Point to directories on the block volume
    - /mnt/blockvolume/airflow/dags:/opt/airflow/dags
    - /mnt/blockvolume/airflow/logs:/opt/airflow/logs
    - /mnt/blockvolume/airflow/plugins:/opt/airflow/plugins
  # Use the UID of the host user that owns the block volume directories.
  # Replace '1001' with your actual UID from the `id -u` command on the host.
  user: "1001:0" # Hardcoded UID:GID
  depends_on:
    &airflow-common-depends-on
    postgres:
      condition: service_healthy
services:
  postgres:
    image: postgres:13 # Ensure compatibility with your Airflow version
    environment:
      POSTGRES_USER: airflow
      # Use the SAME strong password as in SQL_ALCHEMY_CONN above
      POSTGRES_PASSWORD: YOUR_SECURE_POSTGRES_PASSWORD
      POSTGRES_DB: airflow
    volumes:
      # Persist Postgres data on the block volume
      - /mnt/blockvolume/airflow/postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 10s
      retries: 5
      start_period: 5s
    restart: always
  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      # Expose ONLY on localhost for the Caddy proxy
      - "127.0.0.1:8080:8080"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8974/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
  # Initialization service (runs once)
  airflow-init:
    <<: *airflow-common
    entrypoint: /bin/bash
    command:
      - -c
      - |
        # '$$' escapes '$' for Compose so the shell sees the literal '$'
        echo "Running init as User: $$(id -u):$$(id -g)"
        mkdir -p /opt/airflow/logs /opt/airflow/dags /opt/airflow/plugins
        # Ensure ownership matches the user directive in x-airflow-common
        # Replace '1001' with your actual host UID if different
        chown -R "1001:0" /opt/airflow/{logs,dags,plugins}
        echo "Attempting to check DB connection..."
        airflow db check || { echo "Database connection check failed."; exit 1; }
        echo "DB connection check successful."
        echo "Initializing the database..."
        airflow db init || { echo "Database initialization failed."; exit 1; }
        echo "Database initialization successful."
        echo "Creating admin user..."
        # Use '$$' to escape $ for Compose, so the shell sees the variable
        airflow users create --role Admin \
          --username $${_AIRFLOW_WWW_USER_USERNAME} \
          --password $${_AIRFLOW_WWW_USER_PASSWORD} \
          --firstname Airflow --lastname Admin --email [email protected] || \
          echo "Admin user already exists or creation failed."
        echo "Initialization complete."
    environment:
      <<: *airflow-common-env
      # Use a strong, unique password for the initial admin UI login
      _AIRFLOW_WWW_USER_USERNAME: 'admin' # Or choose a different admin username
      _AIRFLOW_WWW_USER_PASSWORD: 'YOUR_SECURE_ADMIN_UI_PASSWORD' # Hardcoded Admin Password
      _PIP_ADDITIONAL_REQUIREMENTS: '' # Keep empty for init
    user: "0:0" # Run init as root for permissions/DB setup
    volumes:
      # Mount volumes for ownership checks/setting during init
      - /mnt/blockvolume/airflow/dags:/opt/airflow/dags
      - /mnt/blockvolume/airflow/logs:/opt/airflow/logs
      - /mnt/blockvolume/airflow/plugins:/opt/airflow/plugins
4. Generate Keys/Passwords and Update Placeholders: Before saving the docker-compose.yaml file, you need to generate secure values and replace the corresponding placeholders within the file:
- YOUR_SECURE_POSTGRES_PASSWORD: Create a strong, unique password for the PostgreSQL database user (airflow). Use a password manager or a command like openssl rand -base64 32. Replace both occurrences in the file.
- YOUR_SECURE_ADMIN_UI_PASSWORD: Create a separate, strong password for the initial Airflow admin user login. Replace the placeholder in the airflow-init service’s environment section.
- YOUR_GENERATED_FERNET_KEY_HERE: The Fernet key encrypts sensitive connection information stored in Airflow’s database. Generate it on your server:
- Install the prerequisite library: sudo apt update && sudo apt install -y python3-cryptography (Ubuntu 24.04 marks the system Python as externally managed, so using the packaged module avoids pip errors; alternatively use a virtual environment)
- Generate the key: python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
- Copy the output (a long string) and paste it into the AIRFLOW__CORE__FERNET_KEY value in the compose file.
- YOUR_GENERATED_WEBSERVER_SECRET_KEY_HERE: This key secures user sessions in the Airflow UI. Generate it on your server:
- Generate key: openssl rand -hex 32
- Copy the output (a 64-character hexadecimal string) and paste it into the AIRFLOW__WEBSERVER__SECRET_KEY value in the compose file.
- User ID (user: "1001:0"): Confirm the user ID 1001 matches the output of the id -u command for your primary user on the host machine. Adjust if necessary.
5. Save the file after replacing all placeholders with your generated values.
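If you prefer to generate everything in one pass, the following sketch combines the commands referenced above and then tightens permissions on the compose file, since it now contains secrets (assuming your user owns the file):
# Strong password (run once for Postgres, once for the admin UI login)
openssl rand -base64 32
# Fernet key for encrypting Airflow connections
python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# Webserver secret key for signing UI sessions
openssl rand -hex 32
# Restrict access to the compose file, which holds these secrets
chmod 600 /mnt/blockvolume/airflow/config/docker-compose.yaml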
Step 3: Initialize the Airflow Database
This one-time command runs the airflow-init service defined above to set up the database schema and create the initial admin user.
# Ensure you are in the config directory
cd /mnt/blockvolume/airflow/config
# Run the initialization
docker compose run --rm airflow-init
Watch the output for success messages regarding DB connection, initialization, and user creation.
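As a rough follow-up check (assuming the block volume layout from Step 1), confirm that the Postgres data directory was populated and that only the database container is still running:
# The Postgres data directory should now contain the cluster files
sudo ls /mnt/blockvolume/airflow/postgres-data | head
# Typically only the postgres container remains running at this point
docker compose ps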
Step 4: Configure Caddy Reverse Proxy
Edit your main Caddy configuration file (usually /etc/caddy/Caddyfile) and add a block for your Airflow subdomain.
sudo nano /etc/caddy/Caddyfile
Add this block alongside your existing site blocks:
# /etc/caddy/Caddyfile
# (Your existing site blocks, e.g., for yourdomain.com)
# yourdomain.com {
# ...
# }
# Add Airflow configuration
airflow.yourdomain.com {
    # Proxy requests to the Airflow webserver container on localhost:8080
    reverse_proxy 127.0.0.1:8080 {
        # Pass essential headers to the backend (Caddy already sets most of
        # these by default; they are listed explicitly here for clarity)
        header_up Host {http.request.host}
        header_up X-Real-IP {http.request.remote.host}
        header_up X-Forwarded-For {http.request.remote.host}
        header_up X-Forwarded-Proto {http.request.scheme}
    }
    # Enable compression
    encode zstd gzip
    # Caddy automatically handles HTTPS certificates
}
# (Other global options or site blocks)
Save the Caddyfile and reload Caddy to apply the changes:
sudo systemctl reload caddy
sudo systemctl status caddy   # Check for errors
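If the reload fails or the status output shows errors, you can ask Caddy to validate the configuration directly, which usually pinpoints the offending line:
# Validate the Caddyfile syntax without applying it
caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile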
Step 5: Start Airflow Services
Now, start the main Airflow webserver and scheduler containers.
# Ensure you are in the config directory
cd /mnt/blockvolume/airflow/config
# Start services in detached mode
docker compose up -d
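On a small instance it can take a minute or two for the webserver and scheduler to become healthy; you can follow the startup logs while you wait:
# Follow logs from all services (Ctrl+C to stop following)
docker compose logs -f --tail=50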
Step 6: Verify Installation
- Check Containers: Run docker ps to see if config-postgres-1, config-airflow-scheduler-1, and config-airflow-webserver-1 (or similar names) are Up and healthy.
- Check Logs (if needed): docker logs config-airflow-webserver-1 or docker logs config-airflow-scheduler-1.
- Access UI: Open your browser and navigate to https://airflow.yourdomain.com.
- Login: Use the admin username (default admin or what you set) and the YOUR_SECURE_ADMIN_UI_PASSWORD you configured in docker-compose.yaml.
If you see the Airflow dashboard, congratulations!
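If the UI does not load, a quick way to isolate the problem is to query the webserver’s health endpoint directly on the server, bypassing Caddy:
# Should return JSON with "healthy" statuses for the metadatabase and scheduler
curl -s http://127.0.0.1:8080/health
If this responds but the public URL does not, look at Caddy or DNS; if it fails, check the webserver container logs.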
Step 7: Security & Next Steps
- Admin Password: Although you set an initial password, consider changing it via the Airflow UI (Profile -> User Info -> Reset Password) if desired, especially if you revert to using .env files later.
- Fernet Key: The key ensures connection passwords stored in Airflow are encrypted. Keep it safe.
- Secrets Management: For production, explore more robust secrets management backends (like HashiCorp Vault) instead of hardcoding or using .env files.
- DAGs: Place your DAG .py files in /mnt/blockvolume/airflow/dags on the host server. Airflow will detect them automatically (see the quick check after this list).
- Monitoring: Consider tools like Netdata or Prometheus/Grafana to monitor your VM and Airflow container resources.
- Upgrades: Consult the official Airflow documentation for upgrade procedures, which typically involve updating the image tag in docker-compose.yaml, running database migrations, and restarting services.
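For the DAGs item above, a quick check (run from the config directory, using the service names defined in the compose file) confirms that the scheduler has picked up files dropped into the dags folder:
cd /mnt/blockvolume/airflow/config
# List the DAGs Airflow currently knows about (new files can take a minute to appear)
docker compose exec airflow-scheduler airflow dags list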
Conclusion
You now have a working Apache Airflow instance running securely on its own subdomain on your Oracle Cloud ARM server. By leveraging Docker Compose, Caddy, and a dedicated block volume, you have a scalable and maintainable setup ready for your data pipelines. Happy DAG running!