Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Its extensible Python framework lets you build workflows that connect to virtually any technology. In the web UI you can inspect a pipeline's dependencies, progress, logs, code, and success status, and trigger tasks manually. With Airflow, you author workflows as Directed Acyclic Graphs (DAGs) of tasks. Let's set up Apache Airflow on your machine in 5 minutes or less. This installation guide is for Mac users.
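To make "a DAG of tasks" concrete, here is a minimal plain-Python sketch that does not use Airflow's API at all. The task names (extract, transform, load, notify) are hypothetical, and graphlib needs Python 3.9+. It only illustrates how declared dependencies determine the order in which tasks can run.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def execution_order(deps):
    """Return one valid order in which the tasks can run."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(execution_order(dependencies))
```

Because the graph is acyclic, such an order always exists. A circular dependency would make the sorter raise an error, which is why workflows have to be DAGs.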
Pre-requisites
Before you get started, make sure you have Docker and Python 3.7+ installed on your machine. Workflows in Airflow are defined as Python code, so you should be familiar with Python.
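If you want to check these prerequisites from Python, a small sketch could look like the following. The function names are made up for this example, and the Docker check only looks for a docker executable on your PATH. It does not verify that Docker is actually running.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in this guide


def python_ok(version=None, minimum=MIN_PYTHON):
    """True if the given (major, minor, ...) version meets the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= minimum


def docker_on_path():
    """True if a `docker` executable can be found on PATH."""
    return shutil.which("docker") is not None


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("Docker found:", docker_on_path())
```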
Step #1: Install Apache Airflow
Create the directory where you want to set up Airflow and navigate into it.
mkdir airflow && cd airflow
Note - Airflow installation can be tricky because Airflow is both a library and an application. As a result, a plain pip install like the one below will sometimes fail or produce an unusable Airflow installation.
pip install apache-airflow
Instead, pin the Airflow version and pass the matching constraints file. The constraints file name ends with your Python version (3.7 here), so adjust it if you run a different Python.
pip install "apache-airflow[celery]==2.5.1" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.5.1/constraints-3.7.txt"
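The constraints URL follows a fixed pattern built from the Airflow version and your Python major.minor version. As a rough sketch, the helpers below (hypothetical names, not part of Airflow) build the URL and the matching pip command so the two versions cannot drift apart.

```python
import sys


def constraints_url(airflow_version, python_version=None):
    """Build the constraints file URL for an Airflow/Python version pair."""
    if python_version is None:
        # Default to the interpreter running this script, e.g. "3.7".
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_command(airflow_version, extras="celery", python_version=None):
    """Return the pip install command as a string (it is not executed)."""
    url = constraints_url(airflow_version, python_version)
    return (
        f'pip install "apache-airflow[{extras}]=={airflow_version}" '
        f'--constraint "{url}"'
    )


if __name__ == "__main__":
    print(pip_command("2.5.1"))
```

Calling pip_command("2.5.1", python_version="3.7") reproduces the command shown above. Not every Airflow/Python combination has a published constraints file, so check that the URL exists before relying on it.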
Step #2: Verify if the installation has been completed
Run this command:
airflow version
The output should be the installed version number, for example 2.5.1 if you used the command above.
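You can also do the same check from Python. This sketch reads the installed package metadata with the standard library (importlib.metadata, Python 3.8+). The helper name is made up for illustration, and it returns None when the package is not installed in the current environment.

```python
from importlib import metadata


def installed_version(package="apache-airflow"):
    """Return the installed version string of a package, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("apache-airflow is not installed in this environment")
    else:
        print("apache-airflow", version)
```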
Step #3: Let's get started
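Create the Airflow home directory, where Airflow keeps its configuration and, by default, its SQLite metadata database.
mkdir -p ~/airflow
Initialize the metadata database, then create an admin user.
airflow db init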
airflow users create \
--username airflow \
--password airflow \
--firstname yourFirstName \
--lastname yourLastName \
--role Admin \
--email airflow@example.com
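To run Apache Airflow with Docker, use this command. It expects a docker-compose.yaml file for Airflow in the current directory, such as the one published in the official Airflow documentation.
# This starts all the containers defined in docker-compose.yaml
docker-compose up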
Step #4: Access the Airflow UI and start managing your DAGs
Open any browser and go to http://localhost:8080/. Port 8080 is the default port for the Airflow webserver.
After logging in with the Airflow username and password created above, you should see the Airflow webserver UI.
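If the page does not load, it can help to query the webserver's /health endpoint, which reports the status of the metadata database and the scheduler as JSON. The sketch below assumes the default address used in this guide and the usual payload shape with a "status" field per component. The function names are made up for illustration.

```python
import json
from urllib.request import urlopen


def is_healthy(payload):
    """True if both the metadatabase and the scheduler report 'healthy'."""
    return all(
        payload.get(component, {}).get("status") == "healthy"
        for component in ("metadatabase", "scheduler")
    )


def fetch_health(base_url="http://localhost:8080", timeout=5):
    """Fetch and decode the JSON body of the /health endpoint."""
    with urlopen(f"{base_url}/health", timeout=timeout) as response:
        return json.load(response)
```

With Airflow running, is_healthy(fetch_health()) should return True once the scheduler has sent its first heartbeat. If the webserver is not up yet, fetch_health raises a URLError.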
That's it - you're already up and running with Apache Airflow.
Conclusion
I hope you enjoyed this article. It is part of a series of 5-minute articles for anyone looking to quickly get set up with the tools of the modern data stack.