See this blog article: https://medium.com/a-r-g-o/installing-apache-airflow-on-ubuntu-aws-6ebac15db211
$sudo apt-get install python3-pip
$pip3 install --upgrade pip
$sudo -u postgres psql
Now that we are connected to postgres as the postgres user, run the following commands:
CREATE ROLE airflow;
create database airflow;
GRANT ALL PRIVILEGES on database airflow to airflow;
ALTER ROLE airflow SUPERUSER;
ALTER ROLE airflow CREATEDB;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO airflow;
ALTER USER airflow WITH PASSWORD 'datasf_airflow';
ALTER ROLE airflow WITH LOGIN;
ALTER USER airflow WITH PASSWORD 'some password';
Find the pg_hba.conf file (it's likely in /etc/postgresql/9.*/main/) and open it with a text editor (vi, emacs or nano). In the IPv4 host entry, change the address to 0.0.0.0/0 so that connections are accepted from any address. If you don't want to use a password to connect to the database, also change the connection method from md5 (password) to trust. Note that trust combined with 0.0.0.0/0 lets any host that can reach the server connect without a password. We also need to edit the postgresql.conf file so that the server listens on all IP addresses:
listen_addresses = '*'
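To double-check the pg_hba.conf edit, a small script can list the host rules and flag any that allow password-free access from everywhere. This is only a sketch: the helper names and the SAMPLE text are made up for illustration, and it assumes addresses are written in CIDR form (address/mask in one column).

```python
def host_rules(pg_hba_text):
    """Return (database, user, address, method) for each 'host' line."""
    rules = []
    for raw in pg_hba_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        # A 'host' record is: host DATABASE USER ADDRESS METHOD [options]
        if fields[0] == "host" and len(fields) >= 5:
            rules.append(tuple(fields[1:5]))
    return rules


def open_trust_rules(pg_hba_text):
    """Rules that accept any IPv4 client without a password."""
    return [r for r in host_rules(pg_hba_text)
            if r[2] == "0.0.0.0/0" and r[3] == "trust"]


# Hypothetical file content, for illustration only.
SAMPLE = """\
# TYPE  DATABASE  USER  ADDRESS       METHOD
local   all       all                 peer
host    all       all   127.0.0.1/32  md5
host    all       all   0.0.0.0/0     trust
"""

if __name__ == "__main__":
    for rule in open_trust_rules(SAMPLE):
        print("password-free access from any IPv4 address:", rule)
```

To check the real file, pass the text of pg_hba.conf in place of SAMPLE.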
Next, start the postgresql service:
$sudo service postgresql start
Any time we modify the connection settings in pg_hba.conf, we need to reload the postgresql service for the change to take effect (a change to listen_addresses needs a full restart instead):
$sudo service postgresql reload
$sudo apt-get install libmysqlclient-dev   # dependency for the airflow[mysql] package
$sudo apt-get install libssl-dev           # dependency for the airflow[crypto] package
$sudo apt-get install libkrb5-dev          # dependency for the airflow[kerberos] package
$sudo apt-get install libsasl2-dev         # dependency for the airflow[hive] package
Add airflow user
$sudo useradd airflow
$sudo passwd airflow
Export the airflow home directory:
$export AIRFLOW_HOME=/home/airflow/airflow
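The exported variable decides where airflow keeps airflow.cfg and related files. A minimal sketch for checking which directory is in effect is shown here; the function name is made up, and the ~/airflow fallback is Airflow's documented default when AIRFLOW_HOME is not set.

```python
import os
from pathlib import Path


def airflow_home(environ=os.environ):
    """Directory airflow uses for airflow.cfg and related files."""
    # Airflow falls back to ~/airflow when AIRFLOW_HOME is not set.
    return Path(environ.get("AIRFLOW_HOME", "~/airflow")).expanduser()


if __name__ == "__main__":
    home = airflow_home()
    print("AIRFLOW_HOME:", home)
    print("config file: ", home / "airflow.cfg")
```

Note that export only affects the current shell session, so the check has to run in the same session (or the export has to be added to the shell profile).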
$sudo pip3 install "airflow[async, devel, celery, crypto, password, postgres, qds, rabbitmq, slack]"
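Set the database connection string in airflow.cfg:
sql_alchemy_conn = postgresql+psycopg2://airflow:somepass@127.0.0.1:5432/airflow
The same URL can be used to create a SQLAlchemy engine from python:
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg2://airflow:somepass@127.0.0.1:5432/airflow')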
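If the password contains characters such as a space, @ or /, it has to be percent-encoded before it goes into the connection URL. The helper below is a hypothetical sketch using only the standard library; build_conn_url is not part of airflow or SQLAlchemy, and the defaults simply mirror the values used in these notes.

```python
from urllib.parse import quote


def build_conn_url(user, password, host="127.0.0.1", port=5432, database="airflow"):
    """Build a postgresql+psycopg2 URL with user and password percent-encoded."""
    # safe="" makes quote() encode every reserved character, including '/' and '@'.
    return "postgresql+psycopg2://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port, database
    )


if __name__ == "__main__":
    print(build_conn_url("airflow", "somepass"))
    print(build_conn_url("airflow", "some password"))
```

A plain password such as somepass passes through unchanged, so the result matches the URL written by hand.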
Make sure the pid files are removed:
$ rm airflow-scheduler.*
$ rm airflow-webserver.pid
Or kill any airflow processes that are still running:
$ps -ef | grep airflow | awk '{print $2}' | xargs kill -9
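A gentler alternative to kill -9 is to check which pid files are actually stale before removing them. The sketch below is an assumption-laden illustration: the function names are made up, it assumes the pid files are named airflow-*.pid and sit in one directory, and it only works on POSIX systems.

```python
import os
from pathlib import Path


def pid_is_running(pid):
    """True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stale_pid_files(directory):
    """Return airflow pid files in `directory` whose process is gone."""
    stale = []
    for path in sorted(Path(directory).glob("airflow-*.pid")):
        try:
            pid = int(path.read_text().strip())
        except ValueError:
            stale.append(path)  # unreadable content: treat the file as stale
            continue
        if not pid_is_running(pid):
            stale.append(path)
    return stale


if __name__ == "__main__":
    # Assumes the pid files live in the current directory.
    for path in stale_pid_files("."):
        print("stale pid file:", path)
```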
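Reset the airflow metadata database (this deletes all existing airflow data and rebuilds the tables):
$airflow resetdb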
You will need to create a user to log into the airflow web UI. Do this from the python command line interface. Run:
$python3
That will bring you into the python interpreter, where you can enter:
>>> import airflow
>>> from airflow import models, settings
>>> from airflow.contrib.auth.backends.password_auth import PasswordUser
>>> user = PasswordUser(models.User())
>>> user.username = 'datasf'
>>> user.email = 'datasf_admin@datasf.org'
>>> user.password = 'somepass'
>>> session = settings.Session()
>>> session.add(user)
>>> session.commit()
>>> session.close()
>>> exit()
Start the scheduler and the webserver as daemons:
$airflow scheduler -D
$airflow webserver -D -p 8080
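Because both commands run in the background, it can be useful to confirm that the webserver is actually listening on port 8080. This is a rough sketch with a made-up helper name; it only tests that a TCP connection succeeds, not that the UI is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("webserver reachable:", port_is_open("127.0.0.1", 8080))
```

The webserver may need a few seconds after starting before the port opens, so a False result right after launch is not necessarily an error.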
Make a pg user for backups:
CREATE ROLE backup_admin WITH LOGIN;
ALTER USER backup_admin WITH PASSWORD 'some password';
The backup script will connect to the database as this user.
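The backup script itself is not part of these notes. As a rough idea of what it might look like, the sketch below builds a pg_dump call for the backup_admin user. The function names, the output path and the choice of the custom dump format (-F c) are assumptions, and backup_admin would still need read access to the tables being dumped.

```python
import os
import subprocess


def pg_dump_command(database, outfile, user="backup_admin",
                    host="127.0.0.1", port=5432):
    """Build the pg_dump argument list for a custom-format dump."""
    return ["pg_dump", "-h", host, "-p", str(port), "-U", user,
            "-F", "c", "-f", outfile, database]


def run_backup(database, outfile, password):
    """Run pg_dump, passing the password through the PGPASSWORD variable."""
    env = dict(os.environ, PGPASSWORD=password)
    subprocess.run(pg_dump_command(database, outfile), env=env, check=True)


if __name__ == "__main__":
    # Hypothetical output path; print the command instead of running it.
    print(" ".join(pg_dump_command("airflow", "/tmp/airflow.dump")))
```

Calling run_backup("airflow", "/tmp/airflow.dump", "some password") would perform the dump, provided pg_dump is installed and the server accepts the connection.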