TrackHub Registry (THR) is a global centralised collection of publicly accessible track hubs. The goal of the project is to allow third parties to advertise track hubs, and to make it easier for researchers around the world to discover and use track hubs containing different types of genomic research data.
This repository was created to give the project a technology refresh while keeping the same core functionality.
- Python 3.7+
- venv
- Elasticsearch 6.3
- Docker and Docker Compose (required only if you want to run the app in Docker containers)
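As a quick sanity check before you start, you can verify the prerequisite versions from a terminal (a minimal sketch; the Elasticsearch check only works if an instance is already running):

```shell
# Check the prerequisite tools are installed and at the expected versions
python3 --version          # should report 3.7 or later
docker --version           # only needed for the Docker setup
docker-compose --version   # only needed for the Docker setup
curl -s http://localhost:9200 | grep '"number"'   # reports the Elasticsearch version, expect 6.3.x
```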
Clone the project
git clone https://github.com/Ensembl/thr.git
cd thr
You can run the whole application (Frontend + Backend) using docker-compose:
Uncomment the last line in the Dockerfile:
ENTRYPOINT ["/usr/src/app/entrypoint.sh"]
Then run
docker-compose -f docker-compose-local.yml up
The first run will take some time (~10 min) to download the necessary images and set up the environment.
- The app will be accessible at: http://127.0.0.1
- And Elasticsearch at: http://127.0.0.1:9200
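To double-check that both services came up, you can hit them with curl (an optional sanity check, not part of the setup itself):

```shell
curl -sI http://127.0.0.1 | head -n 1   # the app should answer with HTTP 200
curl -s http://127.0.0.1:9200           # Elasticsearch should answer with its cluster info as JSON
```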
To stop the containers, use:
docker-compose -f docker-compose-local.yml stop
The `docker-compose stop` command will stop your containers, but it won't remove them.
Get the container ID (while `docker-compose up` is running):
docker container ls
You'll get something like this:
$ docker container ls
CONTAINER ID   IMAGE                                                  COMMAND                  CREATED          STATUS         PORTS                                                                                  NAMES
fd8d0eed8c88   thr_nginx                                              "/docker-entrypoint.…"   10 minutes ago   Up 2 minutes   0.0.0.0:80->80/tcp, :::80->80/tcp                                                      thr_nginx_1
378be0cb1e26   thr_react                                              "docker-entrypoint.s…"   10 minutes ago   Up 2 minutes   3000/tcp                                                                               thr_react_1
c41390a512be   thr_django                                             "/usr/src/app/entryp…"   10 minutes ago   Up 2 minutes   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp                                              thr_django_1
8b1bc7c4aa46   mysql:5.7                                              "docker-entrypoint.s…"   10 minutes ago   Up 2 minutes   33060/tcp, 0.0.0.0:3306->3306/tcp, :::3306->3306/tcp                                   thr_mysql_1
87b1ca81fa08   docker.elastic.co/elasticsearch/elasticsearch:6.3.0   "/usr/local/bin/dock…"   5 days ago       Up 2 minutes   0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 0.0.0.0:9300->9300/tcp, :::9300->9300/tcp   elasticsearch
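Instead of copying the ID by hand, you can also grab it by image name; a small sketch (assuming the Django image is named thr_django, as in the listing above):

```shell
# Store the Django container ID in a variable and reuse it in the commands below
DJANGO_ID=$(docker container ls --filter "ancestor=thr_django" --format "{{.ID}}")
echo "$DJANGO_ID"
```

You can then substitute $DJANGO_ID wherever <thr_django_container_id> appears below.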
Then run the command
docker exec -it <thr_django_container_id> python manage.py search_index --rebuild -f
`--rebuild` will delete the index if it exists.
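To confirm the index was (re)created, you can ask Elasticsearch directly (the same check used in the local setup further down):

```shell
curl -XGET "http://127.0.0.1:9200/_cat/indices?v"
```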
You can restart a specific container by typing:
docker-compose restart elasticsearch
where `elasticsearch` is the container name.
If this is your first time importing genome assembly information (the ENA dump), use:
docker exec -it <thr_django_container_id> python manage.py import_assemblies --fetch ena
It will fetch assembly info from ENA and dump it to a JSON file (you can then use the command below to populate the database).
An example of how the output will look:
docker exec -it 9fe9ad66132a python manage.py import_assemblies --fetch ena
[ENA] Fetching assemblies from ENA, it may take few minutes...
[ENA] 1075977 Objects are fetched successfully (took 525.88 seconds)
If the JSON is already there, you can simply run
docker exec -it 9fe9ad66132a python manage.py import_assemblies
which will use the JSON file (located in the `./assemblies_dump` directory) and load it into the MySQL table.
An example of how the output will look:
docker exec -it 9fe9ad66132a python manage.py import_assemblies
All rows are deleted! Please wait, the import process will take a while
[ENA] 1075977 objects imported successfully to MySQL DB (took 1539.78 seconds)!
The script deletes all previously imported assemblies and re-populates the table with the new JSON file content.
This doesn't delete the hubs, nor the info related to them; the assembly info imported from JSON is stored in a separate table, which is used to populate hub-related tables when a new hub is submitted.
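Before running the import without `--fetch`, it can be worth checking that the dump is actually in place; a minimal sketch (the /usr/src/app working directory is an assumption based on the entrypoint path above):

```shell
# List the dump files the import command will read
docker exec -it <thr_django_container_id> ls -lh /usr/src/app/assemblies_dump
```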
Create a superuser account:
docker exec -it <thr_django_container_id> python manage.py createsuperuser
To remove stopped service containers and the anonymous volumes attached to them, use:
docker-compose rm -v
If we need to rebuild the images, we can use:
docker-compose up --build
The `docker-compose down` command will stop the containers, but it also removes the stopped containers as well as any networks that were created. We can take `down` one step further and add the `-v` flag to remove all volumes too:
docker-compose down -v
To take a look at what's inside a specific container:
docker exec -it <container_id> sh
Create and activate the virtual environment, then install the required packages:
python -m venv thr_env
source thr_env/bin/activate
pip install -r requirements.txt
Export the DB configuration and turn on debugging if necessary:
# MySQL
export DB_DATABASE=thr_db # The DB should already be created
export DB_USER=user
export DB_PASSWORD=password
export DB_HOST=localhost
export DB_PORT=3306
# Elasticsearch
export ES_HOST=localhost:9200
Download and run Elasticsearch (follow the installation steps on the Elasticsearch website).
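If you'd rather not install it natively, one option is to run the same 6.3.0 image the Docker setup uses (a sketch; `discovery.type=single-node` is the standard setting for a local single-node instance):

```shell
docker run -d --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  docker.elastic.co/elasticsearch/elasticsearch:6.3.0
```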
Make migrations, migrate, and rebuild the ES index:
python manage.py makemigrations
python manage.py migrate
python manage.py search_index --rebuild -f
The last command will create an index called `trackhubs`. We can get the list of indices using the command:
curl -XGET "http://localhost:9200/_cat/indices"
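Once the index exists, you can also pull back a single document to make sure it was populated (standard Elasticsearch search API; `trackhubs` is the index name created above):

```shell
curl -XGET "http://localhost:9200/trackhubs/_search?pretty&size=1"
```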
If this is your first time importing genome assembly information (the ENA dump), use:
python manage.py import_assemblies --fetch ena
It will fetch assembly info from ENA, dump it to a JSON file, and then use it to populate the database.
If the JSON is already there, you can simply run
python manage.py import_assemblies
which will use the JSON file (located in the `./assemblies_dump` directory) and load it into the MySQL table.
Run the development server:
python manage.py runserver
The app will be accessible at: http://127.0.0.1:8000
Create a superuser account:
python manage.py createsuperuser
In case we want to rebuild the ES index with the data existing in MySQL, we need to run the following commands:
- Rebuild ES index
python manage.py search_index --rebuild
- Enrich ES docs
python manage.py enrich all
This will rebuild the ES index and extract `configuration`, `data`, `file type` and `status` objects from the MySQL DB and store them back in Elasticsearch by updating the documents.
You can enrich one specific trackdb (e.g. `python manage.py enrich 1`) or exclude a trackdb (e.g. `python manage.py enrich all --exclude 1`). See `python manage.py enrich -h` for more details.