```
├── docker             <- Folder with all the useful Dockerfiles
├── flask              <- Flask API used to expose the model
├── models             <- Trained and serialized models
├── src                <- Source code for use in this project
├── streamlit          <- Streamlit user interface
├── Makefile           <- Makefile with commands like `make train`
├── README.md          <- The top-level README for developers using this project
├── requirements.txt   <- The requirements file
└── setup.py           <- Makes the project pip installable (`pip install -e .`) so src can be imported
```
Project based on the cookiecutter data science project template.
```bash
pip install -r requirements.txt
```
The variables used in this README are all defined in the `.env` file. They are automatically imported in the Makefile, but if you want to run any of the commands below manually, you first need to run either of these command lines:
```bash
make init_env_variables
```
or with the full command
```bash
export $(grep -v '^#' .env | xargs)
```
In both cases, you also need to store your local IP address (`ipconfig getifaddr` is macOS-specific; on Linux use e.g. `hostname -I`):
```bash
LOCAL_IP_ADRESS=$(ipconfig getifaddr en0)
```
This needs to be run every time you start a new terminal.
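For reference, a hypothetical `.env` file could look like the following; the variable names are taken from the commands in this README, but the values are only illustrative:

```bash
# Illustrative values only -- adapt them to your setup
NB_FEATURES=20
FLASK_PORT=5000
STREAMLIT_PORT=8501
STREAMLIT_IP=localhost
```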
The data used in this project comes from the Kaggle dataset Anime Recommendation Database 2020. To train the model you will need `anime.csv` and `rating_complete.csv` to be stored in the `./data/` folder.
```bash
make train
```
or with the full command
```bash
python src/training.py \
    --data_folder $(PWD)/data \
    --model_path $(PWD)/models/trained_model.pkl \
    --nb_hidden_features $NB_FEATURES
```
This will take a couple of minutes, depending on your machine configuration and on the number of hidden features used.
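The contents of `src/training.py` are not reproduced in this README; purely as an illustration, a minimal sketch of a CLI accepting the flags above could be built with `argparse` (only the flag names come from the command above, everything else is an assumption):

```python
import argparse

def parse_args(argv=None):
    # Flag names mirror the `make train` invocation above
    parser = argparse.ArgumentParser(description="Train the anime recommendation model")
    parser.add_argument("--data_folder", required=True,
                        help="Folder containing anime.csv and rating_complete.csv")
    parser.add_argument("--model_path", required=True,
                        help="Where to write the serialized model (.pkl)")
    parser.add_argument("--nb_hidden_features", type=int, default=20,
                        help="Number of hidden features for the model")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    print(f"Training with {args.nb_hidden_features} hidden features")
```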
```bash
make flask_run
```
or with the full command
```bash
python flask/app.py \
    --model_path=$(PWD)/models/trained_model.pkl \
    --debug=False \
    --host_ip=$LOCAL_IP_ADRESS \
    --port=$FLASK_PORT
```
The server should be ready to use in a few seconds.
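The actual `flask/app.py` is not shown here; as a hedged illustration only, a minimal Flask app exposing a `/predict` endpoint might be structured like this (the placeholder logic and names are assumptions, not the project's real code):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # The real app would load the trained model and compute recommendations;
    # here we only echo back the keys of the payload (placeholder logic).
    payload = request.get_json()
    return jsonify({"received_keys": sorted(payload.keys())})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000, debug=False)
```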
```bash
make predict_shingeki_no_kyojin
```
or with the full command
```bash
curl -X POST http://$LOCAL_IP_ADRESS:$FLASK_PORT/predict \
    -d @flask/samples/shingeki_no_kyojin.json \
    -H "Content-Type: application/json"
```
With this you should get a recommendation based on Shingeki no Kyojin (Attack on Titan) ratings.
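The same request can be issued from Python with `requests`; this is only an equivalent sketch of the `curl` call above (the sample payload in `flask/samples/shingeki_no_kyojin.json` is not reproduced here):

```python
import json
import requests

def build_predict_request(host, port, sample_path):
    # Mirrors the curl call above: POST the JSON sample to /predict
    with open(sample_path) as f:
        payload = json.load(f)
    url = f"http://{host}:{port}/predict"
    return requests.Request("POST", url, json=payload).prepare()

# To actually send it:
# resp = requests.Session().send(
#     build_predict_request("127.0.0.1", 5000, "flask/samples/shingeki_no_kyojin.json"))
```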
```bash
make streamlit_run
```
or with the full command
```bash
streamlit run streamlit/app.py \
    --browser.serverAddress $LOCAL_IP_ADRESS \
    --server.port $STREAMLIT_PORT \
    -- \
    --anime_path $(PWD)/data/anime.csv \
    --model_ip $LOCAL_IP_ADRESS \
    --model_port $FLASK_PORT
```
If no tab opened in your browser, you can run this command and paste the result in your favorite browser:
```bash
echo http://$STREAMLIT_IP:$STREAMLIT_PORT
```
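Note that the bare `--` in the command above separates Streamlit's own options from those forwarded to `streamlit/app.py`, where they arrive as regular script arguments. A sketch of how the script might pick them up (the real `app.py` is not shown; this is an assumption):

```python
import argparse

def parse_app_args(argv=None):
    # Streamlit forwards everything after the bare `--` to the script,
    # so a plain argparse parser can read these flags from sys.argv.
    parser = argparse.ArgumentParser()
    parser.add_argument("--anime_path", required=True)
    parser.add_argument("--model_ip", required=True)
    parser.add_argument("--model_port", type=int, required=True)
    return parser.parse_args(argv)
```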
```bash
make docker_build_training
```
or with the full command
```bash
docker build \
    -t training:v0 \
    -f docker/training/Dockerfile .
```
```bash
make docker_run_training
```
or with the full command
```bash
docker run \
    -it --rm \
    -v $(PWD)/data:/anime/data \
    -v $(PWD)/models:/anime/models \
    training:v0 \
    --data_folder /anime/data \
    --model_path /anime/models/trained_model.pkl \
    --nb_hidden_features $NB_FEATURES
```
Note: training is much slower inside Docker on my machine, so unless you run it in the cloud you may want to train the model without Docker (see 3.).
```bash
make docker_build_flask
```
or with the full command
```bash
docker build \
    -t flask:v0 \
    -f docker/flask/Dockerfile .
```
```bash
make docker_run_flask
```
or with the full command
```bash
docker run \
    -p $FLASK_PORT:$FLASK_PORT \
    -v $(PWD)/models:/anime/models \
    flask:v0 \
    --model_path ./models/trained_model.pkl \
    --debug False \
    --host_ip $LOCAL_IP_ADRESS \
    --port $FLASK_PORT
```
Note: as with training, Flask runs slowly inside Docker on my machine, so I would advise running Flask without Docker (see 4.).
```bash
make docker_build_streamlit
```
or with the full command
```bash
docker build \
    -t streamlit:v0 \
    -f docker/streamlit/Dockerfile .
```
```bash
make docker_run_streamlit
```
or with the full command
```bash
docker run \
    -p $STREAMLIT_PORT:$STREAMLIT_PORT \
    -v $(PWD)/data:/anime/data \
    streamlit:v0 \
    --browser.serverAddress $LOCAL_IP_ADRESS \
    --server.port $STREAMLIT_PORT \
    -- \
    --anime_path ./data/anime.csv \
    --model_ip $LOCAL_IP_ADRESS \
    --model_port $FLASK_PORT
```
For the Streamlit app, performance is not affected at all by running in Docker.
If I were to continue this project, I would:
- Find a way to make the docker containers more efficient
- Add command lines to train and expose the model/app on the cloud
- Test Poetry to handle requirements
- Bonus: Refactor the code