Add Mlflow Container example commands #135
Draft
Changes
docker compose changes
A new service has been added to docker-compose.yml. The original intent was to use the off-the-shelf mlflow docker image, but that was missing some key dependencies, so a small Dockerfile has been added which installs additional Python packages for our use case.
The service spins up an instance of Mlflow, which is configured to use postgres as a backend store and minio as the artifact store. It is available at http://localhost:5000.
Some additional environment variables and django settings have been added as well to facilitate communication between all the services.
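For reviewers who want to see how those settings fit together, here is a minimal sketch of how the server's start command could be assembled from environment variables. The variable names (POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_HOST, POSTGRES_PORT, MLFLOW_DB_NAME, MLFLOW_BUCKET), their defaults, and both helper functions are assumptions for illustration and are not taken from this PR. The `mlflow server` flags are the standard ones.

```python
from typing import List, Mapping
from urllib.parse import quote


def backend_store_uri(env: Mapping[str, str]) -> str:
    """Build a SQLAlchemy-style postgres URI for the Mlflow backend store.

    All variable names and defaults here are hypothetical.
    """
    # Percent-encode credentials so characters like "@" or "/" cannot
    # break the URI structure.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host = env.get("POSTGRES_HOST", "postgres")
    port = env.get("POSTGRES_PORT", "5432")
    database = env.get("MLFLOW_DB_NAME", "mlflow")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


def mlflow_server_command(env: Mapping[str, str]) -> List[str]:
    """Return the argv list for starting the Mlflow tracking server."""
    bucket = env.get("MLFLOW_BUCKET", "mlflow")  # hypothetical variable name
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_store_uri(env),
        "--default-artifact-root", f"s3://{bucket}",
        "--host", "0.0.0.0",  # listen on all interfaces inside the container
        "--port", "5000",
    ]
```

For the s3:// artifact root to resolve to minio rather than AWS, the container would also need MLFLOW_S3_ENDPOINT_URL, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY set to the minio endpoint and credentials.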
new commands
In order to start working with Mlflow, a few new django commands have been added.
./manage.py setupmlflow
This command uses psycopg2 to create a new postgres database called mlflow, which the Mlflow server uses as a backend store. I have not thoroughly investigated using the django ORM to help with this. There may be a better way to integrate the backend store with the existing django database, which would let us use django models to manage mlflow objects as well. I'll leave that to further discussion/follow-up.
It also creates a bucket in minio that Mlflow can use as an artifact store.
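As a rough illustration of what setupmlflow does, the two steps could look like the sketch below. This is not the code in this PR: the function names are hypothetical, `conn` is assumed to be a psycopg2 connection with autocommit enabled, and `s3_client` is assumed to be a boto3 S3 client pointed at minio. Both helpers are idempotent so the command can be re-run safely.

```python
def quote_identifier(name: str) -> str:
    """Quote a postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def ensure_database(conn, name: str = "mlflow") -> bool:
    """Create the database if it is missing. Returns True if it was created.

    `conn` is assumed to be a psycopg2 connection with autocommit enabled,
    because postgres refuses to run CREATE DATABASE inside a transaction.
    """
    with conn.cursor() as cur:
        cur.execute("SELECT 1 FROM pg_database WHERE datname = %s", (name,))
        if cur.fetchone() is not None:
            return False
        # Database names cannot be passed as query parameters, so the
        # identifier is quoted by hand.
        cur.execute(f"CREATE DATABASE {quote_identifier(name)}")
        return True


def ensure_bucket(s3_client, bucket: str = "mlflow") -> bool:
    """Create the artifact bucket if it is missing. Returns True if created.

    `s3_client` is assumed to be a boto3 S3 client configured with the
    minio endpoint URL and credentials.
    """
    existing = {b["Name"] for b in s3_client.list_buckets().get("Buckets", [])}
    if bucket in existing:
        return False
    s3_client.create_bucket(Bucket=bucket)
    return True
```

A caller would open the connection with psycopg2.connect(...), set conn.autocommit = True, and build the client with boto3.client("s3", endpoint_url=...) before calling the two helpers.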
./manage.py makeexperiment
Mlflow groups related runs into "experiments." This command allows you to create a named experiment to group training runs. It seemed a useful thing to have, and a good starting point for working with the Mlflow API. It doesn't have to stick around.
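A minimal sketch of the lookup-or-create logic such a command might wrap is shown here. The helper name is hypothetical, and `client` is assumed to behave like mlflow.tracking.MlflowClient, whose get_experiment_by_name returns None for an unknown name and whose create_experiment returns the new experiment id.

```python
def get_or_create_experiment(client, name: str) -> str:
    """Return the id of the experiment called `name`, creating it if needed.

    `client` is assumed to behave like mlflow.tracking.MlflowClient.
    """
    existing = client.get_experiment_by_name(name)
    if existing is not None:
        return existing.experiment_id
    return client.create_experiment(name)
```

With the service from this PR running, the client could be built as MlflowClient(tracking_uri="http://localhost:5000"). Checking for an existing experiment first avoids the error Mlflow raises when an experiment name is reused.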
./manage.py examplelog
This allows users to run a toy ML training run, optionally grouped under an experiment. If no experiment is provided, it uses the "default" experiment, which Mlflow creates automatically. For now, this is pretty much a copy/paste of a random forest regressor training session.
The actual ML happens in a celery task, which uploads some artifacts to Mlflow. For now, these can only be viewed by accessing the Mlflow instance at http://localhost:5000.
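To give a feel for what the task body might contain, here is a hedged sketch of a toy random forest run logged to Mlflow. It is not the code in this PR: the function names, the synthetic dataset, and the logged parameter and metric are all assumptions. Mlflow's auto-created experiment is named "Default", which is what the fallback uses.

```python
from typing import Optional

DEFAULT_EXPERIMENT = "Default"  # name Mlflow gives its auto-created experiment


def resolve_experiment(name: Optional[str]) -> str:
    """Fall back to Mlflow's default experiment when no name is given."""
    if name and name.strip():
        return name.strip()
    return DEFAULT_EXPERIMENT


def train_and_log(
    experiment_name: Optional[str] = None,
    tracking_uri: str = "http://localhost:5000",
    n_estimators: int = 50,
) -> str:
    """Train a toy regressor and log it to Mlflow. Returns the run id."""
    # Imported lazily so the module can be imported without the ML stack.
    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(resolve_experiment(experiment_name))

    # Synthetic data stands in for a real dataset.
    X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    with mlflow.start_run() as run:
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=0)
        model.fit(X_train, y_train)
        mse = mean_squared_error(y_test, model.predict(X_test))
        mlflow.log_param("n_estimators", n_estimators)
        mlflow.log_metric("mse", mse)
        # Uploads the fitted model to the artifact store.
        mlflow.sklearn.log_model(model, "model")
        return run.info.run_id
```

To run this in a worker, train_and_log could be decorated with celery's shared_task and queued from the management command with .delay(...).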