This repo contains the minimum steps to create an end-to-end (e2e) time series forecasting pipeline with Kubeflow and TFX, deployed on Google Cloud Platform (GCP).
The Chicago taxi rides dataset is used throughout this tutorial.
Steps covered in this tutorial, in the suggested order:
- Create and deploy a Kubeflow cluster on GCP (kubeflow_cluster)
- Transform data from BigQuery at scale with TensorFlow Transform (code/preproc)
- Check for data anomalies and skew with TensorFlow Data Validation (code/data_validation)
- Train the model on the Kubernetes (K8s) cluster (code/train)
- Deploy the model to Google Cloud Machine Learning Engine (code/deploy)
- Measure model performance (code/evaluate)
- Build and run the Kubeflow pipeline (code/pipeline)
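For the data validation step, it may help to see the idea behind a skew check. TensorFlow Data Validation compares categorical feature distributions using the L-infinity distance: the largest absolute difference in any single value's relative frequency between two datasets. The block below is a minimal, dependency-free sketch of that metric. It is not the code in code/data_validation. The function names, the default threshold of 0.01, and the example `payment_type` values are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable


def value_distribution(values: Iterable[Hashable]) -> dict:
    """Return the relative frequency of each distinct value in a sample."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


def l_infinity_distance(train_values: Iterable[Hashable],
                        serving_values: Iterable[Hashable]) -> float:
    """Largest absolute per-value frequency difference between two samples.

    Values missing from one sample are treated as having frequency 0 there.
    """
    p = value_distribution(train_values)
    q = value_distribution(serving_values)
    keys = set(p) | set(q)
    return max((abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys), default=0.0)


def has_skew(train_values: Iterable[Hashable],
             serving_values: Iterable[Hashable],
             threshold: float = 0.01) -> bool:
    """Flag skew when the L-infinity distance exceeds a (hypothetical) threshold."""
    return l_infinity_distance(train_values, serving_values) > threshold


if __name__ == "__main__":
    # Illustrative payment_type samples, not real dataset statistics.
    train = ["Cash"] * 6 + ["Credit Card"] * 4
    serving = ["Cash"] * 5 + ["Credit Card"] * 5
    print(round(l_infinity_distance(train, serving), 6))
    print(has_skew(train, serving, threshold=0.05))
```

With these samples, "Cash" drops from 60% to 50% of rows, so the distance is 0.1 and a threshold of 0.05 flags skew. In TFDV itself, this threshold is configured per feature in the schema rather than passed as a function argument.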
If you have questions, please contact:
- Diego Silva - diegosilva@ciandt.com
- Gabriel Moreira - gabrielpm@ciandt.com
- Leandro Vendramin - vendramin@ciandt.com
- Lu Tung Huang - ltung@ciandt.com
- Pedro Lelis - pedrolelis@ciandt.com
- Rodrigo Pereira - rodrigofp@ciandt.com