ML models are pattern-finding machines that try to capture the relationship between
- a set of inputs available at prediction time (aka features), and
- a metric you want to predict (aka target)
For most real-world problems, these patterns between the features and the target are not static but change over time. So, if you don’t re-train your ML models, their accuracy degrades. This is commonly known as concept drift.
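One common way to make "accuracy degrades" concrete is to track a rolling error metric on live predictions and flag drift once it clearly exceeds the error the model had at deployment time. Here is a minimal, hypothetical Python sketch of that idea (the window size, threshold, and function names are assumptions for illustration, not code from this repo):

```python
from collections import deque

import numpy as np


def drift_detected(recent_errors: deque, baseline_error: float, tolerance: float = 1.5) -> bool:
    """Flag concept drift when the rolling live error clearly exceeds the deployment-time error."""
    if len(recent_errors) < recent_errors.maxlen:
        return False  # not enough live feedback collected yet
    return float(np.mean(recent_errors)) > tolerance * baseline_error


# Rolling window of absolute errors observed on live predictions.
recent_errors: deque = deque(maxlen=500)

# Inside the serving loop, once the true outcome for a prediction is known:
#   recent_errors.append(abs(y_true - y_pred))
#   if drift_detected(recent_errors, baseline_error=0.05):
#       ...trigger re-training on fresh data...
```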
Now, the speed at which patterns change, and your model degrades, depends on the particular phenomenon you are modelling.
For example 💁
If you are trying to predict rainfall, re-training your ML model daily is good enough. Rainfall patterns obey the laws of physics, and these do not change too much from one day to the next.
On the other hand, if you are trying to predict short-term crypto prices, where patterns between
- available market data (aka features), and
- future asset prices (aka target)
are short-lived, you must re-train your ML model very frequently. Ideally, in real time.
A similar situation arises when you build a real-time recommender system, like TikTok’s famous Monolith, where user preferences change in the blink of an eye and your ML model needs to be refreshed as often as possible.
So now the question is
How do you design an ML system that continuously re-trains the ML model that is serving the predictions ❓
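Conceptually: you keep a feature pipeline producing fresh training data, a training pipeline that periodically re-fits the model on that data, and an inference pipeline that always serves the newest model. Here is a minimal, self-contained Python sketch of that loop, with threads and an in-memory buffer standing in for real message queues and a model registry. Everything in it (names, window sizes, the toy data) is an illustration, not the actual code in this repo:

```python
import threading
import time
from collections import deque

import numpy as np

# Rolling buffer of the freshest (features, target) pairs, fed by the feature pipeline.
buffer_lock = threading.Lock()
recent_data: deque = deque(maxlen=10_000)

# A one-slot "model registry": the training loop writes, the inference loop reads.
latest_model = {"weights": None}


def feature_pipeline() -> None:
    """Continuously push fresh (features, target) pairs into the rolling buffer."""
    while True:
        x = np.random.rand(5)
        y = x.sum() + np.random.normal(scale=0.1)  # toy target, for illustration only
        with buffer_lock:
            recent_data.append((x, y))
        time.sleep(0.01)


def training_pipeline(retrain_every_seconds: float = 5.0) -> None:
    """Periodically re-fit a model on the freshest data and publish it."""
    while True:
        with buffer_lock:
            snapshot = list(recent_data)
        if len(snapshot) > 100:
            X = np.array([x for x, _ in snapshot])
            y = np.array([t for _, t in snapshot])
            weights, *_ = np.linalg.lstsq(X, y, rcond=None)  # tiny linear model
            latest_model["weights"] = weights  # "deploy" the freshly trained model
        time.sleep(retrain_every_seconds)


def inference_pipeline() -> None:
    """Serve predictions, always using the most recently published model."""
    while True:
        weights = latest_model["weights"]
        if weights is not None:
            fresh_features = np.random.rand(5)
            print("prediction:", fresh_features @ weights)
        time.sleep(1.0)


if __name__ == "__main__":
    for loop in (feature_pipeline, training_pipeline, inference_pipeline):
        threading.Thread(target=loop, daemon=True).start()
    time.sleep(30)  # let the three loops run side by side for a while
```

In a real system, the rolling buffer would be a message bus or feature store and the latest_model dict would be a model registry, which is broadly the role the feature, training, and inference pipelines below play.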
In this repo you can find a source code implementation.
- Install all project dependencies inside an isolated virtual env, using Python Poetry:
  $ make install
- Start the feature pipelines:
  $ make producers
- Start the training pipeline:
  $ make training
- Start the inference pipeline:
  $ make predict
Join more than 18k builders in the Real-World ML Newsletter.
Every Saturday morning.
For FREE