Hey there @ananiask8. Thanks for opening this issue, this is a good topic. Having lightweight per-user models that work with small amounts of feedback is a nice goal to reach. I personally have limited experience applying these models: we implemented them, but haven't necessarily put them in production, so you probably know more than we do here. My gut feeling, though, is that there should be features shared across users (e.g. age, gender, location, etc.). These would let you mitigate the cold-start problem when facing new users.
What is missing in River that is preventing you from doing this? What would you like to see added to River? I think it's sensible to warm up a River model offline and then deploy it. This basically boils down to calling …
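The warm-up-then-deploy pattern can be sketched as below. The `TinyLogReg` class is a hypothetical stand-in used to keep the example self-contained; River's own estimators expose the same one-example-at-a-time `learn_one` / `predict_proba_one` interface, so in practice you would replay historical data through a River model and then keep calling `learn_one` on live interactions:

```python
import math
import random

class TinyLogReg:
    """Hypothetical stand-in for a River classifier: sparse logistic
    regression trained with SGD, one example at a time."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.w = {}   # sparse weights, keyed by feature name
        self.b = 0.0  # intercept

    def predict_proba_one(self, x):
        z = self.b + sum(self.w.get(k, 0.0) * v for k, v in x.items())
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        g = self.predict_proba_one(x) - y  # gradient of the log-loss
        self.b -= self.lr * g
        for k, v in x.items():
            self.w[k] = self.w.get(k, 0.0) - self.lr * g * v

# Offline warm-up: several shuffled epochs over last week's interactions
# (the feature dicts and labels here are illustrative).
history = [
    ({"user_a": 1, "item_1": 1}, 1),
    ({"user_a": 1, "item_2": 1}, 0),
    ({"user_b": 1, "item_1": 1}, 1),
]
model = TinyLogReg()
random.seed(42)
for _ in range(10):  # multiple passes squeeze more out of scarce data
    random.shuffle(history)
    for x, y in history:
        model.learn_one(x, y)

# Online phase: after deployment, keep calling learn_one on each new event.
model.learn_one({"user_c": 1, "item_1": 1}, 1)
print(model.predict_proba_one({"user_a": 1, "item_1": 1}))
```

The key point is that the offline and online phases use the exact same update call, so "warming up" is just replaying the backlog before switching to the live stream.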
Overview
First, I wanted to ask about your experience with River in the small-data regime (e.g. each user has, on average, between 2 and 6 item ratings, while each item has, on average, around 100 ratings). I'm concerned that learning only online might result in suboptimal representations. What do you think about doing full-fledged offline training with several epochs first, to get the most out of the available data, and then deploying the model and continuing to learn online? If both users and items could be expected to have about 50 ratings each, I think the issue would be mitigated, but I cannot expect that in my use case, so please share your thoughts on this.
Specific Problem
Assuming the ratings are like/dislike (1/0), I can interpret the prediction as a "probability of liking." At inference time t, I have an item that N users have liked and M users have disliked, and based on this I want to predict the probability of liking for the current user.
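As a non-personalized reference point for this setup, the N/M counts alone already give a smoothed estimate of the like probability. The sketch below uses the posterior mean under a Beta prior; the pseudocounts `alpha` and `beta` are illustrative choices, not something from the discussion:

```python
def smoothed_like_prob(n_likes, n_dislikes, alpha=1.0, beta=1.0):
    """Posterior mean of the like probability under a Beta(alpha, beta)
    prior over a Bernoulli like/dislike rate."""
    return (n_likes + alpha) / (n_likes + n_dislikes + alpha + beta)

print(smoothed_like_prob(7, 3))  # (7 + 1) / (7 + 3 + 2) = 8/12
```

Any personalized model should at least beat this baseline, and with alpha = beta it degrades gracefully to 0.5 when an item has no ratings at all.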
Current Approach
At the moment I am using River to learn user representations offline: I train on the data from one week ago until today (a few epochs), and evaluate on today. Then I run a (sort-of) MLE (using scipy.optimize with SLSQP) to infer an item representation that maximizes the probability of liking for the users who liked and minimizes it for the users who disliked. I then use this inferred vector as the item representation to compute the probability of liking for the current user.
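A minimal version of that MLE step looks like the following, holding the offline-learned user vectors fixed and fitting only the item vector. Plain gradient ascent on the Bernoulli log-likelihood stands in for scipy's SLSQP here, and all names, dimensions, and numbers are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_item_vector(user_vecs, labels, dim=2, lr=0.5, steps=200):
    """Maximize sum_i [y_i * log p_i + (1 - y_i) * log(1 - p_i)] over the
    item vector v, with p_i = sigmoid(u_i . v) and user vectors u_i fixed."""
    v = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for u, y in zip(user_vecs, labels):
            err = y - sigmoid(sum(ui * vi for ui, vi in zip(u, v)))
            for d in range(dim):
                grad[d] += err * u[d]
        v = [vi + lr * g for vi, g in zip(v, grad)]
    return v

users = [[1.0, 0.2], [0.8, -0.5], [-1.0, 0.3]]  # fixed user representations
likes = [1, 1, 0]
item = fit_item_vector(users, likes)

# Predicted like-probability for a new user's (fixed) representation.
p = sigmoid(sum(ui * vi for ui, vi in zip([0.9, 0.0], item)))
```

Replacing this step with River's online learning would mean treating each (user, like/dislike) pair for the item as one more `learn_one` call on an item-side model instead of a batch optimization.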
Potential Approach
I would like to use River out of the box as much as possible. The first idea is to keep the offline training part but replace the MLE part with River's online learning. The second idea is to also replace the offline training part and do everything with River. I would appreciate your thoughts on whether River can be used for the whole thing, given the concerns I have raised.
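The "do everything online" idea can be sketched as a logistic matrix factorization that updates both the user and the item vector on each incoming rating. This is a pure-Python illustration of the technique, not River's actual reco classes, and every name and hyperparameter below is an assumption:

```python
import math
import random

class OnlineLogisticMF:
    """Logistic matrix factorization trained one (user, item, like) event
    at a time: p(like) = sigmoid(u . v), SGD on the log-loss."""

    def __init__(self, dim=4, lr=0.1, seed=0):
        self.dim, self.lr = dim, lr
        self.rng = random.Random(seed)
        self.users, self.items = {}, {}

    def _vec(self, table, key):
        # Lazily create a small random latent vector for unseen ids.
        if key not in table:
            table[key] = [self.rng.gauss(0, 0.1) for _ in range(self.dim)]
        return table[key]

    def predict_one(self, user, item):
        u, v = self._vec(self.users, user), self._vec(self.items, item)
        return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(u, v))))

    def learn_one(self, user, item, y):
        u, v = self._vec(self.users, user), self._vec(self.items, item)
        err = y - self.predict_one(user, item)
        for d in range(self.dim):
            # Simultaneous SGD step on both latent vectors.
            u[d], v[d] = u[d] + self.lr * err * v[d], v[d] + self.lr * err * u[d]

model = OnlineLogisticMF()
for _ in range(50):  # replay a tiny stream; online means a single pass in production
    model.learn_one("alice", "item_1", 1)
    model.learn_one("bob", "item_1", 0)
print(model.predict_one("alice", "item_1"))
```

With only 2 to 6 ratings per user, the lazily initialized user vectors barely move, which is exactly the small-data concern raised above; shared side features or an offline warm-up phase would compensate for that.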