Basic example of t-test on a polars dataframe #98

rosmur · 2024-11-18T06:48:47Z

Hello, I appreciate that you understand the importance of good documentation for tool adoption. Sorry if I am missing it, but are there any basic examples of running a Student's t-test using a Polars dataframe?


e10v · 2024-11-18T18:10:42Z

Maintainer

Hello,

Thank you for your interest and for the good question.

The easiest way is to create an Ibis Table from a Polars DataFrame and then use it as input data. Example:

import ibis
import polars as pl
import tea_tasting as tt


data = ibis.memtable(pl.from_pandas(tt.make_users_data(seed=42)))

experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions"),
    orders_per_session=tt.RatioOfMeans("orders", "sessions"),
    orders_per_user=tt.Mean("orders"),
    revenue_per_user=tt.Mean("revenue"),
)

result = experiment.analyze(data)
print(result)

P.S. At first, I wanted to suggest using Polars as the Ibis backend:

con = ibis.polars.connect({"users_data": pl.from_pandas(tt.make_users_data(seed=42))})
data = con.table("users_data")

But it turned out that Ibis doesn't support window functions in the Polars backend (tea-tasting uses them to calculate variance and covariance). I will have to work on that in future versions.
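For readers wondering why variance matters here: a t-test needs only a few aggregated statistics per variant (count, mean, variance), which is why those aggregates are computed in the data backend. Below is a minimal sketch of that idea, not tea-tasting's implementation. The function name `welch_t_from_stats` and the sample values are hypothetical, and Welch's form (unequal variances) is an assumption; Student's version with a pooled variance differs only in the standard error and the degrees of freedom.

```python
import math
from statistics import mean, variance


def welch_t_from_stats(n_c, mean_c, var_c, n_t, mean_t, var_t):
    """Welch's t statistic and degrees of freedom from per-group aggregates.

    Hypothetical helper for illustration only.
    """
    # Squared standard errors of the two group means.
    se2_c = var_c / n_c
    se2_t = var_t / n_t
    t_stat = (mean_t - mean_c) / math.sqrt(se2_c + se2_t)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_c + se2_t) ** 2 / (
        se2_c**2 / (n_c - 1) + se2_t**2 / (n_t - 1)
    )
    return t_stat, df


# Made-up samples, used only to produce the aggregates.
control = [2.0, 4.0, 6.0, 8.0]
treatment = [3.0, 5.0, 7.0, 9.0, 11.0]

t_stat, df = welch_t_from_stats(
    len(control), mean(control), variance(control),        # sample variance (n - 1)
    len(treatment), mean(treatment), variance(treatment),
)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The p-value would then come from the t distribution with `df` degrees of freedom, which requires a statistics library and is left out of this sketch.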

2 replies

rosmur Nov 19, 2024
Author

This still generates random data. Could we please get a basic example using an actual tabular dataset (with the data viewable)? I can't tell what orders, revenue, etc. are if I don't know what the dataset is.

e10v Nov 19, 2024
Maintainer

Sorry, I don't understand what difference it makes in the context of your question about using data in a specific format. Here is the explanation of the columns in the dataset: https://tea-tasting.e10v.me/user-guide/#input-data

The data are viewable. If you want to see them as a Polars dataframe, here is the example:

import ibis
import polars as pl
import tea_tasting as tt


data_pl = pl.from_pandas(tt.make_users_data(seed=42))
print(data_pl)
#> ┌──────┬─────────┬──────────┬────────┬───────────┐
#> │ user ┆ variant ┆ sessions ┆ orders ┆ revenue   │
#> │ ---  ┆ ---     ┆ ---      ┆ ---    ┆ ---       │
#> │ i64  ┆ u8      ┆ i64      ┆ i64    ┆ f64       │
#> ╞══════╪═════════╪══════════╪════════╪═══════════╡
#> │ 0    ┆ 1       ┆ 2        ┆ 1      ┆ 9.166147  │
#> │ 1    ┆ 0       ┆ 2        ┆ 1      ┆ 6.434079  │
#> │ 2    ┆ 1       ┆ 2        ┆ 1      ┆ 7.943873  │
#> │ 3    ┆ 1       ┆ 2        ┆ 1      ┆ 15.928675 │
#> │ 4    ┆ 0       ┆ 1        ┆ 1      ┆ 7.136917  │
#> │ …    ┆ …       ┆ …        ┆ …      ┆ …         │
#> │ 3995 ┆ 0       ┆ 2        ┆ 0      ┆ 0.0       │
#> │ 3996 ┆ 0       ┆ 2        ┆ 0      ┆ 0.0       │
#> │ 3997 ┆ 0       ┆ 3        ┆ 0      ┆ 0.0       │
#> │ 3998 ┆ 0       ┆ 1        ┆ 0      ┆ 0.0       │
#> │ 3999 ┆ 0       ┆ 5        ┆ 2      ┆ 17.162459 │
#> └──────┴─────────┴──────────┴────────┴───────────┘
data = ibis.memtable(data_pl)

Or you can view an Ibis Table in interactive mode:

ibis.options.interactive = True
print(data)
#> ┏━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┓
#> ┃ user  ┃ variant ┃ sessions ┃ orders ┃ revenue   ┃
#> ┡━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━┩
#> │ int64 │ uint8   │ int64    │ int64  │ float64   │
#> ├───────┼─────────┼──────────┼────────┼───────────┤
#> │     0 │       1 │        2 │      1 │  9.166147 │
#> │     1 │       0 │        2 │      1 │  6.434079 │
#> │     2 │       1 │        2 │      1 │  7.943873 │
#> │     3 │       1 │        2 │      1 │ 15.928675 │
#> │     4 │       0 │        1 │      1 │  7.136917 │
#> │     5 │       1 │        2 │      0 │  0.000000 │
#> │     6 │       1 │        1 │      0 │  0.000000 │
#> │     7 │       1 │        2 │      0 │  0.000000 │
#> │     8 │       0 │        2 │      0 │  0.000000 │
#> │     9 │       0 │        1 │      0 │  0.000000 │
#> │     … │       … │        … │      … │         … │
#> └───────┴─────────┴──────────┴────────┴───────────┘
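To make the column meanings concrete, here is a small sketch with a hand-written table instead of generated data. It uses plain Python rather than Polars or tea-tasting, so it only illustrates what the metrics measure, not how the library computes them. The rows are made up, the helper `mean_by_variant` is hypothetical, and treating variant 0 as control and variant 1 as treatment is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: one row per user, same columns as the example data.
rows = [
    {"user": 0, "variant": 0, "sessions": 2, "orders": 1, "revenue": 10.0},
    {"user": 1, "variant": 0, "sessions": 1, "orders": 0, "revenue": 0.0},
    {"user": 2, "variant": 0, "sessions": 3, "orders": 1, "revenue": 8.0},
    {"user": 3, "variant": 1, "sessions": 2, "orders": 1, "revenue": 9.0},
    {"user": 4, "variant": 1, "sessions": 2, "orders": 1, "revenue": 12.0},
    {"user": 5, "variant": 1, "sessions": 1, "orders": 1, "revenue": 6.0},
]


def mean_by_variant(rows, column):
    """Average of `column` per variant (hypothetical helper)."""
    values = defaultdict(list)
    for row in rows:
        values[row["variant"]].append(row[column])
    return {variant: mean(vals) for variant, vals in sorted(values.items())}


# A per-user mean: average revenue per user in each variant.
revenue_per_user = mean_by_variant(rows, "revenue")

# A ratio of means: average orders divided by average sessions.
mean_orders = mean_by_variant(rows, "orders")
mean_sessions = mean_by_variant(rows, "sessions")
orders_per_session = {v: mean_orders[v] / mean_sessions[v] for v in mean_orders}

# Relative difference of treatment (assumed 1) versus control (assumed 0).
rel_effect = revenue_per_user[1] / revenue_per_user[0] - 1

print(revenue_per_user)
print(orders_per_session)
print(f"{rel_effect:.1%}")
```

Each row is one user, `variant` says which group the user was assigned to, and the remaining columns are per-user totals that the metrics aggregate.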

e10v · 2024-12-14T20:43:46Z

Maintainer

Since version 0.3.0, Polars dataframes can be passed to tea-tasting directly:

import polars as pl
import tea_tasting as tt


data = pl.from_pandas(tt.make_users_data(seed=42))
print(data)
#> ┌──────┬─────────┬──────────┬────────┬───────────┐
#> │ user ┆ variant ┆ sessions ┆ orders ┆ revenue   │
#> │ ---  ┆ ---     ┆ ---      ┆ ---    ┆ ---       │
#> │ i64  ┆ u8      ┆ i64      ┆ i64    ┆ f64       │
#> ╞══════╪═════════╪══════════╪════════╪═══════════╡
#> │ 0    ┆ 1       ┆ 2        ┆ 1      ┆ 9.166147  │
#> │ 1    ┆ 0       ┆ 2        ┆ 1      ┆ 6.434079  │
#> │ 2    ┆ 1       ┆ 2        ┆ 1      ┆ 7.943873  │
#> │ 3    ┆ 1       ┆ 2        ┆ 1      ┆ 15.928675 │
#> │ 4    ┆ 0       ┆ 1        ┆ 1      ┆ 7.136917  │
#> │ …    ┆ …       ┆ …        ┆ …      ┆ …         │
#> │ 3995 ┆ 0       ┆ 2        ┆ 0      ┆ 0.0       │
#> │ 3996 ┆ 0       ┆ 2        ┆ 0      ┆ 0.0       │
#> │ 3997 ┆ 0       ┆ 3        ┆ 0      ┆ 0.0       │
#> │ 3998 ┆ 0       ┆ 1        ┆ 0      ┆ 0.0       │
#> │ 3999 ┆ 0       ┆ 5        ┆ 2      ┆ 17.162459 │
#> └──────┴─────────┴──────────┴────────┴───────────┘

experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions"),
    orders_per_session=tt.RatioOfMeans("orders", "sessions"),
    orders_per_user=tt.Mean("orders"),
    revenue_per_user=tt.Mean("revenue"),
)

result = experiment.analyze(data)
print(result)
#>             metric control treatment rel_effect_size rel_effect_size_ci pvalue
#>  sessions_per_user    2.00      1.98          -0.66%      [-3.7%, 2.5%]  0.674
#> orders_per_session   0.266     0.289            8.8%      [-0.89%, 19%] 0.0762
#>    orders_per_user   0.530     0.573            8.0%       [-2.0%, 19%]  0.118
#>   revenue_per_user    5.24      5.73            9.3%       [-2.4%, 22%]  0.123

0 replies
