Optimize filtering univariate result for period
The `PerMetricPerColumnResult` filter function has a high overhead for
selecting a subset of columns or metrics. This overhead is also incurred
(and is highest) when filtering only by period, because all columns and
metrics are then selected.

This commit adds a short-circuit path to avoid the overhead when only
the period requires filtering. For a result with 50 columns and 8
metrics this results in a >100x speed-up when only filtering for period.
michael-nml committed May 24, 2024
1 parent da33807 commit 246ca35
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions nannyml/base.py
@@ -293,13 +293,15 @@ def _filter(
         *args,
         **kwargs,
     ) -> Self:
+        res = super()._filter(period, *args, **kwargs)
+        if metrics is None and column_names is None:
+            return res
+
         if metrics is None:
             metrics = [metric.column_name for metric in self.metrics]
         if column_names is None:
             column_names = self.column_names
 
-        res = super()._filter(period, *args, **kwargs)
-
         data = pd.concat([res.data.loc[:, (['chunk'])], res.data.loc[:, (column_names, metrics)]], axis=1)
         data = data.reset_index(drop=True)
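
The short-circuit described in the commit message can be illustrated on a toy frame. This is a minimal sketch, not NannyML code: `make_result`, `filter_period`, `filter_result`, the `short_circuit` flag and the feature and metric names are all hypothetical stand-ins, and `filter_period` only approximates what the parent class `_filter` is assumed to do. It shows that when neither `metrics` nor `column_names` is given, returning the period-filtered frame directly yields the same data as rebuilding it through the full column selection.

```python
import pandas as pd

COLUMNS = ["age", "income", "score"]  # hypothetical feature names
METRICS = ["mean", "std"]             # hypothetical metric names


def make_result() -> pd.DataFrame:
    """Build a toy frame: ('chunk', 'period') plus one column per (feature, metric)."""
    tuples = [("chunk", "period")] + [(c, m) for c in COLUMNS for m in METRICS]
    periods = ["reference", "reference", "analysis", "analysis"]
    rows = [[p] + [float(i)] * (len(tuples) - 1) for i, p in enumerate(periods)]
    return pd.DataFrame(rows, columns=pd.MultiIndex.from_tuples(tuples))


def filter_period(df: pd.DataFrame, period: str) -> pd.DataFrame:
    """Stand-in for the parent-class filter: keep only the rows of one period."""
    if period != "all":
        df = df.loc[df[("chunk", "period")] == period]
    return df.reset_index(drop=True)


def filter_result(df, period="all", metrics=None, column_names=None, short_circuit=True):
    res = filter_period(df, period)
    # Short-circuit: nothing to select, so skip the column selection entirely.
    if short_circuit and metrics is None and column_names is None:
        return res

    if metrics is None:
        metrics = METRICS
    if column_names is None:
        column_names = COLUMNS

    # Full path: re-select the chunk columns and every (column, metric) pair.
    data = pd.concat([res.loc[:, ["chunk"]], res.loc[:, (column_names, metrics)]], axis=1)
    return data.reset_index(drop=True)


df = make_result()
fast = filter_result(df, period="analysis")                       # short-circuit path
slow = filter_result(df, period="analysis", short_circuit=False)  # full selection path
subset = filter_result(df, period="analysis", metrics=["mean"], column_names=["age"])
```

Passing `short_circuit=False` forces the old behaviour, which makes it easy to compare both paths or to time them with `timeit` on a wider frame.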
