Optimize period-only filtering of univariate results

The `PerMetricPerColumnResult` filter function incurs high overhead when selecting a subset of columns or metrics. That overhead is also incurred, and is at its highest, when filtering only by period, because all columns and metrics are then selected. This commit adds a short-circuit path that skips the column and metric selection when only the period needs filtering. For a result with 50 columns and 8 metrics, this yields a >100x speed-up for period-only filtering.
NannyML · May 24, 2024 · 246ca35
1 parent da33807
commit 246ca35
Showing 1 changed file with 4 additions and 2 deletions.
diff --git a/nannyml/base.py b/nannyml/base.py
@@ -293,13 +293,15 @@ def _filter(
         *args,
         **kwargs,
     ) -> Self:
+        res = super()._filter(period, *args, **kwargs)
+        if metrics is None and column_names is None:
+            return res
+
         if metrics is None:
             metrics = [metric.column_name for metric in self.metrics]
         if column_names is None:
             column_names = self.column_names
 
-        res = super()._filter(period, *args, **kwargs)
-
         data = pd.concat([res.data.loc[:, (['chunk'])], res.data.loc[:, (column_names, metrics)]], axis=1)
         data = data.reset_index(drop=True)
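
The short-circuit in the hunk above can be illustrated outside NannyML with a small standalone sketch. It is not the library's code: `make_result`, `filter_period` and `filter_result` are hypothetical names, the toy frame only imitates a result with two-level `(column, metric)` columns plus a `('chunk', 'period')` column, and a boolean column mask stands in for the `pd.concat` selection used in `nannyml/base.py`.

```python
import pandas as pd


def make_result(n_columns=3, metric_names=("mean", "std")):
    """Build a toy frame: one ('chunk', 'period') column plus (column, metric) pairs."""
    data = {("chunk", "period"): ["reference", "reference", "analysis", "analysis"]}
    for i in range(n_columns):
        for metric in metric_names:
            data[(f"col_{i}", metric)] = [float(i)] * 4
    frame = pd.DataFrame(data)
    frame.columns = pd.MultiIndex.from_tuples(list(data))
    return frame


def filter_period(frame, period):
    """Hypothetical stand-in for the parent class filter: keep rows of one period."""
    if period == "all":
        return frame
    return frame.loc[frame[("chunk", "period")] == period]


def filter_result(frame, period="all", metrics=None, column_names=None):
    # Row filtering happens first, as in the commit.
    res = filter_period(frame, period)

    # Short-circuit: nothing to select on the column axis, so return early
    # and skip the column/metric selection entirely.
    if metrics is None and column_names is None:
        return res

    level0 = res.columns.get_level_values(0)
    level1 = res.columns.get_level_values(1)
    if metrics is None:
        metrics = [m for m in level1.unique() if m != "period"]
    if column_names is None:
        column_names = [c for c in level0.unique() if c != "chunk"]

    # Keep the chunk column plus every requested (column, metric) pair.
    keep = (level0 == "chunk") | (level0.isin(column_names) & level1.isin(metrics))
    return res.loc[:, keep].reset_index(drop=True)


toy = make_result()
period_only = filter_result(toy, period="analysis")
subset = filter_result(toy, period="analysis", metrics=["mean"], column_names=["col_1"])
```

In this sketch the early return yields the same rows and columns as passing every metric and column explicitly, although, as in the diff, the early-return path does not reset the index.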