perf: .mutate() on wide tables is slow #10956
Comments
Code to reproduce below (this example is slightly simplified vs my real-life code, but still takes 2.5s vs the 5s mine takes)
To profile (in a notebook/ipython session):
After a little more experimenting, it's clear that the number of columns in the table on which .mutate is called is a key driver. If I make the table 1000 columns wide (instead of 100) and reduce the call to mutate to something extremely trivial, the code below takes over 10s to run.
I will try to, though probably just next week.
I'm trying to trace through the cause for an expression (used for an ETL process) that takes around 5 seconds to construct.
The expression involves building about 100 subtables from a main table (which has ~100 columns); each subtable contains 4-10 columns. The 100 subtables are then concatenated via a union operation.
Constructing the 100 subtables takes almost all the expression building time. Each subtable involves a mutate on the main table, and then a 2nd mutate, so there are ~200 calls to Table.mutate made in building the 100 subtables. Key pain points within this process are all within ibis expression building code, in particular DerefMap.from_targets and DerefMap.dereference, which are used internally by Expr.bind and Table.select, in turn used by Table.mutate.
Are there any known places within these functions where optimisations might be possible (memoisation / different data structure for faster lookup/search etc)?
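To make the memoisation idea concrete, here is a minimal toy sketch in plain Python. It is not ibis's implementation: ToyTable, build_lookup, cached_lookup and resolve are hypothetical names, and the only assumption is that some name-to-column map is rebuilt from scratch on every call. It just counts how often that map is built with and without a cache.

```python
from functools import lru_cache


class ToyTable:
    """Hypothetical stand-in for a table expression (not ibis's Table)."""

    def __init__(self, columns):
        self.columns = tuple(columns)


build_count = 0  # how many times the lookup map was rebuilt


def build_lookup(table):
    """Rebuild the name -> position map from scratch: O(number of columns)."""
    global build_count
    build_count += 1
    return {name: i for i, name in enumerate(table.columns)}


@lru_cache(maxsize=None)
def cached_lookup(table):
    """Memoised variant: the map is built once per table object.

    ToyTable uses the default identity-based hash, so a cache hit does not
    depend on the table width.
    """
    return build_lookup(table)


def resolve(table, name, lookup_fn):
    """Resolve a column name to its position using the given lookup strategy."""
    return lookup_fn(table)[name]


wide = ToyTable(f"c{i}" for i in range(1000))

# 200 calls, each rebuilding a 1000-entry map.
for _ in range(200):
    resolve(wide, "c999", build_lookup)
uncached_builds = build_count

# Same 200 calls, but the map is built only on the first one.
build_count = 0
for _ in range(200):
    resolve(wide, "c999", cached_lookup)
cached_builds = build_count

print(uncached_builds, cached_builds)
```

In this toy the uncached path does work proportional to (calls x columns), while the cached path pays the per-column cost once. Whether the same pattern applies inside DerefMap is exactly the open question.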
I will try to come up with a reproducible example and follow up separately.
Results from line profiling the ibis internal functions when building the expression are provided below. Note that using lprof introduces overhead, so what took 5s without it takes 15s with it. Times below are all in seconds.
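Since line profiling inflates the timings, a function-level profile can serve as a cheaper cross-check. The sketch below uses only the standard-library cProfile and pstats modules; build_expression is a hypothetical stand-in for the real expression-building code, and profile_call is an illustrative helper, not part of ibis.

```python
import cProfile
import io
import pstats


def build_expression():
    """Hypothetical stand-in for the slow expression-building code."""
    return sum(i * i for i in range(100_000))


def profile_call(func, top=10):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(build_expression)
print(report)
```

Swapping build_expression for the function that builds the real expression should show whether the same internal functions dominate cumulative time at function granularity.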
I'm using the latest release of ibis, 10.2.0.
Thanks!
Table.mutate
Expr.bind
Table.select
DerefMap.from_targets
DerefMap.dereference