Test performance of io.cube.test_query module #403
Are the long test times really an issue or just a slight annoyance? Also, I think we normally rather have the problem that kartothek is under- rather than over-tested.
Actually, that test found a solid number of edge cases in the past, so I'm not a fan of removing it. In general, I would probably rather remove other tests than the property-based testing. I cannot judge the other tests just off the top of my head; see next point.
What effort is "reasonable" here kinda depends on the point of view:
Well, I believe this question can only have subjective answers, but I consider it an actual problem as soon as it encourages "I'll push to CI instead of running the tests myself" behaviour, which I believe we've reached here. Timing our tests: the entire test suite without the query tests takes about 3:30 min on my machine, while the query tests alone take another 7:30 min. That is an unreasonable split imho.
I get that point, but testing is always a trade-off to a certain degree. I believe this hypothesis test itself is a nice thing to have, but I'm not convinced about the cost/value ratio (13-14% of overall runtime).
Totally agreed. My recommendation above was not random but driven by my estimate of their value. The tests are unfortunately not documented, and it's somewhat guesswork what the overall idea behind them was. I pinged you hoping that you could provide some insight into the overall test setup, e.g. which edge cases things like
I believe we should trust dask to do the right thing w.r.t. blocksizes. Having one test with a non-trivial blocksize should cover us against doing something with too many assumptions about this.
TBD / slow flag / see above
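A "slow" flag along these lines could gate the expensive cases. This is only a minimal sketch: the marker, environment variable, and test name are made up for illustration and are not existing kartothek conventions.

```python
import os

import pytest

# Hypothetical "slow" gate: expensive query tests are skipped unless
# RUN_SLOW_TESTS is set. Names here are illustrative only.
slow = pytest.mark.skipif(
    not os.environ.get("RUN_SLOW_TESTS"),
    reason="slow test; set RUN_SLOW_TESTS=1 to run",
)


@slow
def test_query_hypothesis_roundtrip():
    # placeholder body standing in for the real property-based test
    assert True
```

Locally one would then get the fast suite by default, while CI sets `RUN_SLOW_TESTS=1` to keep full coverage.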
This tests against multiple indexed columns (why are there multiple ones? what's the value of having
This is a surprisingly expensive test, and the conditions we have in there cover all fields, often multiple times with different operators and edges. Either these are some magical numbers or it is redundant. Either way we should document it or remove it.
I'm not looking for a "task estimate". I'm looking for some judgment on your/our side about what is sensible and what is not. This is what this ticket is about. We might also want to do only some of these things, or may simply want to add documentation about what's there. The bottom line is that 70% of the test runtime is large enough that all maintainers should have a reasonable understanding of what we are testing and why; otherwise this is a nightmare to maintain.
That sounds like a good thing to do for the property-based testing 👍
I think we can do that for the query tests. There is, however, one that tests the graph and ensures that the linear fusing works correctly. Since we had severe issues with this in the past, I would like to keep at least that specific test.
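A graph-level check for linear fusing can be stated quite cheaply. The following is a toy sketch using `dask.optimization.fuse` on a hand-built graph, not the actual kartothek test:

```python
from dask.core import get
from dask.optimization import fuse


def inc(x):
    return x + 1


def double(x):
    return x * 2


# Toy linear chain a -> b -> c: after linear fusing, b should be folded
# into c, leaving a smaller graph that still computes the same result.
dsk = {"a": 1, "b": (inc, "a"), "c": (double, "b")}
fused, dependencies = fuse(dsk, keys=["c"], rename_keys=False)

assert len(fused) < len(dsk)          # chain collapsed
assert get(fused, "c") == get(dsk, "c")  # same result
```

Asserting on the fused graph rather than on query results keeps such a regression test fast while still guarding against the fusing issues mentioned above.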
Because the index columns belong to different datasets (a seed, a fully covered additional one, a sparse additional one).
Not anymore since we've removed the time-traveling, so we can remove that.
IIRC that tests non-seed datasets partitioned differently than the seed
"The performance is at least somewhat reasonable" should be an ASV benchmark.
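An ASV benchmark for that could be as small as the sketch below. ASV discovers classes with `time_*` methods and tracks their runtimes across commits instead of pass/fail-asserting; the class name and workload here are made up stand-ins, not real cube code.

```python
# Hypothetical airspeed-velocity (ASV) suite: ASV calls setup() before
# each benchmark and times every time_* method.
class QueryBenchmarks:
    def setup(self):
        # stand-in dataset; a real benchmark would build a small cube here
        self.rows = [(i, i % 7) for i in range(100_000)]

    def time_filter_query(self):
        # stand-in for a conditional cube query
        [r for r in self.rows if r[1] == 0]
```

This moves the "is it fast enough" question out of the unit-test suite entirely, so it no longer contributes to CI runtime.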
Tests that dtypes are handled correctly (e.g. integer widths are not blown up if not required). Could be a simpler test.
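The core of that dtype assertion can indeed be checked very cheaply, along these lines (a numpy stand-in for illustration, not the actual cube round-trip):

```python
import numpy as np

# A narrow integer column should keep its width through a concat-style
# operation instead of being silently promoted to int64.
col = np.array([1, 2, 3], dtype=np.int8)
merged = np.concatenate([col, col])

assert merged.dtype == np.int8  # width preserved, not blown up
```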
Well, it's an easy test because you know that you cover a large set of cases for sure ;) Aka: CI time is cheaper than a prod crash.
I think that's a bit exaggerated, no?
Thanks for your input. I believe the above comments cover exactly what I was looking for. I will (eventually) open a PR with more docs and/or some simplifications. From there we can iterate.
The nightmare I am talking about is not the runtime but rather the "I don't know what this test is doing, therefore I dare not touch it". That's always a bad place to be in (yes, maybe nightmare is a slight exaggeration :) )
The performance of the `io.cube.test_query` module is concerning and takes up the majority of the runtime of the entire test suite. While this module covers a cornerstone of the cube functionality, the test runtimes are still unreasonably large. Due to heavy test parametrization we're generating about 1.5k tests, where runtimes of individual tests range up to several seconds.
Running the tests for only the eager backend takes about 3 min, where the runtime is distributed very asymmetrically:
- `test_hypothesis`: ~50% // ~1:30 min (up to 15 s each; multiplicity 12)
- `test_partition_by`: ~20% // 42 s (up to 8 s each)
- `test_condition`: ~10% // 23 s (~0.5 s each; multiplicity driver x cubes x )
I haven't performed the same benchmarks for the dask backends but I would suspect them to perform worse due to more overhead.
I believe we're struggling with a few problems here:
Too much local parametrization

I believe we are parametrizing too much. For instance, `test_partition_by` is parameterized using seven single columns and two multi-column combinations. While I understand that we have a lot of different cases (indexed, not indexed, partitioned, indexed but not in seed, etc.), I believe testing all of these cases is not necessary and we can reduce this. If this thoroughness is required, we should probably push it to a lower level (see next).

Testing on the wrong level

These tests are effectively end-to-end tests and are testing very granular edge cases. I believe most could be expressed on a different level (e.g. asserting the query intention but not the data).
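Trimming the local parametrization could look roughly like this. The column names and test body below are hypothetical; the point is keeping one representative per structural case rather than every combination:

```python
import pytest


# Instead of seven single columns plus two multi-column combinations,
# keep one representative per structural case (names are made up).
@pytest.mark.parametrize(
    "partition_by",
    [
        ["p"],        # plain partition column
        ["v1"],       # indexed, non-seed column
        ["p", "v1"],  # a single multi-column case
    ],
)
def test_partition_by_reduced(partition_by):
    # placeholder body standing in for the real query assertion
    assert isinstance(partition_by, list)
```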
Too many cubes
We are testing over many "different" cubes in this test suite, named e.g. `fullrange`, `sparse` or `massive_partitions`, where the value of some of the cases is not entirely clear. For instance, there is `sparse` and `sparse_opt`, where the difference is some integer types in a few fields. Is a full parameterization necessary for this, or is it sufficient to read this stuff once to ensure types are cast properly? Etc.

What I'd like to discuss here is what we can remove such that build times become more comfortable while we still feel confident in the test suite.
My proposal would be:

- `dask_bag_bs1`
- `test_partition_by`
- `sparse_outer_opt`, `massive_partitions`, `other_part`, `updated`
- `test_conditions`, e.g. such that we have one per field

@crepererum I guess you have been involved in writing these tests and I would appreciate your opinion. I know we discussed the idea of rewriting the tests on the query-intention level, but this is not super easy and would require a lot of effort; therefore, I'm looking for a fix with reasonable effort.
I can also prepare a PR if that helps with the discussion, but the difficult bit is obviously not deleting the code but deciding what is necessary/feasible to keep.