================================================================================ FAILURES =================================================================================
___________________________________________________________ test_update_shuffle_no_partition_on[None-550-2000] ____________________________________________________________
store_factory = functools.partial(<function get_store_from_url at 0x7f884861eb80>, 'hfs:///private/var/folders/78/9wnl2y0s66dcy42qb8_20nwm0000gr/T/pytest-of-clbb/pytest-81/test_update_shuffle_no_partiti549/store')
bucket_by = None
@pytest.mark.repeat(2000)
@pytest.mark.parametrize("bucket_by", [None, "range"])
def test_update_shuffle_no_partition_on(store_factory, bucket_by):
    df = pd.DataFrame(
        {
            "range": np.arange(10),
            "range_duplicated": np.repeat(np.arange(2), 5),
            "random": np.random.randint(0, 100, 10),
        }
    )
    ddf = dd.from_pandas(df, npartitions=10)
    with pytest.raises(
        ValueError, match="``num_buckets`` must not be None when shuffling data."
    ):
        update_dataset_from_ddf(
            ddf,
            store_factory,
            dataset_uuid="output_dataset_uuid",
            table="table",
            shuffle=True,
            num_buckets=None,
            bucket_by=bucket_by,
        ).compute()
    res_default = update_dataset_from_ddf(
        ddf,
        store_factory,
        dataset_uuid="output_dataset_uuid_default",
        table="table",
        shuffle=True,
        bucket_by=bucket_by,
    ).compute()
    assert len(res_default.partitions) == 1
    res = update_dataset_from_ddf(
        ddf,
        store_factory,
        dataset_uuid="output_dataset_uuid",
        table="table",
        shuffle=True,
        num_buckets=2,
        bucket_by=bucket_by,
    ).compute()
>   assert len(res.partitions) == 2
E   assert 1 == 2
E     +1
E     -2
bucket_by     = None
ddf           = Dask DataFrame Structure:
                               range range_duplicated random __KTK_HASH_BUCKET
                npartitions=9    ...              ...    ...
                9                ...              ...    ...               ...
                Dask Name: from_pandas, 9 tasks
df            =    range  range_duplicated  random
                0      0                 0      58
                1      1                 0      32
                2      2        ...      1      99
                7      7                 1      18
                8      8                 1      78
                9      9                 1      69
res           = DatasetMetadata(uuid=output_dataset_uuid, tables=['table'], partition_keys=[], metadata_version=4, indices=[], explicit_partitions=True)
res_default   = DatasetMetadata(uuid=output_dataset_uuid_default, tables=['table'], partition_keys=[], metadata_version=4, indices=[], explicit_partitions=True)
store_factory = functools.partial(<function get_store_from_url at 0x7f884861eb80>, 'hfs:///private/var/folders/78/9wnl2y0s66dcy42qb8_20nwm0000gr/T/pytest-of-clbb/pytest-81/test_update_shuffle_no_partiti549/store')
io/dask/dataframe/test_shuffle.py:91: AssertionError
============================================================================ warnings summary =============================================================================
tests/io/dask/dataframe/test_shuffle.py::test_update_shuffle_no_partition_on[None-550-2000]
tests/io/dask/dataframe/test_shuffle.py::test_update_shuffle_no_partition_on[None-550-2000]
/Users/clbb/dev/kartothek/kartothek/core/dataset.py:107: DeprecationWarning: The attribute `DatasetMetadataBase.table_meta` will be removed in kartothek 4.0 in favour of `DatasetMetadataBase.schema`.
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/warnings.html
========================================================================= short test summary info =========================================================================
FAILED io/dask/dataframe/test_shuffle.py::test_update_shuffle_no_partition_on[None-550-2000] - assert 1 == 2
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================= 1 failed, 549 passed, 7729 deselected, 2 warnings in 343.11s (0:05:43) ==================================================
distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:52668
distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:52667
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:52668', name: tcp://127.0.0.1:52668, status: running, memory: 0, processing: 0>
distributed.core - INFO - Removing comms to tcp://127.0.0.1:52668
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:52667', name: tcp://127.0.0.1:52667, status: closing, memory: 0, processing: 0>
distributed.core - INFO - Removing comms to tcp://127.0.0.1:52667
distributed.scheduler - INFO - Lost all workers
distributed.scheduler - INFO - Scheduler closing..
Problem description
The test
io/dask/dataframe/test_shuffle.py::test_update_shuffle_no_partition_on[None]
is flaky and fails in a small fraction of runs. This was first observed in https://github.com/JDASoftwareGroup/kartothek/runs/4334825494?check_suite_focus=true but is also reproducible on master.
It was reproduced with pytest-repeat by adding @pytest.mark.repeat(2000) on top of the test. Executed command:
pytest -v -k "test_update_shuffle_no_partition_on[None" -x --showlocals --repeat-scope session
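A plausible explanation for the flakiness (an assumption based on the numbers above, not verified against kartothek's hashing code): the final update_dataset_from_ddf call requests num_buckets=2, yet the failing run produced only one partition. If each of the 10 rows were assigned independently and roughly uniformly to one of the two hash buckets, then occasionally all rows would land in the same bucket, leaving the other bucket empty and yielding a single partition. A minimal sketch of this model:

import numpy as np

# Assumed model (NOT kartothek's actual hash assignment): each of the 10 rows
# is placed independently and uniformly into one of num_buckets=2 buckets.
# Closed-form probability that all 10 rows share a single bucket:
p_single_bucket = 2 * (1 / 2) ** 10
print(p_single_bucket)  # 0.001953125 -> roughly 1 run in 512

# Monte Carlo check of the same model.
rng = np.random.default_rng(0)
trials = 200_000
buckets = rng.integers(0, 2, size=(trials, 10))
frac_single = np.all(buckets == buckets[:, :1], axis=1).mean()
print(frac_single)  # ~0.002, consistent with the closed-form value

One failure in 550 repeats, as observed above, is consistent with a per-run failure rate of about 1/512, which would also explain why the test passes in the vast majority of runs.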
Used versions