Describe the bug
After upgrading from datasets 3.3.2 to 3.4.0 yesterday, I can no longer open a locally saved dataset that was created with an older datasets version.
The traceback is:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/arrow/arrow.py", line 67, in _generate_tables
    batches = pa.ipc.open_stream(f)
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/ipc.py", line 190, in open_stream
    return RecordBatchStreamReader(source, options=options,
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/ipc.py", line 52, in __init__
    self._open(source, options=options, memory_pool=memory_pool)
  File "pyarrow/ipc.pxi", line 1006, in pyarrow.lib._RecordBatchStreamReader._open
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Expected to read 538970747 metadata bytes, but only read 2126

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1855, in _prepare_split_single
    for _, table in generator:
  File "/usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/arrow/arrow.py", line 69, in _generate_tables
    reader = pa.ipc.open_file(f)
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/ipc.py", line 234, in open_file
    return RecordBatchFileReader(
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/ipc.py", line 110, in __init__
    self._open(source, footer_offset=footer_offset,
  File "pyarrow/ipc.pxi", line 1090, in pyarrow.lib._RecordBatchFileReader._open
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Not an Arrow file
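A possible reading of the first error, offered as a diagnostic sketch rather than a confirmed cause: the Arrow stream reader interprets the first four bytes of a stream as a little-endian 32-bit metadata length (unless they are the `0xFFFFFFFF` continuation marker). Decoding the reported length 538970747 back into bytes gives `{`, a newline, and two spaces, which is how an indented JSON file typically begins. That suggests the loader may have been handed a JSON file from the dataset folder instead of an Arrow data file, though nothing in the report confirms which file it was. The helper names `length_prefix_as_bytes` and `sniff_header` below are hypothetical and exist only for this illustration.

```python
import struct

# Magic bytes at the start of an Arrow IPC *file* (random-access format).
ARROW_FILE_MAGIC = b"ARROW1"
# Continuation marker that starts each message in current Arrow IPC *streams*.
STREAM_CONTINUATION = b"\xff\xff\xff\xff"


def length_prefix_as_bytes(n: int) -> bytes:
    """Return the 4 bytes that decode to n as a little-endian signed int32."""
    return struct.pack("<i", n)


def sniff_header(head: bytes) -> str:
    """Roughly classify a file from its first few bytes (hypothetical helper).

    Streams written in the legacy format (no continuation marker) are
    reported as "unknown" by this simple check.
    """
    if head.startswith(ARROW_FILE_MAGIC):
        return "arrow-file"
    if head.startswith(STREAM_CONTINUATION):
        return "arrow-stream"
    if head.lstrip()[:1] in (b"{", b"["):
        return "json-like"
    return "unknown"


# The length reported in the first ArrowInvalid message of the traceback.
raw = length_prefix_as_bytes(538970747)
print(raw)                # b'{\n  '
print(sniff_header(raw))  # json-like
```

To check a real folder, one could read the first eight bytes of each file and pass them to `sniff_header`; any file that does not come back as `arrow-file` or `arrow-stream` would be a candidate for the file that triggered the error.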
Steps to reproduce the bug
Load a dataset from a local folder with
as it is done for example in the training script for SD3 controlnet.
This is the minimal script to test it:
Expected behavior
Loading the dataset should work in 3.4.0 as it did in 3.3.2.
Environment info
datasets version: 3.4.0
huggingface_hub version: 0.29.3
fsspec version: 2024.12.0