Columns in the dataset obtained through load_dataset do not correspond to the ones in the dataset viewer since 3.4.0 #7495

bruno-hays · 2025-04-02T17:01:11Z

Describe the bug

Since datasets 3.4.0, every column except audio is missing when I load my dataset BrunoHays/Accueil_UBS.

Interestingly, the dataset viewer still shows the correct columns.

Steps to reproduce the bug

from datasets import load_dataset
ds = load_dataset("BrunoHays/Accueil_UBS", streaming=True)
print(next(iter(ds["test"])).keys())

With datasets >= 3.4.0:
-> dict_keys(['audio'])
With datasets == 3.3.2:
-> dict_keys(['audio', 'id', 'speaker', 'sentence', 'raw_sentence', 'start_timestamp', 'end_timestamp', 'overlap'])

Expected behavior

All the columns should be present, as they are with datasets 3.3.2 and in the dataset viewer.
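
To make this expectation easy to re-check across versions, here is a minimal sketch that compares the keys of one example against the column list from the 3.3.2 output. The names EXPECTED_COLUMNS, missing_columns and check_first_example are illustrative helpers, not part of the datasets API, and the streaming check needs network access.

```python
from typing import Iterable, Mapping, Set

# Column names copied from the datasets == 3.3.2 output in this report.
EXPECTED_COLUMNS = {
    "audio", "id", "speaker", "sentence", "raw_sentence",
    "start_timestamp", "end_timestamp", "overlap",
}


def missing_columns(
    example: Mapping[str, object],
    expected: Iterable[str] = EXPECTED_COLUMNS,
) -> Set[str]:
    """Return the expected column names that are absent from one example."""
    return set(expected) - set(example.keys())


def check_first_example(
    repo_id: str = "BrunoHays/Accueil_UBS", split: str = "test"
) -> Set[str]:
    """Stream the first example of a split and report its missing columns."""
    # Imported lazily so missing_columns stays usable without the library.
    from datasets import load_dataset

    ds = load_dataset(repo_id, streaming=True)
    return missing_columns(next(iter(ds[split])))
```

If the behaviour matches the report, check_first_example() should return an empty set on 3.3.2 and a set of seven column names on 3.4.0 or later.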

Environment info

datasets version: 3.3.2
Platform: macOS-14.6.1-x86_64-i386-64bit
Python version: 3.10.15
huggingface_hub version: 0.30.1
PyArrow version: 16.1.0
Pandas version: 1.5.3
fsspec version: 2023.10.0
