Hugging Face Datasets #143
Wraps a Hugging Face dataset in our COCO-like dataset API.
* Fix converting CMYK image formats to PNG.
* Remove use of get_image_fpath in app.
fbca179 to e18d769
for split_name in info.splits:
    streaming_str = "streaming" if streaming else "download"
    expanded_identifiers.append(
        f"{identifier}@{config_name}@{split_name}@{streaming_str}"
This is bananas 🍌 Is there a better way to bring some "config" along with the Hugging Face dataset?
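One possible direction, as a rough sketch only: carry the same four fields in a small frozen dataclass and pack them into the `@`-separated string only at the boundary where a plain string is required. The class name `HFDatasetSpec` and its methods are hypothetical and are not part of this PR or of nrtk-explorer; the field names mirror the variables in the snippet above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HFDatasetSpec:
    """Hypothetical container for the values packed into the identifier string."""

    identifier: str
    config_name: str
    split_name: str
    streaming: bool = True

    def to_string(self) -> str:
        # Same layout as the f-string in the diff: id@config@split@mode
        mode = "streaming" if self.streaming else "download"
        return f"{self.identifier}@{self.config_name}@{self.split_name}@{mode}"

    @classmethod
    def from_string(cls, packed: str) -> "HFDatasetSpec":
        # Split from the right so an "@" inside the identifier is preserved.
        parts = packed.rsplit("@", 3)
        if len(parts) != 4:
            raise ValueError(f"expected 4 '@'-separated fields, got: {packed!r}")
        identifier, config_name, split_name, mode = parts
        if mode not in ("streaming", "download"):
            raise ValueError(f"unknown mode: {mode!r}")
        return cls(identifier, config_name, split_name, mode == "streaming")
```

Being frozen, the dataclass is hashable and compares by value, so it could stand in for the packed string as a dictionary key if that is how the identifiers are used.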
Defaulting to streaming download for HF datasets, but now you can specify `--download` to download the dataset to disk before loading it.
f5da6b7 to 684e2ee
@Erotemic Have you ever tried shoehorning a Hugging Face dataset into a COCO shape? Or any other image dataset format for that matter?
@PaulHax other data formats: yes. It is usually possible because the different image / video annotation formats are almost all doing the same thing in different ways. Some of them make more assumptions than others. I think I've done a reasonable job at giving kwcoco the representation power to handle almost everything. Here are examples of code in kwcoco to handle and help convert to/from other formats: https://gitlab.kitware.com/computer-vision/kwcoco/-/tree/main/kwcoco/formats?ref_type=heads Namely, I have code for:
On top of this, I've done fairly specific conversions for VIAME; the code for those is scattered around here: https://github.com/VIAME/bioharn/tree/main/dev/data_tools There have also been different non-public formats that can be converted.
I tried this branch out and it works well!
I don't see anything wrong in the code, but I also don't have experience with kwcoco and HF.
Test it out:
nrtk-explorer --dataset cppe-5 beans rafaelpadilla/coco2017 keremberke/german-traffic-sign-detection mrtoy/mobile-ui-design keremberke/construction-safety-object-detection keremberke/table-extraction
Use the `--download` CLI arg to cache the Hugging Face dataset locally. Dataset streaming is the default.
nrtk-explorer --dataset cppe-5 --download
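For reference, a minimal sketch of how such a flag could map onto the `streaming` argument of `datasets.load_dataset`. The helper names `build_load_kwargs` and `load_hf_dataset` are hypothetical and not taken from this PR; only `load_dataset` and its `split` and `streaming` parameters come from the Hugging Face `datasets` library.

```python
from typing import Any, Dict, Optional


def build_load_kwargs(download: bool = False, split: Optional[str] = None) -> Dict[str, Any]:
    """Translate a hypothetical `download` flag into load_dataset keyword arguments."""
    # Streaming is the default; download=True asks for a local cached copy instead.
    kwargs: Dict[str, Any] = {"streaming": not download}
    if split is not None:
        kwargs["split"] = split
    return kwargs


def load_hf_dataset(identifier: str, download: bool = False, split: Optional[str] = None):
    """Load a Hugging Face dataset, streaming unless `download` is set."""
    # Imported lazily so build_load_kwargs works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(identifier, **build_load_kwargs(download, split))
```

With `streaming=True` the library returns an iterable dataset that fetches examples on demand; with `streaming=False` it downloads the data and caches it on disk before returning.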
Adds a `datasets[vision]` dependency.
Object Detection task tagged datasets:
https://huggingface.co/datasets?task_categories=task_categories:object-detection
Does not support them all =/