Releases: psmyth94/biosets

1.2.1

18 Nov 21:50

What's Changed

Full Changelog: 1.2.0...1.2.1

1.2.0

10 Nov 18:56

What's Changed

  • Support loading from the Hugging Face Hub by @psmyth94 in #5

Full Changelog: 1.1.0...1.2.0

1.0.1

20 Oct 13:14
4a1aa3e

Metadata Processing Changes

  • Introduced the _load_metadata method to handle loading of sample metadata files.
  • Updated _generate_tables method to handle feature and sample metadata more robustly, including setting column names based on feature metadata when appropriate.
  • Added the ability to load sample metadata in chunks. The sample metadata must be split into the same number of shards as its corresponding dataset.
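
The shard-count requirement above can be illustrated with a minimal sketch. This is not the biosets implementation: the pair_shards helper and the file names are hypothetical, and the sketch only assumes that each metadata shard is matched to the data shard at the same position.

```python
from typing import List, Tuple


def pair_shards(data_shards: List[str], metadata_shards: List[str]) -> List[Tuple[str, str]]:
    """Pair each data shard with the metadata shard at the same position.

    Hypothetical helper: it only illustrates why the two shard counts must match.
    """
    if len(data_shards) != len(metadata_shards):
        raise ValueError(
            f"expected {len(data_shards)} metadata shards, got {len(metadata_shards)}"
        )
    return list(zip(data_shards, metadata_shards))


# Hypothetical file names, for illustration only.
pairs = pair_shards(
    ["data-00.csv", "data-01.csv"],
    ["samples-00.csv", "samples-01.csv"],
)
```

With mismatched counts, a positional pairing like this has no well-defined answer for the leftover shards, so the sketch raises an error rather than silently dropping them.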

Configuration Changes

  • Added new optional configuration fields data_columns and columns to the BioDataConfig class.

What's Changed

Full Changelog: v1.0.0...v1.0.1

1.0.0

20 Oct 13:01

Release Date: 2024-10-15

Overview

This is the initial release of the biosets package, which extends the Hugging Face datasets library to handle bioinformatics datasets.

Features

  • Added the biosets class, designed to extend the datasets library with features for bioinformatics, including support for samples, batches, features, and metadata integration.
  • Added support for custom data handling classes: ValueWithMetadata, Sample, Batch, Metadata, RegressionTarget, and BinClassLabel.
  • Implemented automatic inference of column names, such as sample IDs, batch IDs, labels, and feature IDs.
  • Enabled integration of sample metadata and feature metadata from CSV, TSV, Parquet, Arrow, and JSON files.
  • Supported multiple file formats, including CSV, TSV, Parquet, Arrow, and sparse data saved as NPZ.
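
As a rough illustration of what automatic column-name inference can look like, the sketch below matches column names against a few regular expressions. The patterns, the role names, and the infer_columns function are all assumptions made for this example; the release notes do not say which rules biosets actually applies.

```python
import re
from typing import Dict, List, Optional

# Hypothetical name patterns; the rules biosets actually uses are not given in these notes.
ROLE_PATTERNS: Dict[str, str] = {
    "sample": r"^sample[_ ]?(id|name)?s?$",
    "batch": r"^batch[_ ]?(id|name)?s?$",
    "label": r"^(label|target|class)s?$",
}


def infer_columns(columns: List[str]) -> Dict[str, Optional[str]]:
    """Return the first column matching each role, or None if nothing matches."""
    inferred: Dict[str, Optional[str]] = {}
    for role, pattern in ROLE_PATTERNS.items():
        regex = re.compile(pattern, re.IGNORECASE)
        inferred[role] = next((c for c in columns if regex.match(c)), None)
    return inferred


roles = infer_columns(["Sample_ID", "Batch", "gene_a", "gene_b", "label"])
```

In this sketch, columns that match no role (here gene_a and gene_b) are simply left unassigned, which is one plausible way to treat them as feature columns.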