Releases: psmyth94/biosets
1.2.1
1.2.0
1.0.1
Metadata Processing Changes
- Introduced the `_load_metadata` method to handle loading of sample metadata files.
- Updated the `_generate_tables` method to handle feature and sample metadata more robustly, including setting column names based on feature metadata when appropriate.
- Added the ability to load sample metadata in chunks. The sample metadata must be split into the same number of shards as its corresponding dataset.
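The shard requirement means each data shard is paired with exactly one sample metadata shard. A minimal sketch of that one-to-one check, assuming both shard lists are already in matching order (for example, sorted by file name); `pair_with_metadata` is a hypothetical helper for illustration, not part of the biosets API:

```python
from typing import List, Tuple


def pair_with_metadata(
    data_shards: List[str], metadata_shards: List[str]
) -> List[Tuple[str, str]]:
    """Pair each data shard with the sample metadata shard at the same position.

    Hypothetical helper illustrating the one-to-one shard constraint;
    it is not part of the biosets API.
    """
    if len(data_shards) != len(metadata_shards):
        raise ValueError(
            f"expected {len(data_shards)} sample metadata shards to match "
            f"the data shards, got {len(metadata_shards)}"
        )
    # Positional pairing: assumes both lists are in the same order.
    return list(zip(data_shards, metadata_shards))
```

For example, `pair_with_metadata(["data-00.csv", "data-01.csv"], ["meta-00.csv", "meta-01.csv"])` pairs the files by position, while passing two data shards and one metadata shard raises a `ValueError`.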
Configuration Changes
- Added new optional configuration fields `data_columns` and `columns` to the `BioDataConfig` class.
What's Changed
Full Changelog: v1.0.0...v1.0.1
1.0.0
Release Date: 2024-10-15
Overview
This is the initial release of the `biosets` package, which extends the Hugging Face `datasets` library to handle bioinformatics datasets.
Features
- Added the `biosets` class, designed to extend the `datasets` library with features for bioinformatics, including support for samples, batches, features, and metadata integration.
- Added support for custom data handling classes: `ValueWithMetadata`, `Sample`, `Batch`, `Metadata`, `RegressionTarget`, and `BinClassLabel`.
- Implemented automatic inference of column names, such as sample IDs, batch IDs, labels, and feature IDs.
- Enabled integration of sample metadata and feature metadata from CSV, TSV, Parquet, Arrow, and JSON files.
- Supported multiple file formats, including CSV, TSV, Parquet, Arrow, and sparse data saved as NPZ.
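These notes do not spell out how column names are inferred. As a rough illustration only, the sketch below matches normalized column names against a hypothetical alias list for each role (sample ID, batch ID, label) and treats every unmatched column as a feature column. `CANDIDATES` and `infer_columns` are assumptions made for this example; biosets' actual inference rules may differ.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical aliases per role, in priority order; not taken from biosets.
CANDIDATES: Dict[str, Tuple[str, ...]] = {
    "sample_id": ("sample_id", "sample", "sampleid", "sample_name"),
    "batch_id": ("batch_id", "batch"),
    "label": ("label", "labels", "target", "class"),
}


def _normalize(name: str) -> str:
    # Case-insensitive comparison; spaces and hyphens count as underscores.
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def infer_columns(
    columns: List[str],
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Assign columns to roles by alias, and return the rest as features.

    Each role gets the column matching its highest-priority alias, or None.
    """
    normalized = {_normalize(c): c for c in columns}
    roles: Dict[str, Optional[str]] = {}
    for role, aliases in CANDIDATES.items():
        roles[role] = next((normalized[a] for a in aliases if a in normalized), None)
    # Columns not claimed by any role are assumed to hold feature values.
    assigned = {c for c in roles.values() if c is not None}
    features = [c for c in columns if c not in assigned]
    return roles, features
```

With this heuristic, a header such as `["Sample ID", "Batch", "label", "GENE1", "GENE2"]` maps the first three columns to the sample ID, batch ID, and label roles and leaves `GENE1` and `GENE2` as features.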