New Features:
- Parquet Format Support:
- Added support for converting ARFF files to Parquet, an efficient columnar storage format widely used in big data processing and analytics.
- To convert an ARFF file to Parquet, run:
arff-format-converter -f data.arff -o output -fmt parquet
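For converting several files at once, the command above can be wrapped in a small script. The following is a minimal sketch, not part of the tool itself: the helper names `build_command` and `convert_all` are hypothetical, and it assumes `arff-format-converter` is on your PATH and accepts exactly the flags shown above.

```python
import subprocess
from pathlib import Path


def build_command(arff_path, output_dir, fmt="parquet"):
    """Return the argument list for one conversion (hypothetical helper).

    Only the flags shown in these release notes are used: -f, -o, -fmt.
    """
    return ["arff-format-converter", "-f", str(arff_path), "-o", str(output_dir), "-fmt", fmt]


def convert_all(input_dir, output_dir, fmt="parquet"):
    """Convert every .arff file in input_dir and return the paths processed."""
    converted = []
    for arff_path in sorted(Path(input_dir).glob("*.arff")):
        # check=True raises CalledProcessError if the converter exits with a non-zero status
        subprocess.run(build_command(arff_path, output_dir, fmt), check=True)
        converted.append(arff_path)
    return converted
```

Calling `convert_all("datasets", "output")` would run one conversion per ARFF file found in a `datasets` directory (an illustrative name) and stop at the first failure.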
- Fast Mode:
- Introduced a --fast mode (-f flag) that skips validation checks during conversion. This mode is useful when you are confident that the input and output paths are correct and you need a faster conversion.
- To enable fast mode, use:
arff-format-converter -f data.arff -o output -fmt json --fast
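To see whether fast mode makes a difference on your own data, you could time both variants. This is a rough sketch under stated assumptions: `conversion_command` and `time_command` are hypothetical helpers, the tool is assumed to be on your PATH, and a single run is only a coarse indication rather than a benchmark.

```python
import subprocess
import time


def conversion_command(arff_path, output_dir, fmt, fast=False):
    """Build the argument list, appending --fast only when requested (hypothetical helper)."""
    cmd = ["arff-format-converter", "-f", arff_path, "-o", output_dir, "-fmt", fmt]
    if fast:
        cmd.append("--fast")
    return cmd


def time_command(cmd):
    """Run cmd once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits with a non-zero status
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start
```

Passing `conversion_command("data.arff", "output", "json", fast=True)` and its `fast=False` counterpart to `time_command` gives two timings to compare.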
Improvements:
- Performance Optimization:
- The codebase has been optimized to speed up conversion, especially for large datasets. The Parquet and ORC format conversions benefit from these enhancements.
- Error Handling:
- Improved error messages and handling for file reading, writing, and format conversion issues, for a smoother user experience.
Fixed:
- Bug Fixes:
- Fixed issues with certain edge cases during conversion between ARFF and CSV formats, ensuring compatibility with various ARFF files.
Documentation Updates:
- Updated the README file with examples for the new Parquet format and fast mode feature.
- Improved CLI usage instructions for easier understanding.
Installation:
To upgrade to the latest version, run:
pip install --upgrade arff-format-converter
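After upgrading, you may want to confirm which version is actually installed in the active environment. A minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="arff-format-converter"):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

Running `print(installed_version())` in the same environment where you ran pip shows the version string, or `None` if the package is not installed there.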