Releases: NannyML/nannyml
v0.13.0
Fixed
- Fixed incorrect default thresholds in the docstrings for the univariate drift calculator. Thanks for the eagle-eyed reading @josecaloca! (#425)
Changed
- Thorough revamp of our dependency version specifications. Dependencies are now less strict, making it easier to use NannyML as a dependency. Big credits to @davisthomas-aily and @canoadri for their contributions, thoughts and patience on this one. Much appreciated! (#433)
- Added support for Python 3.12
- Dropped support for Python 3.8
v0.12.1
v0.12.0
Fixed
- Fixed broken links in usage logging docs. Cheers once more to @NeoKish! (#417)
- Fixed issues with runner type validation due to changes in Pydantic 2 behavior. (#421)
- Fixed a typo in one of the plotting blueprint modules. Eagle eyes @nikml! (#418)
Added
- Added multiclass support for estimated and realized performance metrics `average_precision` and `business_value`. (#409)
- Added threshold value limits for multiclass metrics. (#411)
Changed
- Made the dependencies required for database access optional. Big thanks to @Duncan-Hunter
- Improved denominator checks in CBPE base estimation functions. (#416)
- Relaxed constraints for the `rich` dependency. (#422)
Removed
- Dropped support for Python 3.7 as it was causing major issues with dependencies. (#410)
v0.11.0
Changed
- Updated `Pydantic` to `^2.7.4`, `SQLModel` to `^0.0.19`. (#401)
- Removed the `drop_duplicates` step from the `DomainClassifier` for a further speedup. (#402)
- Reverted to previous working dependency configuration for `matplotlib` as the current one causes issues in `conda`. (#403)
Fixed
- Added `DomainClassifier` method for drift detection to be run in the CLI.
- Fixed `NaN` handling for multiclass confusion matrix estimation in CBPE. (#400)
- Fixed incorrect handling of columns marked as categorical in Wasserstein and Hellinger drift detection methods. The `treat_as_categorical` value was ignored. We've also added a `treat_as_continuous` column to explicitly mark columns as continuous. (#404)
- Fixed an issue with multiclass `AUROC` calculation and estimation when not all classes are available in a reference chunk during fitting. (#405)
Added
- Added a new data quality calculator to check if continuous values in analysis data are within the ranges encountered in the reference data. Big thanks to @jnesfield! Still needs some documentation... (#408)
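The idea behind the range check can be sketched in a few lines of plain Python. This is an illustration of the concept only, not the calculator's implementation or API: the helper name `count_out_of_range` and the sample values are made up for the example.

```python
def count_out_of_range(reference, analysis):
    """Count analysis values that fall outside the [min, max] range of reference."""
    lower, upper = min(reference), max(reference)
    # Values equal to a boundary are treated as in range (an assumption of this sketch).
    return sum(1 for value in analysis if value < lower or value > upper)


# Made-up sample data: the reference range here is [23, 62].
reference_ages = [23, 35, 41, 58, 62]
analysis_ages = [19, 30, 45, 62, 70]

out_of_range = count_out_of_range(reference_ages, analysis_ages)
print(out_of_range)  # 19 and 70 fall outside the reference range
```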
v0.10.7
Changed
- Optimized summary stats and overall performance by avoiding unnecessary copy operations and index resets during chunking. (#390)
- Optimized performance of `nannyml.base.PerMetricPerColumnResult` filter operations by adding a short-circuit path when only filtering on period. (#391)
- Optimized performance of all data quality calculators by avoiding unnecessary evaluations and avoiding copy and index reset operations. (#392)
Fixed
v0.10.6
Changed
- Make predictions optional for performance calculation. When not provided, only AUROC and average precision will be calculated. (#380)
- Small DLE docs updates
- Combed through and optimized the reconstruction error calculation with PCA resulting in a nice speedup. Cheers @nikml! (#385)
- Updated summary stats value limits to be in line with the rest of the library. Changed from `np.nan` to `None`. (#387)
Fixed
- Fixed a breaking issue in the sampling error calculation for the median summary statistic when there is only a single value for a column. (#377)
- Drop `identifier` column from the documentation example for reconstruction error calculation with PCA. (#382)
- Fix an issue where default threshold configurations would get changed upon setting custom thresholds, bad mutables! (#386)
v0.10.5
v0.10.4
Changed
- We've changed the defaults for the `incomplete` parameter in the `SizeBasedChunker` and `CountBasedChunker` to `keep` from the previous `append`. This means that from now on, by default, you might have an additional "incomplete" final chunk. Previously these records would have been appended to the last "complete" chunk. This change was required for some internal developments, and we also felt it made more sense when looking at continuous monitoring (as the incomplete chunk will be filled up later as more data is appended). (#367)
- We've renamed the Classifier for Drift Detection (CDD) to the more appropriate Domain Classifier. (#368)
- Bumped the version of the `pyarrow` dependency to `^14.0.0` if you're running on Python 3.8 or up. Congrats on your first contribution here @amrit110, much appreciated!
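To make the difference between `keep` and `append` concrete, the sketch below computes the chunk sizes each mode would yield for a size-based split. It is a standalone illustration, not the chunker implementation: `chunk_sizes` is a hypothetical helper, and keeping a lone undersized chunk when there is no complete chunk to append to is an assumption of this sketch.

```python
def chunk_sizes(n_rows, chunk_size, incomplete="keep"):
    """Return the row count of each chunk for the given `incomplete` mode."""
    if incomplete not in ("keep", "append"):
        raise ValueError(f"unsupported incomplete mode: {incomplete!r}")
    full_chunks, remainder = divmod(n_rows, chunk_size)
    sizes = [chunk_size] * full_chunks
    if remainder:
        if incomplete == "append" and sizes:
            # Previous default: leftover rows are folded into the last complete chunk.
            sizes[-1] += remainder
        else:
            # New default: leftover rows form their own final, incomplete chunk.
            sizes.append(remainder)
    return sizes


keep_sizes = chunk_sizes(25, 10, incomplete="keep")
append_sizes = chunk_sizes(25, 10, incomplete="append")
print(keep_sizes)    # [10, 10, 5]
print(append_sizes)  # [10, 15]
```

With 25 rows and a chunk size of 10, `keep` produces a third chunk of 5 rows, while `append` yields two chunks with the second holding 15 rows.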
Fixed
- Continuous distribution plots will now be scaled per chunk, as opposed to globally. (#369)
v0.10.3
Fixed
- Handle median summary stat calculation failing due to NaN values
- Fix standard deviation summary stat sampling error calculation occasionally returning infinity (#363)
- Fix plotting confidence bands when value gaps occur (#364)
Added
- New multivariate drift detection method using a classifier and density ratio estimation.
v0.10.2
Changed
- Removed p-value based thresholds for Chi2 univariate drift detection (#349)
- Change default thresholds for univariate drift methods to standard deviation based thresholds.
- Add summary stats support to the Runner and CLI (#353)
- Add unique identifier columns to included datasets for better joining (#348)
- Remove unused `confidence_deviation` properties in CBPE metrics (#357)
- Improved error handling: failing metric calculation for a single chunk will no longer stop an entire calculator.
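As a rough illustration of what a standard deviation based threshold means, the sketch below derives a lower and upper alert bound from drift values observed on reference chunks. It is not the library's implementation: the helper `std_thresholds`, the multiplier of 3, the use of the population standard deviation, and the sample values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def std_thresholds(reference_values, multiplier=3.0):
    """Return (lower, upper) bounds as mean -/+ multiplier * standard deviation."""
    center = mean(reference_values)
    spread = pstdev(reference_values)  # population std; an assumption of this sketch
    return center - multiplier * spread, center + multiplier * spread


# Made-up per-chunk drift values computed on the reference period.
reference_drift = [0.10, 0.12, 0.08, 0.10]
lower, upper = std_thresholds(reference_drift)

# A later chunk would raise an alert when its value leaves the [lower, upper] band.
new_value = 0.20
print(new_value < lower or new_value > upper)
```

Unlike a fixed p-value cutoff, bounds of this kind scale with how much the metric naturally fluctuates on the reference data.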
Added
- Add feature distribution calculators (#352)
Fixed
- Fix join column settings for CLI (#356)
- Fix crashes in `UnseenValuesCalculator`