# Releases

## v0.22.0

### Highlights
- Export and import MLX functions to a file (example, bigger example)
  - Functions can be exported from Python and run in C++, and vice versa
### Core
- Add `slice` and `slice_update` variants which take arrays for the starting locations
- Add an example for using MLX in C++ with CMake
- Fused attention for generation now supports boolean masking (benchmark)
- Allow an array offset for `mx.fast.rope`
- Add `mx.finfo`
- Allow negative strides without resorting to copying for `slice` and `as_strided`
- Add `Flatten`, `Unflatten`, and `ExpandDims` primitives
- Enable the compilation of lambdas in C++
- Add many more primitives for shapeless compilation (full list)
- Fix a performance regression in `qvm`
- Introduce separate types for `Shape` and `Strides`, and switch strides from `uint64` to `int64`
- Reduce copies in the fused-attention kernel
- Recompile a function when the stream changes
- Several steps to improve the Linux / x86_64 experience (#1625, #1627, #1635)
- Several steps to improve/enable the Windows experience (#1628, #1660, #1662, #1661, #1672, #1663, #1664, ...)
- Update to a newer Metal-cpp
- Throw when exceeding the maximum possible number of buffers
- Add `mx.kron`
- `mx.distributed.send` now implements the identity function instead of returning an empty array
- Better error reporting for `mx.compile` on the CPU and for unrecoverable errors
### NN
- Add optional bias correction in Adam/AdamW
- Enable mixed quantization via `nn.quantize`
- Remove reshapes from `nn.QuantizedEmbedding`
### Bug fixes
- Fix a `qmv`/`qvm` bug for batch sizes 2-5
- Fix some leaks and races (#1629)
- Fix transformer post-norm in `mlx.nn`
- Fix some `mx.fast` fallbacks
- Fix the hashing of string constants in `compile`
- Fix small sorts in Metal
- Fix a memory leak of non-evaled arrays with siblings
- Fix the `concatenate`/`slice_update` vjp in the edge case where the inputs have different types