
C++ performance and optimization #1767

Open · wustus opened this issue Jan 14, 2025 · 0 comments

wustus commented Jan 14, 2025

I initially implemented a neural network in C++, following Nielsen's book 'Neural Networks and Deep Learning' and using Eigen as the linear algebra backend.
After reimplementing the same network for different datasets, I reached one whose input was larger than before (a 3072×1 vector) and where one epoch took ~30 seconds to train, so I wanted to look into GPU training. After some research I found MLX and swapped the Eigen matrices out for MLX arrays (sketched below).
Doing this for a smaller project (KMNIST, 28×28 grayscale images, the same format as MNIST), I get extremely slow code compared to Eigen.
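Roughly, the swap looks like this; a minimal sketch with illustrative names (`W`, `b`, `a` and the function names are not the actual code from the repo). Eigen executes each expression eagerly on the CPU, while MLX only records a lazy graph until something forces evaluation:

```cpp
#include <Eigen/Dense>
#include "mlx/mlx.h"

namespace mx = mlx::core;

// Eigen version: runs eagerly on the CPU as soon as it is called.
Eigen::VectorXf feedforward_eigen(const Eigen::MatrixXf& W,
                                  const Eigen::VectorXf& b,
                                  const Eigen::VectorXf& a) {
  Eigen::VectorXf z = W * a + b;
  // Coefficient-wise sigmoid: 1 / (1 + exp(-z)).
  return (1.0f + (-z.array()).exp()).inverse().matrix();
}

// MLX version: only builds a lazy computation graph; nothing executes
// until mx::eval() is called on the result (or something derived from it).
mx::array feedforward_mlx(const mx::array& W,
                          const mx::array& b,
                          const mx::array& a) {
  return mx::sigmoid(mx::matmul(W, a) + b);
}
```

For the KMNIST case, `a` would be a 784×1 activation vector (28×28 flattened).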

My current code can be found here: https://github.com/wustus/kuzushiji

I'm trying to chain my operations and only evaluate every few batches (first sketch below).
When I tried to refactor, for example, the sigmoid_prime method and compile it for a speedup, I got a segmentation fault (second sketch below).
I know I'm probably better off using MLX's built-in SGD optimizer and, I suppose, Module to create the actual network, but I'd like to use MLX as a plain linear algebra library to learn as much as I can about the underlying techniques.
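The chaining pattern I mean is roughly this; again a minimal sketch, where `train_epoch`, `eval_every`, and the batch container are illustrative rather than the actual code:

```cpp
#include <utility>
#include <vector>
#include "mlx/mlx.h"

namespace mx = mlx::core;

void train_epoch(std::vector<mx::array>& params,
                 const std::vector<std::pair<mx::array, mx::array>>& batches) {
  const int eval_every = 4;  // illustrative interval
  int step = 0;
  for (const auto& [x, y] : batches) {
    // ... build the forward/backward graph from (x, y) and update
    //     `params` lazily; no kernels run here ...
    if (++step % eval_every == 0) {
      mx::eval(params);  // force the accumulated graph to actually run
    }
  }
  mx::eval(params);  // flush whatever work is left at the end of the epoch
}
```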
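And the sigmoid_prime refactor that segfaults is shaped roughly like this, assuming the mx::compile overload that maps a std::vector<mx::array> to a std::vector<mx::array> (names and shapes are illustrative, not the exact code from the repo):

```cpp
#include <vector>
#include "mlx/mlx.h"

namespace mx = mlx::core;

// sigma'(z) = sigma(z) * (1 - sigma(z))
std::vector<mx::array> sigmoid_prime(const std::vector<mx::array>& inputs) {
  auto s = mx::sigmoid(inputs[0]);
  return {s * (mx::ones_like(s) - s)};
}

int main() {
  auto compiled = mx::compile(sigmoid_prime);
  auto z = mx::random::normal({30, 1});  // illustrative shape
  auto out = compiled({z})[0];
  mx::eval({out});
}
```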
