A General-purpose Task-parallel Programming System using Modern C++
Updated Jan 12, 2025 - C++
Sample codes for my CUDA programming book
CUDA Core Compute Libraries
Thin, unified, C++-flavored wrappers for the CUDA APIs
TinyChatEngine: On-Device LLM Inference Library
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Safe Rust wrapper around the CUDA toolkit
A simple GPU hash table implemented in CUDA using lock free techniques
LLM notes, including model inference, transformer model structure, and LLM framework code analysis.
A self-learning tutorial for CUDA high-performance programming.
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
From zero to hero CUDA for accelerating maths and machine learning on GPU.
μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, and automatic CUDA graph generation and updating.
An implementation of HIP that works on CPUs, across OSes.
CUDA kernel author's tools
Accelerated General (FP32) Matrix Multiplication
Speed up image preprocessing with CUDA for image handling or TensorRT inference
Install CUDA on Windows 11 using WSL2