From 5f39d0357234317a3a337c99c3edc67b51abffda Mon Sep 17 00:00:00 2001
From: Ruyman Reyes Castro <ruyman@codeplay.com>
Date: Tue, 24 Oct 2023 20:01:39 +0100
Subject: [PATCH] Samsung SAIT presentation about SYCL PIM language extensions

---
 language/README.rst | 108 ++++++++++++++++++++++++++++++--------------
 1 file changed, 75 insertions(+), 33 deletions(-)

diff --git a/language/README.rst b/language/README.rst
index ccefa5c..b9aa0c4 100644
--- a/language/README.rst
+++ b/language/README.rst
@@ -71,61 +71,103 @@ Hyesun Hong,
 
 * PIM/PNM technology enables computation directly on memory
 * Prevents data movement improving performance and reducing consumption
-* PIM operates directly on memory banks by reading and storing on rows and columns
+* Operates directly on memory banks by reading from and writing to rows and columns
 * Aquabolt-XL is the first demonstrator
 * Can be drop in on any memory controller
 * CXL-PNM is the CXL variant for PNM, can work with multiple PIM
 
 SYCL Extension for PIM/PNM
-  * Goals
-    * Seamlessly integrate PIM/PNM operation into SYCL
-    * Allow combination of xGPU and PIM/PNM in one device kernel
-    * Not specific to one hardware
-  * Design
-    * Vector operation seem like natural fit, but no convergence guarantee and vector size explicit
-  * Model as special function unit
-    * Aligns with trends to model special functional units inside accelerators
-      * Compiler automatic mapping often not possible
-      * joint_matrix
-  * Group functions
-    * Easy to use
-    * Can easily be combined with device code
-    * Give necessary convergence guarantees
-  * Recap of SYCL work-item, work-group and group functions
-    * Group functions must be encountered in converged control flow
+* Work done in collaboration with the Codeplay Software team
+* Goals
+
+  * Seamlessly integrate PIM/PNM operation into SYCL
+  * Allow combination of xGPU and PIM/PNM in one device kernel
+  * Not specific to a single hardware platform
+
+* Design
+
+  * Vector operations seem like a natural fit
+  * But they give no convergence guarantee and make the vector size explicit
+
+* Model as special function unit
+
+  * Aligns with the trend of modelling special function units inside accelerators
+  * Automatic mapping by the compiler is often not possible
+  * Interface similar to joint_matrix
+
+
+* Group functions
+
+  * Easy to use
+  * Can easily be combined with device code
+  * Provide the necessary convergence guarantees
+
+
+* Recap of SYCL work-item, work-group and group functions
+
+  * Group functions must be encountered in converged control flow
+
 * Extension
-    * Extended group functions with additional overload of joint_reduce and new joint_transform and joint_inner_product
-    * Block size as template parameter, number of blocks as runtime parameter -> allows calculation of number of elements to process
+
+  * Extended group functions with an additional overload of joint_reduce
+    and new joint_transform and joint_inner_product functions
+  * Block size as template parameter, number of blocks as runtime parameter;
+    together they allow calculating the number of elements to process
+
 * Extension for PNM
-    * Added new overloads of joint_exclusive_scan, joint_inclusive_scan, reduce_over_group
-* PNM standalone has less opportunity for parallelism, also limited by memory controller
-    * -> Combine PNM and PIM, PNM generates commands for PIM blocks
+
+  * Added new overloads of joint_exclusive_scan,
+    joint_inclusive_scan and reduce_over_group
+
+* Standalone PNM has less opportunity for parallelism
+
+  * It is also limited by the memory controller
+  * Solution: combine PNM and PIM, with PNM generating commands for the PIM blocks
+
 * Two modes
+
   * PIM mode: PIM blocks can operate independently, can choose number of blocks
   * PNM mode: Synchronized execution on multiple PIM blocks
+
 * Mapping
+
   * Every PIM block is one work-item
   * PNM with attached PIM blocks forms one work-group
+
 * Execution
- * Work-item operations map to PIM operation
- * Group functions map to PNM operation
+
+  * Work-item operations map to PIM operations
+  * Group functions map to PNM operations
+
 * Example
+
   * work-item execution maps to PIM
   * group function maps to PNM
+
 * Conclusion
+
   * Integrate support for PIM/PNM into SYCL
 
 Q&A
-* Are the proposed functions specific to PIM or could also be used with other HW?
-    * Can also be used with other hardware. Semantics not PIM-specific, but translation of C++ to SYCL
-    * Can also map nicely to other types of hardware, for example vector processor
+* Are the proposed functions specific to PIM, or could they also be used with other HW?
+
+  * They can also be used with other hardware
+  * The semantics are not PIM-specific, but a translation of C++ to SYCL
+  * They can also map nicely to other types of hardware, e.g. vector processors
+
 * Why have the user explicitly specify a block-size?
-    * Not a hardware detail
-    * Rather a promise by the user that data-blocks will always be at least that big
-    * Promise allows device compiler to perform optimizations, efficient looping inside PIM unit
-* Could num_blocks runtime parameter be replaced by iterator, requiring to be divisable by block-size
-    * Yes, that is possible, mainly a design question
-    * Current version might have additional implications regarding alignment
+
+  * It is not a hardware detail
+  * Rather, it is a promise by the user that data blocks
+    will always be at least that big
+  * The promise allows the device compiler to perform optimizations,
+    such as efficient looping inside the PIM unit
+
+* Could the num_blocks runtime parameter be replaced by an iterator,
+  requiring the range length to be divisible by the block size?
+
+  * Yes, that is possible; it is mainly a design question
+  * The current version might have additional implications regarding alignment
 
 
 2023-06-05