The SYCL open standard rewards such porting efforts with highly scalable results.
SYCL unlocks the capabilities of GPGPUs, accelerators and multicore or vector CPUs, as well as advanced compiler features and technologies (LLVM, JIT), while offering intuitive C++ APIs for work-sharing and scheduling, and for directly mapping simulation domains into execution space.
The latter is especially convenient in numerical General Relativity (GR), a highly compute- and memory-intensive field where the properties of space and time are strictly coupled with the equations of motion.

DPEcho, a SYCL+MPI port of the General-Relativity-Magneto-Hydrodynamic (GR-MHD) OpenMP+MPI code Echo, is used to model instabilities, turbulence, propagation of waves, stellar winds and magnetospheres, and astrophysical processes around Black Holes.
It supports classic and relativistic MHD, either in the Minkowski metric or in any coded GR metric.
DPEcho uses exclusively SYCL structures for memory and data management, and its flow control revolves entirely around critical device-code blocks, for which the key physics kernels were re-designed: most data reside almost permanently on the device, maximizing the time spent in computation.
As a result, on the core physics elements ported so far, the measured performance gain is above 4x on HPC CPU hardware, and of order 7x on commercial GPUs.
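
As an illustration of this device-resident pattern, here is a minimal, self-contained SYCL 2020 sketch (not code from DPEcho itself; names and sizes are invented for the example):

``` cpp
#include <sycl/sycl.hpp>

int main() {
  sycl::queue q;  // let the SYCL runtime pick a device

  // Allocate field data directly on the device, where it can stay
  // for the whole run instead of being copied back and forth.
  const size_t n = 1 << 20;
  double *rho = sycl::malloc_device<double>(n, q);

  // A device-code block standing in for a physics kernel.
  q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
    rho[i] = 1.0;  // placeholder for the real update
  }).wait();

  sycl::free(rho, q);
  return 0;
}
```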
## Prerequisites

It is possible to compile DPEcho with SYCL 2020-compatible compilers.
Some compilers may require minor tweaks to the CMake file. Among the most popular, we name:

* **[Intel oneAPI toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/overview.html)** (version >= 2024.0) targeting Intel CPUs and Intel GPUs. Using the **[Codeplay Plugins](https://codeplay.com/solutions/oneapi/)**, NVIDIA and AMD GPUs can also be targeted.
* **[Intel LLVM compiler](https://github.com/intel/llvm)**, an open-source project targeting Intel CPUs as well as NVIDIA and AMD GPUs.
* **[AdaptiveCpp](https://github.com/AdaptiveCpp/AdaptiveCpp)** - *a SYCL implementation for CPUs and GPUs (as soon as SYCL 2020 reduction kernels become available)*

Depending on the chosen compiler, DPEcho is capable of running on a wide variety of CPUs and GPUs.
It is possible to use compute devices that only support single-precision floating-point numbers, but for sufficient accuracy in more complex scenarios, double-precision support is likely necessary.

Further requirements:
* CMake (>= 3.22)
* VisIt for visualization of the output.
* Boost for the energy meter (see below).

## Building

Make sure that your SYCL compiler and CMake are available in your environment.

Using CMake >= 3.22, e.g.:
``` bash
mkdir -p build && cd build
CXX=<chosenCompilerName> cmake .. && make
```
Or ccmake:
``` bash
mkdir -p build && cd build
CXX=<chosenCompilerName> ccmake ..
[...]
make
```
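
For instance, `<chosenCompilerName>` could be `icpx` (Intel oneAPI) or `acpp` (AdaptiveCpp); both driver names assume the respective compiler is on your `PATH`:

``` bash
# Intel oneAPI DPC++/C++ compiler
CXX=icpx cmake .. && make
# AdaptiveCpp
CXX=acpp cmake .. && make
```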
Simulation parameters such as the order of derivation, the type of simulation (MHD or GR-MHD), or the type of execution device may be edited in the ccmake command-line UI. Other parameters may be set at runtime (check your runtime documentation).
An example parameter file is provided at [examples/alfven.par](examples/alfven.par).
By default, DPEcho expects a parameter file called **dpecho.par** in its working directory.
The path to an alternative file may also be passed as a command-line argument.
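
For example, assuming the binary is named `dpecho` (name and rank count here are purely illustrative):

``` bash
mpirun -np 4 ./dpecho                      # reads ./dpecho.par
mpirun -np 4 ./dpecho examples/alfven.par  # explicit parameter file
```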

## Device selection
- At compile time, the user selects the preferred device selector (CPU, GPU, ...).
- By default, DPEcho leaves the exact device selection to the SYCL runtime.
- This behaviour can be overridden from `dpecho.par` through the following parameters, as sketched below:
  - `deviceSelection` (set to anything other than the default)
  - `deviceOffset` (the first device to use; check the logs after a dummy run for the exact order)
  - `deviceCount` (how many devices DPEcho will distribute among its MPI ranks; a dummy parameter for non-MPI binaries)
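
A hypothetical `dpecho.par` fragment using these keys (the key-value layout and comment syntax are assumed for illustration, not taken from the actual file; see [examples/alfven.par](examples/alfven.par) for the real format):

```
deviceSelection manual   # anything other than the default enables manual selection
deviceOffset    0        # first device to use, in the order printed in the logs
deviceCount     2        # devices to distribute among the MPI ranks
```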
## Energy Meter
DPEcho comes with an experimental energy meter, in `tb-timer.hpp`, which uses an extra process through the Boost library.
In order to activate it:
- Compile DPEcho with `ENERGY_METER` enabled.
- Copy the script `tools/deltaEnergy.sh` into the run folder.
- Edit the script to use your own power meter (see the sketch below). Provided examples include nvidia-smi, rocm-smi and xpu-smi for GPUs, and likwid, perf or EAR for CPUs.
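
As an illustration only (the actual `deltaEnergy.sh` interface may differ), an NVIDIA GPU reading could be wired in with a line such as:

``` bash
# Hypothetical probe: print the instantaneous board power draw in watts.
# Swap in rocm-smi, xpu-smi, likwid, perf or EAR for other hardware.
nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits
```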

## Known Issues

* Some of the provided MPI communication methods may misbehave on some GPUs: `MPI_SR_REPLACE` is the most reliable, `MPI_SENDRECV` the most performant.

## References

* [Intel Parallel Universe Magazine](https://www.intel.com/content/www/us/en/developer/articles/technical/dpecho-general-relativity-sycl-for-2020-beyond.html#gs.pqrf25), Salvatore Cielo, Alexander Pöppl, Luca Del Zanna, Matteo Bugli - *DPEcho: General Relativity with SYCL for the 2020s and beyond*

* [SYCLcon2023 Proceedings](https://dl.acm.org/doi/proceedings/10.1145/3585341.3585382), Salvatore Cielo, Alexander Pöppl, Margarita Egelhofer, *Portability and Scaling of the DPEcho GR-MHD SYCL code: What’s new for numerical Astrophysics in SYCL2020*
## Authors
(in alphabetical order)
* Fabio Baruffa (former)
