The SYCL open standard rewards such porting efforts with highly scalable results.
SYCL unlocks the capabilities of GPGPUs, accelerators and multicore or vector CPUs, as well as advanced compiler features and technologies (LLVM, JIT), while offering intuitive C++ APIs for work-sharing and scheduling, and for directly mapping simulation domains into execution space.
The latter is especially convenient in numerical General Relativity (GR), a highly compute- and memory-intensive field where the properties of space and time are strictly coupled with the equations of motion.

DPEcho, a SYCL+MPI port of the General-Relativity-Magneto-Hydrodynamic (GR-MHD) OpenMP+MPI code Echo, is used to model instabilities, turbulence, propagation of waves, stellar winds and magnetospheres, and astrophysical processes around Black Holes.
It supports classic and relativistic MHD, either in the Minkowski metric or in any coded GR metric.
DPEcho uses exclusively SYCL structures for memory and data management, and its flow control revolves entirely around critical device-code blocks, for which the key physics kernels were re-designed: most data reside almost permanently on the device, maximizing the time spent in computation.
As a result, on the core physics elements ported so far, the measured performance gain is above 4x on HPC CPU hardware, and of order 7x on commercial GPUs.
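
As an illustration of this device-resident pattern, here is a minimal, self-contained SYCL 2020 sketch (not code from DPEcho itself; names and sizes are invented for the example):

``` cpp
#include <sycl/sycl.hpp>

int main() {
  sycl::queue q;  // let the SYCL runtime pick a device

  // Allocate field data directly on the device, where it can stay
  // for the whole run instead of being copied back and forth.
  const size_t n = 1 << 20;
  double *rho = sycl::malloc_device<double>(n, q);

  // A device-code block standing in for a physics kernel.
  q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
    rho[i] = 1.0;  // placeholder for the real update
  }).wait();

  sycl::free(rho, q);
  return 0;
}
```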
## Prerequisites

It is possible to compile DPEcho with SYCL 2020-compatible compilers.
Some compilers may require minor tweaks to the CMake file. Among the most popular, we name:

* **[Intel oneAPI toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/overview.html)** (version >= 2024.0) targeting Intel CPUs and Intel GPUs. Using the **[Codeplay Plugins](https://codeplay.com/solutions/oneapi/)**, NVIDIA and AMD GPUs can also be targeted.
* **[Intel LLVM compiler](https://github.com/intel/llvm)**, an open-source project targeting Intel CPUs as well as NVIDIA and AMD GPUs.
* **[AdaptiveCpp](https://github.com/AdaptiveCpp/AdaptiveCpp)** - *a SYCL implementation for CPUs and GPUs (as soon as SYCL 2020 reduction kernels become available)*

Depending on the chosen compiler, DPEcho is capable of running on a wide variety of CPUs and GPUs.
It is possible to use compute devices that only support single-precision floating-point numbers, but for sufficient accuracy in more complex scenarios, double-precision support is likely necessary.

Further requirements:
* CMake (>= 3.22)
* VisIt for visualization of the output.
* Boost for the energy meter (see below).

## Building

Make sure that your SYCL compiler and CMake are available in your environment.

Using CMake >= 3.22, e.g.:
``` bash
mkdir -p build && cd build
CXX=<chosenCompilerName> cmake .. && make
```
Or ccmake:
``` bash
mkdir -p build && cd build
CXX=<chosenCompilerName> ccmake ..
[...]
make
```
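
For instance, `<chosenCompilerName>` could be `icpx` (Intel oneAPI) or `acpp` (AdaptiveCpp); both driver names assume the respective compiler is on your `PATH`:

``` bash
# Intel oneAPI DPC++/C++ compiler
CXX=icpx cmake .. && make
# AdaptiveCpp
CXX=acpp cmake .. && make
```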
Simulation parameters such as the order of derivation, the type of simulation (MHD or GR-MHD), or the type of execution device may be edited in the ccmake command-line UI. Other parameters may be set at runtime (check your runtime documentation).
An example parameter file is provided at [examples/alfven.par](examples/alfven.par).
By default, DPEcho expects a parameter file called **dpecho.par** in its working directory.
The path to an alternative file may also be passed as a command-line argument.
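
For example, assuming the binary is named `dpecho` (name and rank count here are purely illustrative):

``` bash
mpirun -np 4 ./dpecho                      # reads ./dpecho.par
mpirun -np 4 ./dpecho examples/alfven.par  # explicit parameter file
```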

## Device selection
- At compile time, the user selects the preferred device selector (CPU, GPU, ...).
- By default, DPEcho leaves the exact device selection to the SYCL runtime.
- This behaviour can be overridden from `dpecho.par` through the following parameters, as sketched below:
  - `deviceSelection` (set to anything other than the default)
  - `deviceOffset` (the first device to use; check the logs after a dummy run for the exact order)
  - `deviceCount` (how many devices DPEcho will distribute among its MPI ranks; a dummy parameter for non-MPI binaries)
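
A hypothetical `dpecho.par` fragment using these keys (the key-value layout and comment syntax are assumed for illustration, not taken from the actual file; see [examples/alfven.par](examples/alfven.par) for the real format):

```
deviceSelection manual   # anything other than the default enables manual selection
deviceOffset    0        # first device to use, in the order printed in the logs
deviceCount     2        # devices to distribute among the MPI ranks
```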
## Energy Meter
DPEcho comes with an experimental energy meter, in `tb-timer.hpp`, which uses an extra process through the Boost library.
In order to activate it:
- Compile DPEcho with `ENERGY_METER` enabled.
- Copy the script `tools/deltaEnergy.sh` into the run folder.
- Edit the script to use your own power meter (see the sketch below). Provided examples include nvidia-smi, rocm-smi and xpu-smi for GPUs, and likwid, perf or EAR for CPUs.
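
As an illustration only (the actual `deltaEnergy.sh` interface may differ), an NVIDIA GPU reading could be wired in with a line such as:

``` bash
# Hypothetical probe: print the instantaneous board power draw in watts.
# Swap in rocm-smi, xpu-smi, likwid, perf or EAR for other hardware.
nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits
```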

## Known Issues

* Some of the provided MPI communication methods may misbehave on some GPUs: `MPI_SR_REPLACE` is the most reliable, `MPI_SENDRECV` the most performant.

## References

* [Intel Parallel Universe Magazine](https://www.intel.com/content/www/us/en/developer/articles/technical/dpecho-general-relativity-sycl-for-2020-beyond.html#gs.pqrf25), Salvatore Cielo, Alexander Pöppl, Luca Del Zanna, Matteo Bugli - *DPEcho: General Relativity with SYCL for the 2020s and beyond*

* [SYCLcon2023 Proceedings](https://dl.acm.org/doi/proceedings/10.1145/3585341.3585382), Salvatore Cielo, Alexander Pöppl, Margarita Egelhofer, *Portability and Scaling of the DPEcho GR-MHD SYCL code: What’s new for numerical Astrophysics in SYCL2020*
## Authors
(in alphabetical order)
* Fabio Baruffa (former)
