Fix FP64 operations on conv_diff
#199
base: master
Conversation
Benchmarks do not show any speedup currently. I need to try with the original version of merged …
It seems that the loop merging actually helps a bit with performance, see below (note …
I don't know enough about GPUs to tell you what to expect. Might need to call in an expert…
I have verified with Nsight Compute that with the …
Sadly, there is no magic here.
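
As a rough CPU-side sanity check (not a substitute for the GPU benchmarks discussed above), the cost of the promotion by itself can be seen with BenchmarkTools by comparing the two forms of the flux expression on Float32 arrays. The expressions below are stand-ins, not the actual conv_diff! kernels:

```julia
using BenchmarkTools

# Stand-ins for the two flux variants on Float32 data:
# a Float64 literal promotes the result, an integer divisor does not.
a = rand(Float32, 2^20)
b = rand(Float32, 2^20)

@btime 0.5 .* ($a .+ $b);  # result is a Float64 array (output twice as large)
@btime ($a .+ $b) ./ 2;    # result stays a Float32 array
```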
In FP32 (T=Float32) GPU simulations, FP64 operations were detected in the conv_diff! routine with Nsight Compute (this should also happen on the CPU). This fix closes #197.

I have tracked this down to the flux function in WaterLily.jl/src/Flow.jl (line 3 at commit 0c05f4d), where the untyped literal 0.5 promotes the Float32 operands to FP64. The fix is to use /2 instead, which preserves the floating-point type of the operands.

I have also done some additional type cleaning and separated the @loop calls in conv_diff! into their own kernels. Benchmarks still need to be run to determine whether the fix affects performance.
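
For illustration, a minimal sketch of the promotion rule behind the fix (plain Julia, not the actual flux code in Flow.jl): multiplying a Float32 value by the untyped literal 0.5 yields a Float64, whereas dividing by the integer 2 stays in Float32.

```julia
x = 1.0f0          # Float32 operand
typeof(0.5 * x)    # Float64 -- the untyped literal 0.5 forces promotion
typeof(x / 2)      # Float32 -- an integer divisor keeps the operand's type
typeof(0.5f0 * x)  # Float32 -- a typed literal also avoids promotion
```

A typed literal such as 0.5f0 would also avoid the promotion, but /2 stays correct for any element type T.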