Returns (x + y) >> 1 or (x + y + 1) >> 1.
gentype hadd(gentype x, gentype y)
gentype rhadd(gentype x, gentype y)
hadd returns (x + y) >> 1. The intermediate sum does not modulo overflow.
rhadd returns (x + y + 1) >> 1. The intermediate sum does not modulo overflow.
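The difference between these definitions and a plain add-then-shift is easiest to see on a narrow type. The following is a minimal host-side sketch in plain C, not OpenCL C, that models the behavior for unsigned 8-bit values only by widening to 16 bits. The names hadd_u8, rhadd_u8, naive_half_u8 and check_examples are hypothetical helpers introduced for illustration and are not part of the OpenCL API.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model (assumption: unsigned 8-bit lanes only).
 * Widening to 16 bits guarantees the intermediate sum cannot wrap. */
uint8_t hadd_u8(uint8_t x, uint8_t y) {
    return (uint8_t)(((uint16_t)x + (uint16_t)y) >> 1);
}

/* Same as hadd_u8, but adds 1 before the shift so the result rounds up. */
uint8_t rhadd_u8(uint8_t x, uint8_t y) {
    return (uint8_t)(((uint16_t)x + (uint16_t)y + 1u) >> 1);
}

/* What happens when the sum is truncated to 8 bits before the shift. */
uint8_t naive_half_u8(uint8_t x, uint8_t y) {
    uint8_t sum = (uint8_t)(x + y); /* wraps modulo 256 */
    return (uint8_t)(sum >> 1);
}

/* A few hand-checked values. */
void check_examples(void) {
    assert(hadd_u8(200, 100) == 150);       /* 300 >> 1 */
    assert(rhadd_u8(200, 101) == 151);      /* 302 >> 1 */
    assert(hadd_u8(200, 101) == 150);       /* 301 >> 1, rounds down */
    assert(naive_half_u8(200, 100) == 22);  /* (300 mod 256) >> 1 = 44 >> 1 */
}
```

With x = 200 and y = 100, the truncated version yields 22 because the sum wraps to 44, while the model of hadd yields the expected 150.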
Vector operations frequently need n + 1 bits temporarily to calculate a result. The rhadd instruction gives you that extra bit without needing to upsample to a wider type and downsample again. This can be a significant performance win.
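To illustrate why no wider type is strictly required, the sketch below uses two well-known bit identities, x + y = 2(x & y) + (x ^ y) and x + y = 2(x | y) - (x ^ y), to compute both results without ever forming the full sum. It is plain C for unsigned 8-bit values and is only one possible formulation; nothing here implies that any particular OpenCL implementation works this way. The names hadd_u8_narrow, rhadd_u8_narrow and check_narrow_matches_wide are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* (x + y) >> 1 without a wider intermediate:
 * shared bits count fully, differing bits count half (rounded down). */
uint8_t hadd_u8_narrow(uint8_t x, uint8_t y) {
    return (uint8_t)((x & y) + ((x ^ y) >> 1));
}

/* (x + y + 1) >> 1 without a wider intermediate:
 * start from the union of bits and subtract half of the differing bits. */
uint8_t rhadd_u8_narrow(uint8_t x, uint8_t y) {
    return (uint8_t)((x | y) - ((x ^ y) >> 1));
}

/* Exhaustively compare against the widened definitions for all 8-bit inputs. */
void check_narrow_matches_wide(void) {
    for (unsigned x = 0; x < 256; ++x) {
        for (unsigned y = 0; y < 256; ++y) {
            uint8_t wide_h = (uint8_t)((x + y) >> 1);
            uint8_t wide_r = (uint8_t)((x + y + 1u) >> 1);
            assert(hadd_u8_narrow((uint8_t)x, (uint8_t)y) == wide_h);
            assert(rhadd_u8_narrow((uint8_t)x, (uint8_t)y) == wide_r);
        }
    }
}
```

Every intermediate value in the two narrow functions stays within 0 to 255, which is the property that lets each lane remain at its original width.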