-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xe: jit: reorder: enable fp4 up-convert #2915
base: main
Are you sure you want to change the base?
Conversation
344a7af
to
0db6c76
Compare
0db6c76
to
2053c61
Compare
make test |
2053c61
to
0f832a3
Compare
fbd7d39
to
d660868
Compare
d660868
to
52430f7
Compare
9ae9dd0
to
5c4ab46
Compare
make test |
} | ||
|
||
ngen::InstructionModifier mod = type.elems(); | ||
if (!mask_op.is_invalid()) mod |= mask_op.flag_register_mod(); | ||
auto dst_rbd = buf_op.reg_buf_data().format( | ||
off, to_ngen(type.scalar()), type.elems(), stride); | ||
auto dst_rbd = buf_op.reg_buf_data().format(off / scalar_type.size(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is size()
rounding to nearest byte? it might make sense to have a separate padded_size
to make this more apparent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For load/store, types are always at least a byte wide.
ba35ebc
to
6d1c466
Compare
6d1c466
to
316769b
Compare
@@ -1297,7 +1300,9 @@ class view_info_t { | |||
// GRF layout will be strided in the middle and may trigger unsupported | |||
// reorders. Once reorder is robust enough, this check is to be removed | |||
const int type_size = send_params.mem_type.size(); | |||
if (type_size < slot_size && slot_size < 4) slot_size = type_size; | |||
const int type_packing = send_params.mem_type.packing(); | |||
if (type_size < slot_size * type_packing && slot_size < 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we be multiplying slot_size
by type_packing
here? It seems that both type size
and slot size
are in units of bytes while type_packing
would normally use to translate from bytes to nelems in the case of packed 4-bit type.
I actually ran into some issues with conv related to this condition, since we cant do per-value offsets with 4-bit types demoting u16 to u8 loads caused incorrect results (u8d32 loads).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually modified this condition to skip in all cases with type_packing
> 1 and see no issues in reorder or conv fp4 functional validation on pvc.
Partially addresses MFDNN-12529.
make test
andmake test_benchdnn_*
) pass locally for each commit?