
xe: jit: reorder: enable fp4 up-convert #2915

Open

atkassen wants to merge 10 commits into main from akassen/ir-4bit
Conversation

atkassen
Contributor

@atkassen atkassen commented Mar 18, 2025

Partially addresses MFDNN-12529.

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

@atkassen atkassen self-assigned this Mar 18, 2025
@github-actions github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Mar 18, 2025
@atkassen atkassen force-pushed the akassen/ir-4bit branch 2 times, most recently from 344a7af to 0db6c76 on March 18, 2025 21:17
@atkassen atkassen marked this pull request as ready for review March 18, 2025 21:31
@atkassen atkassen requested a review from a team as a code owner March 18, 2025 21:31
@atkassen atkassen changed the title [WIP] xe: jit: reorder: enable fp4 up-convert xe: jit: reorder: enable fp4 up-convert Mar 18, 2025
@atkassen
Contributor Author

make test
disable test_device_cpu
disable benchdnn_all
enable benchdnn_reorder

@atkassen atkassen marked this pull request as draft March 19, 2025 15:31
@atkassen atkassen changed the title xe: jit: reorder: enable fp4 up-convert [WIP] xe: jit: reorder: enable fp4 up-convert Mar 19, 2025
@atkassen atkassen force-pushed the akassen/ir-4bit branch 3 times, most recently from fbd7d39 to d660868 on March 20, 2025 00:10
@atkassen atkassen marked this pull request as ready for review March 20, 2025 00:10
@atkassen atkassen changed the title [WIP] xe: jit: reorder: enable fp4 up-convert xe: jit: reorder: enable fp4 up-convert Mar 20, 2025
@atkassen atkassen force-pushed the akassen/ir-4bit branch 6 times, most recently from 9ae9dd0 to 5c4ab46 on March 24, 2025 15:43
@atkassen
Contributor Author

make test
disable test_device_cpu
disable benchdnn_all
enable benchdnn_reorder
enable benchdnn_concat
enable benchdnn_conv
enable benchdnn_deconv
enable benchdnn_pool
enable benchdnn_binary
enable benchdnn_sum
enable benchdnn_shuffle

}

      ngen::InstructionModifier mod = type.elems();
      if (!mask_op.is_invalid()) mod |= mask_op.flag_register_mod();
-     auto dst_rbd = buf_op.reg_buf_data().format(
-             off, to_ngen(type.scalar()), type.elems(), stride);
+     auto dst_rbd = buf_op.reg_buf_data().format(off / scalar_type.size(),
Contributor

Is size() rounding to the nearest byte? It might make sense to have a separate padded_size to make this more apparent.

Contributor Author

For load/store, types are always at least a byte wide.

@atkassen atkassen force-pushed the akassen/ir-4bit branch 2 times, most recently from ba35ebc to 6d1c466 Compare March 24, 2025 22:23
@@ -1297,7 +1300,9 @@ class view_info_t {
      // GRF layout will be strided in the middle and may trigger unsupported
      // reorders. Once reorder is robust enough, this check is to be removed
      const int type_size = send_params.mem_type.size();
-     if (type_size < slot_size && slot_size < 4) slot_size = type_size;
+     const int type_packing = send_params.mem_type.packing();
+     if (type_size < slot_size * type_packing && slot_size < 4)
Contributor

Should we be multiplying slot_size by type_packing here? It seems that both type_size and slot_size are in units of bytes, while type_packing would normally be used to translate from bytes to element count in the case of a packed 4-bit type.

I actually ran into some issues with conv related to this condition: since we can't do per-value offsets with 4-bit types, demoting u16 loads to u8 loads (u8d32) caused incorrect results.

Contributor

I actually modified this condition to skip the demotion in all cases where type_packing > 1 and see no issues in reorder or conv fp4 functional validation on PVC.
