
xe: jit: reorder: enable fp4 up-convert #2915

Open

atkassen wants to merge 10 commits into main from akassen/ir-4bit
Conversation

atkassen
Contributor

@atkassen atkassen commented Mar 18, 2025

Partially addresses MFDNN-12529.

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

@atkassen atkassen self-assigned this Mar 18, 2025
@github-actions github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Mar 18, 2025
@atkassen atkassen force-pushed the akassen/ir-4bit branch 2 times, most recently from 344a7af to 0db6c76 on March 18, 2025 21:17
@atkassen atkassen marked this pull request as ready for review March 18, 2025 21:31
@atkassen atkassen requested a review from a team as a code owner March 18, 2025 21:31
@atkassen atkassen changed the title [WIP] xe: jit: reorder: enable fp4 up-convert xe: jit: reorder: enable fp4 up-convert Mar 18, 2025
@atkassen
Contributor Author

make test
disable test_device_cpu
disable benchdnn_all
enable benchdnn_reorder

@atkassen atkassen marked this pull request as draft March 19, 2025 15:31
@atkassen atkassen changed the title xe: jit: reorder: enable fp4 up-convert [WIP] xe: jit: reorder: enable fp4 up-convert Mar 19, 2025
@atkassen atkassen force-pushed the akassen/ir-4bit branch 3 times, most recently from fbd7d39 to d660868 on March 20, 2025 00:10
@atkassen atkassen marked this pull request as ready for review March 20, 2025 00:10
@atkassen atkassen changed the title [WIP] xe: jit: reorder: enable fp4 up-convert xe: jit: reorder: enable fp4 up-convert Mar 20, 2025
@atkassen atkassen force-pushed the akassen/ir-4bit branch 6 times, most recently from 9ae9dd0 to 5c4ab46 on March 24, 2025 15:43
@atkassen
Contributor Author

make test
disable test_device_cpu
disable benchdnn_all
enable benchdnn_reorder
enable benchdnn_concat
enable benchdnn_conv
enable benchdnn_deconv
enable benchdnn_pool
enable benchdnn_binary
enable benchdnn_sum
enable benchdnn_shuffle

}

      ngen::InstructionModifier mod = type.elems();
      if (!mask_op.is_invalid()) mod |= mask_op.flag_register_mod();
-     auto dst_rbd = buf_op.reg_buf_data().format(
-             off, to_ngen(type.scalar()), type.elems(), stride);
+     auto dst_rbd = buf_op.reg_buf_data().format(off / scalar_type.size(),
Contributor

Is size() rounding to the nearest byte? It might make sense to have a separate padded_size to make this more apparent.

Contributor Author

For load/store, types are always at least a byte wide.

@atkassen atkassen force-pushed the akassen/ir-4bit branch 2 times, most recently from ba35ebc to 6d1c466 Compare March 24, 2025 22:23
@@ -1297,7 +1300,9 @@ class view_info_t {
      // GRF layout will be strided in the middle and may trigger unsupported
      // reorders. Once reorder is robust enough, this check is to be removed
      const int type_size = send_params.mem_type.size();
-     if (type_size < slot_size && slot_size < 4) slot_size = type_size;
+     const int type_packing = send_params.mem_type.packing();
+     if (type_size < slot_size * type_packing && slot_size < 4)
Contributor

Should we be multiplying slot_size by type_packing here? It seems that both type_size and slot_size are in units of bytes, while type_packing would normally be used to translate from bytes to element count in the case of a packed 4-bit type.

I actually ran into some issues with conv related to this condition: since we can't do per-value offsets with 4-bit types, demoting u16 loads to u8 loads (u8d32) caused incorrect results.

Contributor

I actually modified this condition to skip the demotion in all cases where type_packing > 1 and see no issues in reorder or conv fp4 functional validation on PVC.
