@zhang-hui-yulo
Contributor

@zhang-hui-yulo zhang-hui-yulo commented Dec 13, 2025

Refactor mma.cuh for RDNA and CDNA, clean up the row-major and column-major matrix handling for future development such as FA, and add a dual matrix type for RDNA3.

CDNA is untested because I don't have a CDNA GPU. @JohannesGaessler, could you run a quick test on your MI GPU? Honestly, I will probably also need your coding help to fix any CDNA bugs, since I have no hardware to test on. Thank you.

  • Align the MFMA tile in mmq.

Resolves #17856

Collaborator

@JohannesGaessler JohannesGaessler left a comment

If you don't have MFMA hardware for development I would suggest that you simply don't touch the corresponding code for now.

DATA_LAYOUT_J_MAJOR = 10, // Matrix C for CDNA and RDNA4, int and float matrix C for RDNA3.
DATA_LAYOUT_I_MAJOR_MIRRORED = 20,
DATA_LAYOUT_J_MAJOR_MIRRORED = 30,
DATA_LAYOUT_I_MAJOR_DUAL = 40, // Matrix A&B for RDNA3.
Collaborator

Is there a reason why you're not using I_MAJOR_MIRRORED?

Contributor Author

I just checked I_MAJOR_MIRRORED: it uses ne = I * J / (WARP_SIZE/4), so it is meant for the Volta 8x8 GEMM. That is why I added I_MAJOR_DUAL to handle RDNA3. I don't think mixing Volta and RDNA3 code is a good choice.
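To make the element-count argument concrete, here is a minimal sketch of the arithmetic. Only the formula ne = I * J / (WARP_SIZE/4) comes from the comment above. The even-split baseline I * J / WARP_SIZE, the value WARP_SIZE = 32, and the helper names are assumptions for illustration, not code from mma.cuh.

```cpp
#include <cassert>

constexpr int WARP_SIZE = 32; // assumption: 32-wide NVIDIA warp

// Assumed baseline: each element of an I x J tile is held by exactly one thread.
constexpr int ne_even_split(int I, int J) { return I * J / WARP_SIZE; }

// Formula quoted in the comment above for I_MAJOR_MIRRORED.
constexpr int ne_mirrored(int I, int J) { return I * J / (WARP_SIZE / 4); }

// Compile-time checks for an 8x8 tile.
static_assert(ne_even_split(8, 8) == 2, "even split: 2 elements per thread");
static_assert(ne_mirrored(8, 8)   == 8, "mirrored formula: 8 elements per thread");
```

Under these assumptions, the mirrored formula gives each thread four times as many elements as an even split would. That factor is tied to a specific warp arrangement, which is presumably why reusing it for RDNA3 is awkward.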

Collaborator

It's not about what is a good choice, it's about what is the least bad choice. For this PR it's fine to add an extra value to the enum but I will refactor this to instead use either I_MAJOR or I_MAJOR_MIRRORED at some later time.

Contributor Author

Honestly, I_MAJOR is the worst choice for RDNA3 matrices A and B. I_MAJOR is only for RDNA3 matrix C, not A and B. Otherwise you can only tell A/B apart from C by the tile shape, which is what the current code does.

It can be moved to I_MAJOR_MIRRORED if you think mixing Volta and RDNA3 is acceptable.
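The trade-off discussed here (tagging a tile with its layout versus inferring its role from its shape) can be sketched as follows. The four enum values are the ones quoted in the review. `DATA_LAYOUT_I_MAJOR = 0`, the `tile` template, the 16x16 shapes, and the helper `is_rdna3_operand` are hypothetical and only illustrate the idea; they are not the actual definitions in mma.cuh.

```cpp
#include <cassert>

enum data_layout {
    DATA_LAYOUT_I_MAJOR          =  0, // assumed value, not shown in the quoted diff
    DATA_LAYOUT_J_MAJOR          = 10,
    DATA_LAYOUT_I_MAJOR_MIRRORED = 20,
    DATA_LAYOUT_J_MAJOR_MIRRORED = 30,
    DATA_LAYOUT_I_MAJOR_DUAL     = 40,
};

// Hypothetical tile type: the layout is part of the type instead of being inferred from I and J.
template <int I_, int J_, typename T, data_layout dl_ = DATA_LAYOUT_I_MAJOR>
struct tile {
    static constexpr int         I  = I_;
    static constexpr int         J  = J_;
    static constexpr data_layout dl = dl_;
};

// Hypothetical helper: true if the tile is tagged as an RDNA3 A/B operand.
template <typename Tile>
constexpr bool is_rdna3_operand() { return Tile::dl == DATA_LAYOUT_I_MAJOR_DUAL; }

// Two tiles with identical shape (illustrative 16x16) but different roles.
using tile_AB = tile<16, 16, int, DATA_LAYOUT_I_MAJOR_DUAL>;
using tile_C  = tile<16, 16, int, DATA_LAYOUT_J_MAJOR>;

static_assert(tile_AB::I == tile_C::I && tile_AB::J == tile_C::J, "same shape");
static_assert(is_rdna3_operand<tile_AB>() && !is_rdna3_operand<tile_C>(), "roles differ by tag");
```

With an explicit tag, two tiles of the same shape remain distinguishable at compile time. The cost is one more enum value to maintain, which is the concern raised by the reviewer.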

Contributor Author

I just checked I_MAJOR_MIRRORED again: it only supports half2. Sorry, I'm really not able to cover Volta. I think the better choice is to keep I_MAJOR_DUAL for now and have you merge I_MAJOR_MIRRORED and I_MAJOR_DUAL on your side in the future.

@github-actions github-actions bot added the "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels Dec 13, 2025
@zhang-hui-yulo
Contributor Author

Honestly, the refactor changes so much code that even keeping the old MFMA path would still need a full test on CDNA, so I think it's worth trying to make the new code correct first.

@JohannesGaessler
Collaborator

If you want to get this PR merged in any reasonable time frame, you either need to fix MFMA yourself or you need to not touch it. I currently have other priorities and don't have the time to fix the MFMA part for you.

@zhang-hui-yulo
Contributor Author

> If you want to get this PR merged in any reasonable time frame, you either need to fix MFMA yourself or you need to not touch it. I currently have other priorities and don't have the time to fix the MFMA part for you.

I agree. I also don't want to touch the MFMA part: I have spent more than a month trying to get access to an MI308 without a useful response, and I'm not sure I will be able to get one.

Anyway, could you run a quick MUL_MAT test on your CDNA GPU so that I can decide how to move forward? Thank you.

However, even if I leave the MFMA path alone, the refactor still modifies the MFMA code in mmq, so I would still need your help with testing. Thank you.

@JohannesGaessler
Collaborator

test-backend-ops is failing on my MI100: log.txt

I'm willing to give you SSH access for development purposes but the machine with the MI100 would only be running during the daytime in Germany since it's in my living space and very loud.

@zhang-hui-yulo
Contributor Author

Thank you for the help. The inf values are a bad sign: they suggest the kernel is loading the wrong data. Let me revert the CDNA part first and then wait a while for AMD's response to see whether I can get access to a CDNA3 GPU.
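As an illustration of how this kind of failure can be spotted, the sketch below scans a result buffer for non-finite values. It assumes the output has already been copied to host memory as `float`. The helper name `count_non_finite` is hypothetical and is not part of test-backend-ops.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: count inf/NaN entries in a result buffer held on the host.
inline std::size_t count_non_finite(const std::vector<float> & values) {
    std::size_t n = 0;
    for (float x : values) {
        if (!std::isfinite(x)) {
            n++;
        }
    }
    return n;
}
```

A non-zero count for inputs that are themselves finite and moderately sized usually points at an indexing or layout bug rather than at rounding error.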

@zhang-hui-yulo
Contributor Author

Hello @JohannesGaessler

I have given up on getting an MI308 from my company's internal cloud. I'm now applying for an MI300 at https://www.amd.com/en/developer/resources/cloud-access/amd-developer-cloud.html, but since I'm in China, I'm not sure whether I will be able to access it.

I just fixed a potentially wrong ne in J_MAJOR for CDNA. Could you run a quick test on your MI GPU? If it still doesn't work, could you share SSH access with me if possible? The time difference between China and Germany is manageable: I would just need to work from 4PM to 9PM, and hopefully I can connect then. Thank you.

Best Regards
Hui

@JohannesGaessler
Collaborator

test-backend-ops is passing now for the MI100. Do you want me to review your PR now or are you still going to make more changes first?

In any case, if you still want hardware access for development send me an email with your public key and your desired username.

@zhang-hui-yulo
Contributor Author

zhang-hui-yulo commented Dec 16, 2025

> test-backend-ops is passing now for the MI100. Do you want me to review your PR now or are you still going to make more changes first?
>
> In any case, if you still want hardware access for development send me an email with your public key and your desired username.

Thank you for the support. I think all my changes are done, so please go ahead and review. The only remaining item is merging I_MAJOR_MIRRORED and I_MAJOR_DUAL, but I still cannot find a good way to handle Volta, since Volta only has half2 while RDNA3 has both half2 and int for mmq.

And thank you for offering access to your MI100. Since the tests pass now, I don't think I need it anymore. I would really appreciate it if AMD could give me access to a CDNA3 GPU.

@zhang-hui-yulo zhang-hui-yulo marked this pull request as ready for review December 16, 2025 12:11
@JohannesGaessler
Collaborator

Please check the editorconfig CI job for trailing whitespaces.

@zhang-hui-yulo
Contributor Author

Attached are the MUL_MAT results on RDNA4.

MUL_MAT
Backend GGML op Op parameters TFLOPS master TFLOPS refactor_mma_for_rdna Speedup
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 0.61 0.61 1.00
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.20 1.20 1.00
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.66 1.65 1.00
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.80 1.81 1.00
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.19 2.19 1.00
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 93.04 92.86 1.00
ROCm0 MUL_MAT type_a=bf16,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.21 3.22 1.00
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=128,n=1,k=16416,bs=[8,1],nr=[4,1],per=[0,1,2,3],k_v=32832,o=1 1.38 1.38 1.00
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=16416,n=1,k=128,bs=[8,1],nr=[4,1],per=[0,2,1,3],k_v=0,o=1 0.34 0.34 0.99
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 0.61 0.61 1.00
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.21 1.21 1.00
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.77 1.77 1.00
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.28 2.27 1.00
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.69 2.68 1.00
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 96.37 96.11 1.00
ROCm0 MUL_MAT type_a=f16,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.24 3.25 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 0.31 0.31 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 0.63 0.63 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 0.94 0.94 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.25 1.25 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.55 1.54 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.49 3.48 1.00
ROCm0 MUL_MAT type_a=f32,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.31 2.31 1.00
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.72 3.69 0.99
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.01 5.97 0.99
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.11 7.06 0.99
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.93 7.88 0.99
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.35 8.30 0.99
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 74.85 74.76 1.00
ROCm0 MUL_MAT type_a=iq1_m,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.41 9.39 1.00
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.20 4.17 0.99
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.03 6.95 0.99
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.49 7.48 1.00
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.32 8.28 0.99
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.00 8.97 1.00
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 67.85 70.52 1.04
ROCm0 MUL_MAT type_a=iq1_s,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.97 8.94 1.00
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.64 1.61 0.99
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.88 2.87 1.00
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.00 3.98 0.99
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.85 4.82 0.99
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.13 5.12 1.00
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 45.69 45.17 0.99
ROCm0 MUL_MAT type_a=iq2_s,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.57 6.56 1.00
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.23 2.21 0.99
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.90 3.86 0.99
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.11 5.09 1.00
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.94 5.91 0.99
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.68 6.66 1.00
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 46.94 46.39 0.99
ROCm0 MUL_MAT type_a=iq2_xs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.86 7.82 1.00
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.70 1.67 0.98
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.02 3.00 0.99
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.19 4.17 1.00
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.10 5.07 0.99
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.88 5.86 1.00
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 68.93 71.99 1.04
ROCm0 MUL_MAT type_a=iq2_xxs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.96 6.94 1.00
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.59 1.57 0.99
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.90 2.89 0.99
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.08 4.05 0.99
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.07 5.04 0.99
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.90 5.87 0.99
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 66.04 69.74 1.06
ROCm0 MUL_MAT type_a=iq3_s,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.10 7.10 1.00
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.17 2.15 0.99
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.83 3.82 1.00
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.14 5.11 0.99
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.87 5.83 0.99
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.92 6.89 1.00
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 73.90 78.32 1.06
ROCm0 MUL_MAT type_a=iq3_xxs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.76 7.73 1.00
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.84 3.81 0.99
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.47 5.47 1.00
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.30 7.31 1.00
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.74 8.74 1.00
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.55 8.57 1.00
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 80.36 85.52 1.06
ROCm0 MUL_MAT type_a=iq4_nl,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.37 9.37 1.00
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.82 3.78 0.99
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.61 6.59 1.00
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.04 9.00 1.00
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.78 9.74 1.00
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.35 10.30 0.99
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 79.29 86.55 1.09
ROCm0 MUL_MAT type_a=iq4_xs,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 10.19 10.17 1.00
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.65 3.59 0.98
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.13 5.13 1.00
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.22 7.19 1.00
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.64 8.62 1.00
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.72 8.74 1.00
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 81.35 86.24 1.06
ROCm0 MUL_MAT type_a=mxfp4,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.31 9.25 0.99
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.92 2.89 0.99
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.73 3.73 1.00
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.07 4.05 1.00
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.26 4.24 0.99
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.37 4.35 1.00
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 23.33 23.41 1.00
ROCm0 MUL_MAT type_a=q2_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.42 4.40 1.00
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.76 1.73 0.98
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.87 2.85 0.99
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.47 3.45 0.99
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.89 3.87 0.99
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.15 4.14 1.00
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 60.09 59.86 1.00
ROCm0 MUL_MAT type_a=q3_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.46 4.44 1.00
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.10 4.06 0.99
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.56 5.53 0.99
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.41 7.44 1.00
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.76 8.75 1.00
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.64 8.66 1.00
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 77.28 80.43 1.04
ROCm0 MUL_MAT type_a=q4_0,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.55 9.51 1.00
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.02 3.99 0.99
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.99 6.93 0.99
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.67 7.65 1.00
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.05 9.04 1.00
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.76 8.87 1.01
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 62.02 64.33 1.04
ROCm0 MUL_MAT type_a=q4_1,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.88 9.85 1.00
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.78 2.73 0.98
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.74 3.72 0.99
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.08 4.07 1.00
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.30 4.27 0.99
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.41 4.40 1.00
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 65.51 67.72 1.03
ROCm0 MUL_MAT type_a=q4_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.62 4.60 1.00
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.18 3.12 0.98
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.25 5.27 1.00
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.35 6.50 1.02
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.89 7.85 0.99
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.22 8.16 0.99
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 71.47 74.48 1.04
ROCm0 MUL_MAT type_a=q5_0,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.00 8.97 1.00
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.43 3.38 0.98
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.63 5.64 1.00
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.16 7.13 1.00
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.12 8.13 1.00
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.77 8.83 1.01
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 59.96 63.07 1.05
ROCm0 MUL_MAT type_a=q5_1,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 9.66 9.65 1.00
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.58 2.56 0.99
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.58 3.57 1.00
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.95 3.93 1.00
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.16 4.15 1.00
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.34 4.33 1.00
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 62.87 65.12 1.04
ROCm0 MUL_MAT type_a=q5_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.54 4.52 0.99
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 1.87 1.86 0.99
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.94 2.93 0.99
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 3.71 3.70 1.00
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.20 4.19 1.00
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.56 4.54 1.00
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 31.69 31.57 1.00
ROCm0 MUL_MAT type_a=q6_K,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 5.23 5.20 1.00
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 2.50 2.57 1.03
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 4.68 4.66 1.00
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=3,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 6.50 6.49 1.00
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=4,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.50 7.48 1.00
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=5,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 7.90 7.86 0.99
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 77.92 80.52 1.03
ROCm0 MUL_MAT type_a=q8_0,type_b=f32,m=4096,n=8,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1 8.15 8.14 1.00
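The Speedup column in the table above appears to be the ratio of the branch TFLOPS to the master TFLOPS, rounded to two decimals. The sketch below reproduces that arithmetic under that assumption. The function names are hypothetical, and because the TFLOPS values shown are already rounded, rows with small values may differ from the table in the last digit.

```cpp
#include <cassert>
#include <cmath>

// Ratio of branch throughput to master throughput; a value above 1 means the branch is faster.
inline double speedup(double tflops_master, double tflops_branch) {
    return tflops_branch / tflops_master;
}

// Same ratio rounded to two decimals, returned in hundredths to avoid comparing doubles.
inline long speedup_hundredths(double tflops_master, double tflops_branch) {
    return std::lround(speedup(tflops_master, tflops_branch) * 100.0);
}
```

For example, the iq1_s row with n=512 (67.85 on master, 70.52 on the branch) yields 104 hundredths, which matches the 1.04 in the table.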
MUL_MAT_ID
Backend GGML op Op parameters TFLOPS master TFLOPS refactor_mma_for_rdna Speedup
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 0.77 0.77 1.00
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 4.35 4.33 1.00
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 4.51 4.53 1.00
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 1.40 1.50 1.07
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 0.74 0.77 1.04
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 5.33 5.34 1.00
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 2.40 2.40 1.00
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 0.75 0.81 1.08
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 0.77 0.77 1.00
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 5.97 5.87 0.98
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 8.82 8.80 1.00
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 2.52 2.45 0.97
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 0.70 0.70 1.00
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 13.88 13.84 1.00
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 4.38 4.36 1.00
ROCm0 MUL_MAT_ID type_a=f16,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 0.98 1.08 1.10
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 0.68 0.66 0.98
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 0.63 0.66 1.04
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 0.90 0.90 1.00
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 0.29 0.28 0.98
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 0.14 0.13 0.94
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 1.65 1.64 0.99
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 0.49 0.50 1.03
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 0.17 0.17 1.00
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 0.72 0.72 1.01
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 1.60 1.59 0.99
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 2.74 2.72 0.99
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 0.85 0.92 1.08
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 0.28 0.27 0.94
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 2.52 2.42 0.96
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 1.10 1.06 0.96
ROCm0 MUL_MAT_ID type_a=f32,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 0.33 0.34 1.04
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 1.48 1.47 0.99
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 3.37 3.36 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 6.21 6.20 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 2.09 2.03 0.97
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 0.65 0.64 0.98
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 11.16 11.10 0.99
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 3.00 3.01 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 0.91 0.87 0.95
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 1.60 1.59 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 7.00 6.96 0.99
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 13.03 12.97 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 3.74 3.68 0.98
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 0.90 0.75 0.84
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 24.00 23.96 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 5.77 5.74 1.00
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 1.41 1.17 0.83
ROCm0 MUL_MAT_ID type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=1,k=2880 2.84 2.85 1.00
ROCm0 MUL_MAT_ID type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=4,k=2880 2.17 2.08 0.96
ROCm0 MUL_MAT_ID type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=512,k=2880 34.00 36.52 1.07
ROCm0 MUL_MAT_ID type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=8,k=2880 2.32 2.58 1.11
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 1.13 1.13 1.00
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 4.93 5.15 1.05
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 9.13 9.55 1.05
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 2.52 3.00 1.19
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 1.40 1.51 1.08
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 16.59 17.22 1.04
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 3.71 4.12 1.11
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 1.72 2.12 1.23
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 2.16 2.14 0.99
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 9.97 10.41 1.04
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 18.40 19.17 1.04
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 4.58 5.19 1.13
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 1.62 1.78 1.10
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 33.63 34.85 1.04
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 7.20 7.96 1.10
ROCm0 MUL_MAT_ID type_a=q4_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 2.81 2.46 0.87
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 1.02 1.02 1.00
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 4.25 4.38 1.03
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 7.87 8.15 1.03
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 2.93 3.14 1.07
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 1.72 1.78 1.03
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 14.40 14.83 1.03
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 3.73 3.96 1.06
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 2.02 2.26 1.12
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 1.80 1.79 0.99
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 8.58 8.85 1.03
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 15.79 16.32 1.03
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 5.36 5.65 1.05
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 1.82 2.52 1.38
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 29.08 29.93 1.03
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 7.28 7.73 1.06
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 2.59 2.99 1.15
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 1.29 1.30 1.01
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 1.98 1.97 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 3.75 3.74 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 1.69 1.81 1.07
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 1.05 0.94 0.89
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 7.09 7.07 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 2.07 2.04 0.99
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 1.25 1.23 0.98
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 1.38 1.37 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 4.57 4.56 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 8.50 8.46 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 3.01 2.98 0.99
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 1.18 1.34 1.14
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 16.33 16.25 1.00
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 4.11 4.09 0.99
ROCm0 MUL_MAT_ID type_a=q6_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 1.65 1.51 0.91
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=1,k=2048 1.06 1.06 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=128,k=2048 4.81 4.94 1.03
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=256,k=2048 8.95 9.29 1.04
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=32,k=2048 2.42 2.43 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=4,k=2048 1.54 1.56 1.01
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=512,k=2048 16.16 16.87 1.04
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=64,k=2048 3.50 3.71 1.06
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=128,n_used=8,b=0,m=768,n=8,k=2048 1.55 1.60 1.03
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=1,k=2048 1.81 1.81 1.00
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=128,k=2048 9.52 9.85 1.03
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=256,k=2048 17.69 18.41 1.04
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=32,k=2048 3.95 4.12 1.04
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 1.62 2.03 1.25
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=512,k=2048 32.42 33.70 1.04
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=64,k=2048 6.83 7.23 1.06
ROCm0 MUL_MAT_ID type_a=q8_0,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 2.28 2.02 0.89
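
For anyone re-checking the last column: a minimal sketch, assuming each row is `backend op params value_1 value_2 ratio` and that the ratio is the second value divided by the first. The column header is not repeated in this part of the table, so the field names below (`first`, `second`, `reported`) are hypothetical. The three rows are copied from the table above. Because the printed values are rounded to two decimals, a recomputed ratio can differ from the reported one in the last digit, so the check allows a small tolerance.

```python
# Three rows copied verbatim from the table above.
ROWS = """\
ROCm0 MUL_MAT_ID type_a=iq2_xs,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=8,k=2048 1.41 1.17 0.83
ROCm0 MUL_MAT_ID type_a=mxfp4,type_b=f32,n_mats=32,n_used=4,b=0,m=2880,n=512,k=2880 34.00 36.52 1.07
ROCm0 MUL_MAT_ID type_a=q4_K,type_b=f32,n_mats=32,n_used=4,b=0,m=1792,n=4,k=2048 1.82 2.52 1.38
""".splitlines()


def parse_row(line):
    """Split one whitespace-separated row into named fields.

    Field names are hypothetical; the params string contains no spaces,
    so a plain split() yields exactly six tokens.
    """
    backend, op, params, first, second, reported = line.split()
    return {
        "backend": backend,
        "op": op,
        "params": dict(kv.split("=") for kv in params.split(",")),
        "first": float(first),
        "second": float(second),
        "reported": float(reported),
    }


def recomputed_ratio(row):
    """Assumed definition of the last column: second value / first value."""
    return row["second"] / row["first"]


def is_consistent(row, tol=0.015):
    """True if the reported ratio matches the recomputed one within tol."""
    return abs(recomputed_ratio(row) - row["reported"]) <= tol


if __name__ == "__main__":
    for line in ROWS:
        row = parse_row(line)
        p = row["params"]
        print(p["type_a"], "n=" + p["n"],
              round(recomputed_ratio(row), 2), is_consistent(row))
```

Under that assumption, a ratio below 1.00 means the second value is lower than the first for that shape; whether that is a regression depends on the unit of the two value columns, which this part of the table does not state.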

@JohannesGaessler JohannesGaessler merged commit acec774 into ggml-org:master Dec 17, 2025
67 checks passed
Labels

ggml, Nvidia GPU

Development

Successfully merging this pull request may close these issues.

Refactor: mma.cuh shall be refactored for AMD
