-
Notifications
You must be signed in to change notification settings - Fork 14.1k
HIP: Refactor mma for RDNA and CDNA #17990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIP: Refactor mma for RDNA and CDNA #17990
Conversation
JohannesGaessler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't have MFMA hardware for development I would suggest that you simply don't touch the corresponding code for now.
| DATA_LAYOUT_J_MAJOR = 10, // Matrix C for CDNA and RDNA4, int and float matrix C for RDNA3. | ||
| DATA_LAYOUT_I_MAJOR_MIRRORED = 20, | ||
| DATA_LAYOUT_J_MAJOR_MIRRORED = 30, | ||
| DATA_LAYOUT_I_MAJOR_DUAL = 40, // Matrix A&B for RDNA3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why you're not using I_MAJOR_MIRRORED?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have a check in I_MAJOR_MIRRORED, ne = I * J / (WARP_SIZE/4), so it's for volta 8x8 gemm, so I add I_MAJOR_DUAL to handle RDNA3 problems, I don't think that mixing volta and rdna3 codes is a good choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not about what is a good choice, it's about what is the least bad choice. For this PR it's fine to add an extra value to the enum but I will refactor this to instead use either I_MAJOR or I_MAJOR_MIRRORED at some later time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I_MAJOR for RDNA3 matrix A&B is the worst choice, I_MAJOR is only for RDNA3 matrix C not A&B, or you can only judge A&B or C by the shape, this is the current way is doing.
It can be moved to I_MAJOR_MIRRORED if you think mixing Volta and RDNA3 is acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have a check of I_MAJOR_MIRRORED, it only has half2 support, sorry I really not able to cover Volta, keeping I_MAJOR_DUAL and merging I_MAJOR_MIRRORED and I_MAJOR_DUAL from your side in the future is a better choice.
|
Honestly, as the refactor changes too much code, keeping the old path of MFMA still needs full test on CDNA, so I think it's worth to have a try to make the code correct first. |
|
If you want to get this PR merged in any reasonable time frame, you either need to fix MFMA yourself or you need to not touch it. I currently have other priorities and don't have the time to fix the MFMA part for you. |
I agree, I also don't want to touch MFMA part as I've been spending more than one month to acquire a MI308 but there is still no good response, I'm not sure if I'm able to get one. Anyway, could you help to run a quick test of MUL_MAT on your CDNA then I can decide how to move forward? Thank you. But, even not touch MFMA way will still modify the code of MFMA in mmq, it still need your help to do test, thank you. |
|
I'm willing to give you SSH access for development purposes but the machine with the MI100 would only be running during the daytime in Germany since it's in my living space and very loud. |
|
Thank you for the help, inf is not a good signal as it loads wrong data, let me revert CDNA part first then wait for AMD's response for a while to see if I'm able to access a CDNA3. |
|
Hello @JohannesGaessler I just give up acquiring MI308 from my company internal cloud, I'm acquiring MI300 at https://www.amd.com/en/developer/resources/cloud-access/amd-developer-cloud.html, but as I'm in China, I'm not sure if I can access MI300 successfully. I just fixed a potential wrong ne problem in J_MAJOR for CDNA, could you help to have a quick test on your MI GPU? If it still doesn't work, could you share a ssh connection to me if possible, time difference between China and Germany isn't unacceptable, I just need to work from 4PM to 9PM, hopefully I'm able to connect it, thank you. Best Regards |
|
In any case, if you still want hardware access for development send me an email with your public key and your desired username. |
Thank you for the support, I think all my changes have been done, please help to review it first, the only remaining thing shall be merging I_MAJOR_MIRRORED and I_MAJOR_DUAL, but currently I still cannot find a good way to handle volta as volta only has half2 and RDNA3 has half2 and int for mmq. And thank you for providing the access of your MI100, since it's passed, I think I don't need it anymore, I will be very appreciated if AMD can give me an access of CDNA3. |
|
Please check the editorconfig CI job for trailing whitespaces. |
|
Attach the result of mul_mat on RDNA4. MUL_MAT
MUL_MAT_ID
|
Refactor mma.cuh for RDNA and CDNA, clean up row-major and colum-major matrix for future development like FA, add dual matrix type for RDNA3.
CDNA isn't tested as I don't have a GPU, @JohannesGaessler could you help to do a raw test on your MI GPU? Thank you. Honestly, I probably need your coding help to fix the bug on CDNA as I don't have a GPU, thank you.
Resolves #17856