fix: Initialize ApertusMLP's xielu activation using torch_dtype
#42864
Conversation
Force-pushed from 91f68a8 to 6730601
Force-pushed from 26667d3 to 1cb8b00
Initialize XIELU activation with correct dtype from config (using config.dtype instead of default bfloat16) to prevent promotion to float32 and subsequent crashes on Turing/float16 GPUs.
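A hedged sketch of the failure mode and the fix described above; `ToyXIELU` and `ToyMLP` are illustrative stand-ins (not the upstream Apertus/transformers code), and the `dtype` constructor kwarg mirrors the behaviour described in the commit message:

```python
# Toy sketch of the dtype bug and fix; names are illustrative, not the upstream code.
import torch
from torch import nn


class ToyXIELU(nn.Module):
    """Minimal stand-in for an xIELU-style activation with learnable parameters."""

    def __init__(self, dtype: torch.dtype = torch.bfloat16):
        super().__init__()
        # The bug: these parameters defaulted to bfloat16 regardless of the model dtype.
        self.alpha_p = nn.Parameter(torch.tensor(0.8, dtype=dtype))
        self.alpha_n = nn.Parameter(torch.tensor(0.8, dtype=dtype))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # bfloat16 parameters mixed with float16 activations promote the result to float32.
        return torch.where(x > 0, self.alpha_p * x * x, self.alpha_n * x)


class ToyMLP(nn.Module):
    def __init__(self, hidden: int, dtype: torch.dtype):
        super().__init__()
        self.up = nn.Linear(hidden, hidden, dtype=dtype)
        self.down = nn.Linear(hidden, hidden, dtype=dtype)
        # The fix: pass the model dtype from the config instead of relying on the
        # bfloat16 default, so activations stay in the same dtype as the Linear weights.
        self.act = ToyXIELU(dtype=dtype)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Without the fix, `self.act` returns float32 and the float16 matmul in `down`
        # raises: RuntimeError: expected m1 and m2 to have the same dtype,
        # but got: float != c10::Half
        return self.down(self.act(self.up(x)))


if __name__ == "__main__":
    # float16 matmul on CPU needs a recent PyTorch; on Turing GPUs this is the real path.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    mlp = ToyMLP(16, torch.float16).to(device)
    out = mlp(torch.randn(2, 16, dtype=torch.float16, device=device))
    print(out.dtype)  # torch.float16
```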
Force-pushed from 1cb8b00 to 4b41cf8
run-slow: apertus

This comment contains models: ["models/apertus"]

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
The test runner is very slow, so I checked this out locally; the slow tests seem fine, so I'm happy to approve this!

Imagine adding another test for f16 precision, crazy ahaha, thanks!
Rocketknight1
left a comment
LGTM! One nit: Can we move the import to the top of the file? We're already importing ACT2FN and ACT2CLS is in the same file, so there should be no performance penalty.
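For reference, the suggested change would look roughly like this (the exact import location inside the Apertus modeling file is an assumption):

```python
# Top of modeling_apertus.py (sketch): hoist ACT2CLS next to the existing ACT2FN
# import instead of importing it locally inside ApertusMLP.__init__; both names
# live in transformers.activations, so there is no extra import cost.
from transformers.activations import ACT2CLS, ACT2FN
```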
[For maintainers] Suggested jobs to run (before merge): run-slow: apertus

Thanks for the quick iteration, merging!
fix: Initialize ApertusMLP's xielu activation using torch_dtype (#42864)
* Fix Apertus model crash on float16 hardware: initialize XIELU activation with the correct dtype from config (using config.dtype instead of the default bfloat16) to prevent promotion to float32 and subsequent crashes on Turing/float16 GPUs.
* refactor: Move `ACT2CLS` import to top-level in Apertus models.
CI Results: ✅ No failing test specific to this PR 🎉!
What does this PR do?
This model architecture cannot be run for inference on Turing GPUs; this PR fixes the issue at the root.
See #42371, vllm-project/vllm#29349, vllm-project/vllm#30635 and https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509/discussions/21 for context.
Fixes:
RuntimeError: expected m1 and m2 to have the same dtype, but got: float != c10::Half
Issue reproduction:
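A reproduction sketch along these lines (the model id comes from the linked Hub discussion; the prompt and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "swiss-ai/Apertus-8B-Instruct-2509"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Turing GPUs have no native bfloat16, so the model must be loaded in float16.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda")
# Without this patch the xIELU parameters stay in bfloat16, activations get promoted
# to float32, and the next float16 matmul raises:
#   RuntimeError: expected m1 and m2 to have the same dtype, but got: float != c10::Half
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```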
Output without this patch:
Output with this patch:
Big thanks to everyone who helped bring this patch to light.