
Conversation

@wasertech
Contributor

What does this PR do?

Inference with this model architecture fails on Turing GPUs, which lack bfloat16 support and run in float16. This PR fixes the issue at the root.

See #42371, vllm-project/vllm#29349, vllm-project/vllm#30635 and https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509/discussions/21 for context.

Fixes `RuntimeError: expected m1 and m2 to have the same dtype, but got: float != c10::Half`. The Python xIELU fallback creates its parameter tensors in bfloat16 regardless of the model dtype, so on a float16 model the activation output gets promoted to float32 and no longer matches the float16 down-projection weights.
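
To see the promotion in isolation, here is a standalone sketch using nothing but plain PyTorch; the shapes and values are arbitrary, only the dtypes mirror the situation described above:

import torch

x = torch.randn(1, 64, dtype=torch.float16)          # hidden states on float16 hardware
alpha = torch.full((1,), 0.8, dtype=torch.bfloat16)  # like a pre-patch xIELU parameter
w = torch.randn(64, 128, dtype=torch.float16)        # a float16 projection weight

y = x * alpha          # float16 combined with bfloat16 promotes to float32
print(y.dtype)         # torch.float32

try:
    y @ w              # float32 activations against float16 weights: dtype mismatch
except RuntimeError as e:
    print(e)           # a dtype-mismatch error like the one quoted above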

Issue reproduction:

import torch
from torch import nn
from transformers.models.apertus.configuration_apertus import ApertusConfig
from transformers.models.apertus.modeling_apertus import ApertusMLP
import sys

def test_apertus_mlp_crash():
    print(f"Python executable: {sys.executable}")
    print("Testing ApertusMLP crash on float16...")
    
    # 1. Setup Config with float16 and xielu
    config = ApertusConfig(
        hidden_size=64,
        intermediate_size=128,
        hidden_act="xielu",
        torch_dtype="float16" # Simulating the runner or user choice
    )
    
    # 2. Instantiate MLP
    # The model may not pick the dtype up from the config immediately;
    # vLLM-style usage typically sets the torch default dtype or casts the model, so simulate that here.
    torch.set_default_dtype(torch.float16) # Simulating "float16 hardware" env
    
    try:
        mlp = ApertusMLP(config)
        
        # Check act_fn dtype
        if hasattr(mlp.act_fn, 'beta'):
            print(f"XIELU beta dtype: {mlp.act_fn.beta.dtype}")
        
        # 3. Create Input
        x = torch.randn(1, 64, dtype=torch.float16)
        
        # 4. Forward Pass
        output = mlp(x)
        print("Forward pass successful!")
        print(f"Output dtype: {output.dtype}")
        
    except RuntimeError as e:
        print("\nCaught expected RuntimeError:")
        print(e)
    except Exception as e:
        print(f"\nCaught unexpected exception: {type(e)}")
        print(e)
    finally:
        torch.set_default_dtype(torch.float32)  # Cleanup

if __name__ == "__main__":
    test_apertus_mlp_crash()

Output without this patch:

Python executable: /home/waser/Projets/Transformers/transformers/venv/bin/python
Testing ApertusMLP crash on float16...
CUDA-fused xIELU not available (No module named 'xielu') – falling back to a Python version.
For CUDA xIELU (experimental), `pip install git+https://github.com/nickjbrowning/XIELU`
XIELU beta dtype: torch.bfloat16

Caught expected RuntimeError:
expected m1 and m2 to have the same dtype, but got: float != c10::Half

Output with this patch:

Python executable: /home/waser/Projets/Transformers/transformers/venv/bin/python
Testing ApertusMLP crash on float16...
CUDA-fused xIELU not available (No module named 'xielu') – falling back to a Python version.
For CUDA xIELU (experimental), `pip install git+https://github.com/nickjbrowning/XIELU`
`torch_dtype` is deprecated! Use `dtype` instead!
XIELU beta dtype: torch.float16
Forward pass successful!
Output dtype: torch.float16

Big thanks to everyone who helped shape this patch and bring it to light.

@wasertech wasertech force-pushed the apertus-turing branch 2 times, most recently from 26667d3 to 1cb8b00 on December 14, 2025 at 18:11
Initialize XIELU activation with correct dtype from config (using config.dtype instead of default bfloat16) to prevent promotion to float32 and subsequent crashes on Turing/float16 GPUs.
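
For reference, a minimal sketch of the idea behind this commit rather than the actual diff: resolve the activation class through `ACT2CLS` (which the patch does in `modeling_apertus.py`, see the review note below) and hand it the dtype from the config instead of letting it default to bfloat16. The helper name `make_xielu` is hypothetical, and the final cast stands in for the patch's direct initialization in the config dtype:

import torch
from transformers.activations import ACT2CLS  # same module that already provides ACT2FN

def make_xielu(config):
    # Hypothetical helper: build the activation in the dtype the config asks for.
    entry = ACT2CLS[config.hidden_act]
    # Some ACT2CLS entries are (class, kwargs) pairs, others are bare classes.
    cls, kwargs = entry if isinstance(entry, tuple) else (entry, {})
    dtype = getattr(config, "dtype", None) or torch.get_default_dtype()
    if isinstance(dtype, str):  # the config may store the dtype as a string
        dtype = getattr(torch, dtype)
    # Cast the module so its parameters and buffers match the rest of the MLP.
    return cls(**kwargs).to(dtype)

With the activation parameters in float16, its output stays float16 and the down-projection no longer sees mixed dtypes, which is what the patched output above shows.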
@Rocketknight1
Member

run-slow: apertus

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/apertus"]
quantizations: []

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1
Member

The test runner is very slow, so I checked this out locally; the slow tests seem fine, and I'm happy to approve this!

@wasertech
Contributor Author

Imagine adding another test for f16 precision, crazy ahaha. Thanks!

@Rocketknight1 Rocketknight1 (Member) left a comment


LGTM! One nit: can we move the import to the top of the file? We're already importing ACT2FN, and ACT2CLS lives in the same file, so there should be no performance penalty.
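
For illustration, a sketch of what the nit amounts to; the absolute import path below stands in for the modeling file's existing relative import:

# Hoist ACT2CLS to module level, next to the existing ACT2FN import, instead of
# importing it inside ApertusMLP.__init__. Both names live in transformers.activations,
# which is already imported, so there is no extra cost.
from transformers.activations import ACT2CLS, ACT2FN

# Both tables resolve "xielu": ACT2FN hands back a ready-made instance,
# while ACT2CLS exposes the class so the caller can pass its own arguments.
print(ACT2CLS["xielu"], ACT2FN["xielu"])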

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: apertus

@Rocketknight1 Rocketknight1 enabled auto-merge (squash) on December 15, 2025 at 16:07
@Rocketknight1
Member

Thanks for the quick iteration, merging!

@Rocketknight1 Rocketknight1 merged commit 06378d4 into huggingface:main Dec 15, 2025
19 checks passed
itazap pushed a commit that referenced this pull request Dec 15, 2025
…2864)

* Fix Apertus model crash on float16 hardware

Initialize XIELU activation with correct dtype from config (using config.dtype instead of default bfloat16) to prevent promotion to float32 and subsequent crashes on Turing/float16 GPUs.

* refactor: Move `ACT2CLS` import to top-level in Apertus models.
@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !
