Fix SDXL VAE decode latents dtype mismatch on non-MPS #12847
## Summary

On non-MPS platforms (CUDA/CPU), `StableDiffusionXLPipeline` can call `vae.decode()` with fp16 latents while the VAE (or parts of it) is fp32, which causes a hard runtime error in normalization/linear layers (e.g. `GroupNorm`): `expected scalar type Half but found Float`. This happens because the pipeline currently only aligns the `latents` dtype when `needs_upcasting` is `True` (fp16 VAE + `force_upcast`), and the `elif latents.dtype != self.vae.dtype:` branch only handles MPS, by casting the VAE to the latents dtype. On CUDA/CPU there is no dtype/device alignment, so mixed dtypes can reach VAE decode.

## Reproduction
- `diffusers==0.36.0.dev0` (observed), CUDA or CPU (non-MPS)
- `StableDiffusionXLPipeline.__call__` with `output_type != "latent"` reaches `vae.decode(latents, ...)` and errors inside the VAE decoder's `GroupNorm`/`Linear` layers due to fp16 input + fp32 weights.

A concrete regression test is included that reproduces this without a GPU:

1. Cast `pipe.vae` to fp32.
2. Use `callback_on_step_end` to force `latents` to fp16.
3. Let the pipeline call `vae.decode`.
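The core failure is also reproducible with plain torch, outside the pipeline (a minimal sketch; the `Linear` layer stands in for the VAE decoder's fp32 weights):

```python
import torch

# fp32 layer, standing in for the VAE decoder's fp32 weights
proj = torch.nn.Linear(4, 4)

# fp16 latents, as an fp16 pipeline / callback would produce
latents = torch.randn(1, 4, dtype=torch.float16)

try:
    proj(latents)  # fp16 input x fp32 weight -> hard runtime error
except RuntimeError as err:
    # e.g. "expected scalar type Half but found Float" (exact message varies by backend)
    print(f"mixed-dtype decode fails: {err}")
```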
## Fix

When `needs_upcasting` is `False` but `latents.dtype != self.vae.dtype`, we now align the `latents` dtype/device to the VAE decode dtype/device (preferring the `vae.post_quant_conv` parameters when available) on non-MPS platforms. This prevents mixed dtypes from reaching `vae.decode()` and matches the intent of the upcast path.
## Tests

- `test_vae_decode_aligns_latents_dtype_when_vae_is_fp32` in `tests/pipelines/stable_diffusion_xl/test_stable_diffusion_xl.py`
## Why this is a bug

Users can legitimately end up with an fp32 VAE (for numerical stability) while the latents are fp16 (for performance, or via callbacks/schedulers). The pipeline should not crash with a dtype mismatch in this scenario; it should deterministically align the latents to the VAE decode dtype.