Merged
Commits
62 commits
21ac639
working draft for LongCat
molbap Sep 5, 2025
c939eb2
BC changes to deepseek_v3 for modular
molbap Sep 5, 2025
2535c28
format
molbap Sep 8, 2025
bac973f
Merge branch 'main' into new_moe
molbap Sep 8, 2025
cddaba5
various modularities
molbap Sep 8, 2025
67943a4
better tp plan
molbap Sep 8, 2025
d765b18
better init
molbap Sep 8, 2025
eebb41c
minor changes
molbap Sep 8, 2025
414ba61
make modular better
molbap Sep 8, 2025
7586dd7
clean up patterns
molbap Sep 8, 2025
b4584ad
Revert a couple of modular commits, because we won't convert in the end
molbap Sep 9, 2025
76e4555
make things explicit.
molbap Sep 9, 2025
c7c5a3d
draft test
molbap Sep 9, 2025
6e58487
toctree, tests and imports
molbap Sep 9, 2025
8bb172d
drop
molbap Sep 9, 2025
726828d
woops
molbap Sep 9, 2025
df11c0e
make better things
molbap Sep 9, 2025
fa3aacf
update test
molbap Sep 9, 2025
07af563
update
molbap Sep 9, 2025
927a55e
fixes
molbap Sep 9, 2025
36c3dbb
style and CI
molbap Sep 9, 2025
d85c3e3
convert stuff
molbap Sep 9, 2025
8cb4dc2
up
molbap Sep 9, 2025
1343b65
ah, yes, that
molbap Sep 9, 2025
275374a
enable gen tests
molbap Sep 10, 2025
f9d35c5
fix cache shape in test (sum of 2 things)
molbap Sep 10, 2025
74d2728
fix tests
molbap Sep 10, 2025
1c9b49f
comments
molbap Sep 10, 2025
967259a
re-Identitise
molbap Sep 10, 2025
da61426
minimize changes
molbap Sep 11, 2025
9ff6f95
better defaults
molbap Sep 11, 2025
d75311c
modular betterment
molbap Sep 11, 2025
87b5687
fix configuration, add documentation
molbap Sep 11, 2025
e39779d
fix init
molbap Sep 11, 2025
c85a7ea
add integration tests
molbap Sep 12, 2025
3846289
add info
molbap Sep 12, 2025
1ec96f4
simplify
molbap Sep 12, 2025
6778512
update slow tests
molbap Sep 12, 2025
88e3114
fix
molbap Sep 12, 2025
563f9e0
conflicted
molbap Sep 12, 2025
67fd0d1
style
molbap Sep 12, 2025
ae5fcbc
Merge branch 'main' into new_moe
molbap Sep 12, 2025
c85afdd
Merge branch 'new_moe' of github.com:huggingface/transformers into ne…
molbap Sep 12, 2025
f208aa4
some additional long tests
molbap Sep 12, 2025
a3be847
cpu-only long test
molbap Sep 12, 2025
cf09a0b
Merge branch 'main' into new_moe
molbap Sep 12, 2025
c0f965f
fix last tests?
molbap Sep 12, 2025
2a76079
Merge branch 'new_moe' of github.com:huggingface/transformers into ne…
molbap Sep 12, 2025
7dafc04
urg
molbap Sep 12, 2025
7910e57
cleaner tests why not
molbap Sep 15, 2025
0666611
fix
molbap Sep 15, 2025
fd6df4f
Merge branch 'main' into new_moe
molbap Sep 15, 2025
a9b040e
improve slow tests, no skip
molbap Sep 16, 2025
b95af0a
style
molbap Sep 16, 2025
f0dfec7
don't upcast
molbap Sep 16, 2025
8463c5b
Merge branch 'main' into new_moe
molbap Sep 16, 2025
8cd2bb4
one skip
molbap Sep 16, 2025
68943ca
Merge branch 'new_moe' of github.com:huggingface/transformers into ne…
molbap Sep 16, 2025
f0eb7af
Merge branch 'main' into new_moe
molbap Sep 16, 2025
c85b064
finally fix parallelism
molbap Sep 16, 2025
f385373
Merge branch 'new_moe' of github.com:huggingface/transformers into ne…
molbap Sep 16, 2025
66b414a
Merge branch 'main' into new_moe
molbap Sep 16, 2025
don't upcast
molbap committed Sep 16, 2025
commit f0dfec7e8aea30e656add5c93424cb05c67b8e75
5 changes: 1 addition & 4 deletions tests/models/longcat_flash/test_modeling_longcat_flash.py
@@ -404,10 +404,7 @@ def test_flash_attn_2_fp32_ln(self):
     device_map="auto",  # small change to ensure device placement
 )

-for _, param in model.named_parameters():
-    # upcast only layer norms
-    if (param.dtype == torch.float16) or (param.dtype == torch.bfloat16):
-        param.data = param.data.to(torch.float32)
+# no upcasting at all

 if model.config.is_encoder_decoder:
     dummy_decoder_input_ids = inputs_dict["decoder_input_ids"]
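
For context, the loop removed in this commit can be reproduced on a toy module. The sketch below is illustrative only: `upcast_half_params` and the two-layer `toy` model are hypothetical names that do not appear in the PR, and the model is a stand-in for the real LongCat checkpoint. It suggests that, despite the old "upcast only layer norms" comment, the condition matched every float16/bfloat16 parameter, so the whole model ended up in float32.

```python
import torch
from torch import nn


def upcast_half_params(model: nn.Module) -> int:
    """Hypothetical helper mirroring the loop removed from the test.

    Casts every float16/bfloat16 parameter to float32 and returns how many
    parameters were touched.
    """
    touched = 0
    for _, param in model.named_parameters():
        if param.dtype in (torch.float16, torch.bfloat16):
            param.data = param.data.to(torch.float32)
            touched += 1
    return touched


# Toy stand-in (an assumption, not the model under test):
# one linear layer and one layer norm, both loaded in bfloat16.
toy = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4)).to(torch.bfloat16)

n_upcast = upcast_half_params(toy)
dtypes = {name: p.dtype for name, p in toy.named_parameters()}
```

After the call, the linear weights are float32 as well as the layer norm weights, which is the behavior the commit drops in favor of leaving the loaded dtype untouched.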