model: add KORMo model #18032
Conversation
It's not llama3, it looks like qwen3 to me.
I compared the Qwen3 Transformers implementation with KORMo's custom code, and Q/K normalization is not applied in KORMo.
... And the model's paper says: "The base structure followed the Llama-3 series architecture." Not sure why they decided to use "KORMoForCausalLM" instead of "LlamaForCausalLM" then..
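For reference, the Q/K normalization in question is Qwen3's per-head RMSNorm on the query and key projections before RoPE; Qwen2 and the Llama family skip it. A minimal numpy sketch of that operation (names and shapes are illustrative, not taken from KORMo's or llama.cpp's code):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # RMSNorm over the last dimension, as used throughout the Llama/Qwen families
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

n_tokens, n_heads, head_dim = 4, 8, 64
q = np.random.randn(n_tokens, n_heads, head_dim).astype(np.float32)
q_norm_weight = np.ones(head_dim, dtype=np.float32)  # learned q_norm weight in Qwen3-style models

# Qwen3-style: each head of Q (and likewise K) is RMS-normalized before RoPE
q_qwen3 = rms_norm(q, q_norm_weight)

# Llama-3 / Qwen2 / KORMo (per its modeling code): Q is used as-is, no QK-Norm
q_llama = q
```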
Right, so probably qwen2 then.
It's a bit of an odd statement.
That is indeed the weirdest thing of all; there's nothing there warranting a new arch.
I suggest trying to move this to qwen2; the pre-tokenizer certainly is qwen2, and the chat template is almost identical to Qwen's as well...
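(For context, Qwen's template is the ChatML format. A rough sketch of what that rendering looks like is below; this is written from memory of Qwen's template family, not KORMo's actual tokenizer_config.json, so treat it as illustrative only.)

```python
def render_chatml(messages: list[dict]) -> str:
    # ChatML-style rendering as used by the Qwen family; KORMo's template is
    # reportedly almost identical, but check its tokenizer_config.json to be sure.
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    out += "<|im_start|>assistant\n"  # generation prompt
    return out

print(render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user",   "content": "안녕하세요!"},
]))
```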
It doesn't even have sliding window attention (SWA) like Qwen2, going by its "custom" modeling code (which just looks copy-and-pasted from the llama code, from what I see).
It seems they:
I'm getting confused. What do you think?
They are not; LLaMA 3.1's regex differs slightly, and KORMo is using Qwen's.
As I said, try moving everything to qwen2.
I'm sorry, you were right, the pre-tokenizers are different. My fault. Qwen3 and KORMo share one regex, while LLaMA 3.1's is slightly different. I've changed it as you suggested.
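For anyone comparing them side by side: as defined in llama.cpp's tokenizer code (to the best of my recollection), the two pre-tokenizer regexes differ only in how digit runs are grouped, a bare \p{N} for Qwen versus \p{N}{1,3} for LLaMA 3.1. A small sketch using the third-party regex package (the stdlib re module does not support \p{...} classes):

```python
import regex  # pip install regex

# Pre-tokenizer regexes as I recall them from llama.cpp; the only difference is that
# LLaMA 3.1 groups digit runs in chunks of up to three, while Qwen splits per digit.
QWEN2_PRETOK  = r"(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+"
LLAMA3_PRETOK = r"(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+"

text = "가격은 123456원입니다"
print(regex.findall(QWEN2_PRETOK, text))   # digits come out one at a time: '1', '2', '3', ...
print(regex.findall(LLAMA3_PRETOK, text))  # digits come out in groups: '123', '456'
```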
I still think Qwen2 is correct; it just needs to be updated so that the bias is actually optional, like in Qwen2MoE. In fact, I just tested it and it works perfectly.
Lines 3420 to 3423 in 5dbb758
llama.cpp/src/models/qwen2moe.cpp, lines 35 to 38 in 5dbb758
There is, however, something screwy with the chat template; not sure what's going on. Edit: Oh, I see, you submitted a fixed one, what a weird bug. :)
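The pattern referred to above is simply "load the bias if it exists, otherwise skip it": in llama.cpp that is the not-required tensor flag at load time plus a null check in the graph build. A small numpy sketch of the same idea, with illustrative names rather than llama.cpp's actual API:

```python
import numpy as np
from typing import Optional

def attn_q_proj(x: np.ndarray, wq: np.ndarray, bq: Optional[np.ndarray]) -> np.ndarray:
    # Qwen2-style checkpoints ship a Q/K/V bias; Llama-style ones (and KORMo) do not.
    # Treating the bias as optional lets one code path serve both kinds of checkpoints,
    # mirroring the `if (model.layers[il].bq) { ... }` check in the qwen2moe graph build.
    q = x @ wq.T
    if bq is not None:
        q = q + bq
    return q

x  = np.random.randn(4, 64).astype(np.float32)
wq = np.random.randn(64, 64).astype(np.float32)
print(attn_q_proj(x, wq, bq=None).shape)                        # KORMo case: no bias loaded
print(attn_q_proj(x, wq, bq=np.zeros(64, np.float32)).shape)    # Qwen2 case: bias present
```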
Thanks for guiding me in the right direction, I appreciate it. The model now happily runs with the Qwen2 architecture, and Qwen 2.5 also still works well.
Yes, it was me lol. They merged the fix today, so no more problems!




Make sure to read the contributing guidelines before submitting a PR
Hello, this is my first contribution to llama.cpp.
This PR adds support for "KORMo-Team/KORMo-10B-sft", a model trained from scratch on open Korean resources and more.
From what I understand, this model shares its architecture with the LLaMA 3 family but uses a different tokenizer and different tensor names. I tested it locally and it seems to work well.
Let me know if I can improve this!
For testing: https://huggingface.co/hell0ks/KORMo-10B-sft-gguf
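For anyone curious what "moving this to qwen2" roughly looks like on the conversion side, the fragment below sketches registering the extra HF architecture name against the existing Qwen2 handling in convert_hf_to_gguf.py. This is illustrative only, not the actual diff: it is meant to live inside that script rather than run standalone, the decorator and class names (ModelBase/Model, Qwen2Model) have shifted between llama.cpp versions, and a real change also needs the model's tokenizer hash recognized by the pre-tokenizer detection.

```python
# Hypothetical fragment inside llama.cpp's convert_hf_to_gguf.py (not standalone code).
import gguf

@ModelBase.register("KORMoForCausalLM")   # HF architecture name from the model's config.json
class KORMoModel(Qwen2Model):             # reuse the existing Qwen2 conversion logic
    model_arch = gguf.MODEL_ARCH.QWEN2    # emit a plain qwen2 GGUF instead of a new arch
```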