Contributor
Make sure to read the contributing guidelines before submitting a PR
This PR adds support for Solar-Open-100B, released by the Korean startup Upstage.
Model identifier: SolarOpenForCausalLM
It’s basically the GLM-4 MoE architecture, but without some features (like MTP and bias).
Current status: basic chat works. (Reasoning effort and tool calling do NOT work yet.)
TODO
- Investigate the chat template and reasoning parser
Collaborator
Architecturally, it’s literally GLM4.5-Air with num_nextn_predict_layers equal to 0.
The template is annoying; it’s semi-XML with inlined function names and JSON arguments, kind of like Ministral but with an added tool_call_id.
Contributor Author
From my understanding, llama.cpp appends the "<|end|>" token as an EOG token for some reason, and this causes the model to stop after reasoning (same as gpt-oss).
Finding a workaround for this. Done.
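For reference, here is a minimal sketch (not part of this PR) of how one might inspect which tokens the HF tokenizer marks as special, to see where "<|end|>" fits in; the repo id below is a guess, so substitute the real model path or a local directory.

```python
# Sketch only: check how the HF tokenizer labels "<|end|>", since llama.cpp
# treating it as an end-of-generation token is what stops output after reasoning.
# The repo id is an assumption -- replace it with the actual model path.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("upstage/Solar-Open-100B")

print("eos_token:", tok.eos_token, "->", tok.eos_token_id)
print("<|end|> id:", tok.convert_tokens_to_ids("<|end|>"))
print("additional special tokens:", tok.additional_special_tokens)
```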
Collaborator
The template is annoying; it’s semi-XML with inlined function names and JSON arguments, kind of like Ministral but with an added tool_call_id.
It’s not too bad, should be easy to parse. Luckily, it seems to emit JSON arguments directly.
I wouldn’t call it XML. Like gpt-oss, it appears designed for stream parsing, which is nice.
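To make "easy to parse" concrete, here is a rough sketch. The tag layout and names below are invented for illustration and do not claim to match the actual Solar Open template; they just show why an inlined function name followed by JSON arguments is simple to extract.

```python
# Illustration only: this tag layout is made up and does NOT match the real
# Solar Open template. It just demonstrates parsing "inlined function name +
# JSON arguments + tool_call_id" style output.
import json
import re

SAMPLE = '<tool_call id="call_1">get_weather{"city": "Seoul"}</tool_call>'

TOOL_CALL_RE = re.compile(
    r'<tool_call id="(?P<id>[^"]+)">(?P<name>[A-Za-z0-9_]+)(?P<args>\{.*?\})</tool_call>',
    re.DOTALL,
)

def parse_tool_calls(text: str):
    calls = []
    for m in TOOL_CALL_RE.finditer(text):
        calls.append({
            "tool_call_id": m.group("id"),
            "name": m.group("name"),
            "arguments": json.loads(m.group("args")),
        })
    return calls

print(parse_tool_calls(SAMPLE))
# [{'tool_call_id': 'call_1', 'name': 'get_weather', 'arguments': {'city': 'Seoul'}}]
```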
Contributor Author
Unfortunately, I encountered a core dump error when testing after converting to Q4_K_M.
My environment is as follows:
- CPU: Ryzen 9900X
- RAM: 128 GB DDR5
- GPU: AMD 7900XTX × 4
I don’t think this PR caused the segmentation fault, as it doesn’t change any tensors or ops (it’s almost just wrappers).
Check if other models are working on the latest commit first.
I’m working on a CUDA machine, so I can’t really tell.
Contributor Author
Now basic chatting will work with proper reasoning boxes (hopefully).
btw I haven’t worked with streaming tool calling yet, so it might take some time to implement.
Collaborator
btw I haven’t worked with streaming tool calling yet, so it might take some time to implement.
@HelloKS That should probably be a separate PR. I can provide the implementation.
Contributor Author
Yes, I should request review then. Thanks for the advice.
HelloKS marked this pull request as ready for review
HelloKS changed the title from "[WIP] model: add Solar-Open model" to "model: add Solar Open model"
Contributor
Architecturally, it’s literally GLM4.5-Air with num_nextn_predict_layers equal to 0.
it took me a while to understand your comment on reddit about it :)
Contributor
Ah, I’m so sorry. This is actually quite embarrassing. It turned out to be a basic permission issue that prevented VRAM allocation. I am sorry for the confusion caused by such a simple oversight.
Here is a screenshot of it working properly.
Contributor Author
I have a kinda-working prototype for tool calling, made with Gemini, in the solar-open-tools branch. You can try it as well if you’re really in a hurry (for what? idk).
While this seems to work, I respect the contribution guidelines, so I won’t open a PR for it.
CISC approved these changes Jan 1, 2026
Collaborator
CISC left a comment
Collaborator
See comments on parser. Please use scripts/jinja/jinja-tester.py to generate messages with and without reasoning content and compare.
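If scripts/jinja/jinja-tester.py isn’t handy, an equivalent spot check can be done by rendering the HF chat template directly. This is only a sketch; the repo id and the reasoning_content field name are assumptions about what the Solar template expects, not something confirmed in this PR.

```python
# Sketch of the requested comparison, done with transformers instead of
# scripts/jinja/jinja-tester.py. The "reasoning_content" field name and the
# repo id are assumptions -- adjust to whatever the Solar template actually reads.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("upstage/Solar-Open-100B")

base = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
    {"role": "user", "content": "And times 3?"},
]
with_reasoning = [
    base[0],
    {"role": "assistant", "content": "4", "reasoning_content": "2 + 2 = 4"},
    base[2],
]

without = tok.apply_chat_template(base, tokenize=False, add_generation_prompt=True)
with_r = tok.apply_chat_template(with_reasoning, tokenize=False, add_generation_prompt=True)
print(without)
print("---")
print(with_r)
```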
Collaborator
Needs a basic implementation in src/llama-chat.{cpp,h} to handle when --jinja is not passed in. It currently crashes with llama-completion sans --jinja. You can use the OPENAI_MOE aka gpt-oss one as an example, they’re very similar.
Contributor Author
Needs a basic implementation in src/llama-chat.{cpp,h} to handle when --jinja is not passed in. It currently crashes with llama-completion sans --jinja. You can use the OPENAI_MOE aka gpt-oss one as an example, they’re very similar.
I added a basic chat template in llama-chat and tested it with llama-completion.
One small problem: it shows <|end|>assistant at the end of reasoning. Is there any way to hide this?
Collaborator
aldehir left a comment (edited)
One small problem: it shows <|end|>assistant at the end of reasoning. Is there any way to hide this?
It’s pretty much the same problem as gpt-oss. You can’t really fix this, because it requires always parsing the output. I think it’s fine, most users will use llama-cli or the --jinja flag anyway. So long as it doesn’t crash.
Looks good to me! Thanks 😊
Collaborator
Needs a basic implementation in src/llama-chat.{cpp,h} to handle when --jinja is not passed in. It currently crashes with llama-completion sans --jinja.
Not strictly a prerequisite anymore; we have several supported models without basic chat support.
Collaborator
Looks good now.
CISC linked an issue that may be closed by this pull request