Contributor
Make sure to read the contributing guidelines before submitting a PR
This PR adds support for Solar-Open-100B, released by the Korean startup Upstage.
Model identifier: SolarOpenForCausalLM
It’s basically the GLM-4 MoE architecture, but without some features (like MTP and bias).
Current status: basic chat works. (Reasoning effort and tool calling do NOT work yet.)
TODO
- Investigate the chat template and reasoning parser
Collaborator
Architecturally, it’s literally GLM4.5-Air with num_nextn_predict_layers equal to 0.
The template is annoying; it’s semi-XML with inlined function names and JSON arguments, kind of like Ministral but with an added tool_call_id.
Contributor Author
From my understanding, llama.cpp appends the "<|end|>" token as an EOG token for some reason, and this causes the model to stop after reasoning (same as gpt-oss).
Finding a workaround for this. Done.
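For reference, here is a minimal sketch (not part of this PR) of how one might inspect which tokens the HF tokenizer marks as special, to see where "<|end|>" fits in; the repo id below is a guess, so substitute the real model path or a local directory.

```python
# Sketch only: check how the HF tokenizer labels "<|end|>", since llama.cpp
# treating it as an end-of-generation token is what stops output after reasoning.
# The repo id is an assumption -- replace it with the actual model path.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("upstage/Solar-Open-100B")

print("eos_token:", tok.eos_token, "->", tok.eos_token_id)
print("<|end|> id:", tok.convert_tokens_to_ids("<|end|>"))
print("additional special tokens:", tok.additional_special_tokens)
```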
Collaborator
The template is annoying; it’s semi-XML with inlined function names and JSON arguments, kind of like Ministral but with an added tool_call_id.
It’s not too bad, should be easy to parse. Luckily, it seems to emit JSON arguments directly.
I wouldn’t call it XML. Like gpt-oss, it appears designed for stream parsing, which is nice.
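To make "easy to parse" concrete, here is a rough sketch. The tag layout and names below are invented for illustration and do not claim to match the actual Solar Open template; they just show why an inlined function name followed by JSON arguments is simple to extract.

```python
# Illustration only: this tag layout is made up and does NOT match the real
# Solar Open template. It just demonstrates parsing "inlined function name +
# JSON arguments + tool_call_id" style output.
import json
import re

SAMPLE = '<tool_call id="call_1">get_weather{"city": "Seoul"}</tool_call>'

TOOL_CALL_RE = re.compile(
    r'<tool_call id="(?P<id>[^"]+)">(?P<name>[A-Za-z0-9_]+)(?P<args>\{.*?\})</tool_call>',
    re.DOTALL,
)

def parse_tool_calls(text: str):
    calls = []
    for m in TOOL_CALL_RE.finditer(text):
        calls.append({
            "tool_call_id": m.group("id"),
            "name": m.group("name"),
            "arguments": json.loads(m.group("args")),
        })
    return calls

print(parse_tool_calls(SAMPLE))
# [{'tool_call_id': 'call_1', 'name': 'get_weather', 'arguments': {'city': 'Seoul'}}]
```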
Contributor Author
Unfortunately, I encountered a core dump error when testing after converting to Q4_K_M.
My environment is as follows:
- CPU: Ryzen 9900X
- RAM: 128 GB DDR5
- GPU: AMD 7900XTX × 4
I don’t think this PR caused the segmentation fault, as it doesn’t change any tensors or ops (it’s almost just wrappers).
Check if other models are working on the latest commit first.
I’m working on a CUDA machine, so I can’t really tell.
Contributor Author
Now basic chatting will work with proper reasoning boxes (hopefully).
btw I haven’t worked with streaming tool calling yet, so it might take some time to implement.
Collaborator
btw I haven’t worked with streaming tool calling yet, so it might take some time to implement.
@HelloKS That should probably be a separate PR. I can provide the implementation.
Contributor Author
Yes, I should request review then. Thanks for the advice.
HelloKS marked this pull request as ready for review
HelloKS changed the title from "[WIP] model: add Solar-Open model" to "model: add Solar Open model"
Contributor
Architecturally, it’s literally GLM4.5-Air with num_nextn_predict_layers equal to 0.
it took me a while to understand your comment on reddit about it :)
Contributor
Ah, I’m so sorry. This is actually quite embarrassing. It turned out to be a basic permission issue that prevented VRAM allocation. I am sorry for the confusion caused by such a simple oversight.
Here is a screenshot of it working properly.
Contributor Author
I have a kinda-working prototype for tool calling, made with Gemini, in the solar-open-tools branch. You can try it as well if you’re really in a hurry (for what? idk).
While this seems to work, I respect the contribution guidelines, so I won’t open a PR for it.
CISC approved these changes Jan 1, 2026
Collaborator
CISC left a comment
Collaborator
See comments on parser. Please use scripts/jinja/jinja-tester.py to generate messages with and without reasoning content and compare.
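If scripts/jinja/jinja-tester.py isn’t handy, an equivalent spot check can be done by rendering the HF chat template directly. This is only a sketch; the repo id and the reasoning_content field name are assumptions about what the Solar template expects, not something confirmed in this PR.

```python
# Sketch of the requested comparison, done with transformers instead of
# scripts/jinja/jinja-tester.py. The "reasoning_content" field name and the
# repo id are assumptions -- adjust to whatever the Solar template actually reads.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("upstage/Solar-Open-100B")

base = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
    {"role": "user", "content": "And times 3?"},
]
with_reasoning = [
    base[0],
    {"role": "assistant", "content": "4", "reasoning_content": "2 + 2 = 4"},
    base[2],
]

without = tok.apply_chat_template(base, tokenize=False, add_generation_prompt=True)
with_r = tok.apply_chat_template(with_reasoning, tokenize=False, add_generation_prompt=True)
print(without)
print("---")
print(with_r)
```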
Collaborator
Needs a basic implementation in src/llama-chat.{cpp,h} to handle when --jinja is not passed in. It currently crashes with llama-completion sans --jinja. You can use the OPENAI_MOE aka gpt-oss one as an example, they’re very similar.
Contributor Author
Needs a basic implementation in src/llama-chat.{cpp,h} to handle when --jinja is not passed in. It currently crashes with llama-completion sans --jinja. You can use the OPENAI_MOE aka gpt-oss one as an example, they’re very similar.
I added a basic chat template in llama-chat and tested it with llama-completion.
One small problem: it shows <|end|>assistant at the end of reasoning. Is there any way to hide this?
Collaborator
aldehir left a comment (edited)
One small problem: it shows <|end|>assistant at the end of reasoning. Is there any way to hide this?
It’s pretty much the same problem as gpt-oss. You can’t really fix this, because it requires always parsing the output. I think it’s fine, most users will use llama-cli or the --jinja flag anyway. So long as it doesn’t crash.
Looks good to me! Thanks 😊
Collaborator
Needs a basic implementation in src/llama-chat.{cpp,h} to handle when --jinja is not passed in. It currently crashes with llama-completion sans --jinja.
Not strictly a prerequisite anymore; we have several supported models without basic chat support.
Collaborator
Looks good now.
CISC linked an issue that may be closed by this pull request