lightmetal: GPU LLM Inference From a Single Java 25 JAR
lightmetal runs GPU LLM inference on Apple Silicon from a single executable Java 25 JAR with zero dependencies. It binds a Metal-enabled libllama.dylib through the Foreign Function & Memory API and runs Mistral- and Gemma-architecture GGUF models locally.
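The summary does not show lightmetal's binding code, so the following is only a minimal sketch of the Foreign Function & Memory mechanism it names. As a stand-in for libllama.dylib, it binds the C standard library's `strlen`, which is available on any platform without extra files. The class name `FfmSketch` and the method `nativeStrlen` are hypothetical and are not taken from lightmetal.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical illustration: binds strlen, NOT a libllama function.
public class FfmSketch {

    // Calls the native strlen on a Java string copied into off-heap memory.
    static long nativeStrlen(String s) throws Throwable {
        Linker linker = Linker.nativeLinker();

        // Look up the symbol in the default (C standard) library.
        MemorySegment address = linker.defaultLookup().find("strlen").orElseThrow();

        // Describe the C signature: size_t strlen(const char *).
        // Assumption: a 64-bit platform, so size_t maps to JAVA_LONG.
        MethodHandle strlen = linker.downcallHandle(
                address,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

        // A confined arena frees the native memory when the block exits.
        try (Arena arena = Arena.ofConfined()) {
            // Copies the string as a NUL-terminated UTF-8 C string.
            MemorySegment cString = arena.allocateFrom(s);
            return (long) strlen.invokeExact(cString);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(nativeStrlen("lightmetal"));
    }
}
```

Binding a separate shared library such as a dylib would presumably swap `defaultLookup()` for `SymbolLookup.libraryLookup(path, arena)` and declare one `FunctionDescriptor` per native function. On recent JDKs, running with `--enable-native-access=ALL-UNNAMED` avoids the warning about restricted native access.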