🤖 AI - micaleel · Scour

local llm on laptop 780M GPU using llama + gemma 4 qat

🤖Transformers Blog

alper.bearblog.dev·

Here's a llama.cpp CLI Command builder.

🤖Transformers

llamabuilding.com··r/LocalLLaMA

DiffusionGemma: The Developer Guide - Google Developers Blog

🤖Machine Learning Blog

developers.googleblog.com··r/LocalLLaMA

DiffusionGemma: 4x Faster Text Generation

🤖Machine Learning News Blog

blog.google··Hacker News, r/LocalLLaMA, r/singularity

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss

🤖Transformers Code

github.com··r/LocalLLaMA

Machinic Psychopharmacology: Do LLMs Self-Medicate?

🚀Model Serving

lesswrong.com··Hacker News

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

🚀Model Serving News Blog

kaitchup.substack.com··r/LocalLLaMA

Qwen 3.6 27B AutoRound GGUF, need your feedback

🤖Transformers

huggingface.co··r/LocalLLaMA

1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM

🧠Deep Learning

smolhub.com··r/LocalLLaMA

Youssof Altoukhi (@Youssofal_)

🚀Model Serving

xcancel.com··r/LocalLLaMA

Homebrew, Again

🛠️Feature Engineering Blog

jerryz.bearblog.dev·

A drop-in replacement chat template for google/gemma-4-31B-it tuned for open-source agentic coding harnesses.

🐍Programming

gist.github.com··r/LocalLLaMA

Revisiting GSM-Symbolic: Do 2026 Frontier Models Still Fail at Confounded Grade School Math?

lesswrong.com·

Remove padding and multiple D2D copies for MTP by gaugarg-nv · Pull Request #24086 · ggml-org/llama.cpp

🦙Claude Code

github.com··r/LocalLLaMA

google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation

🚀Model Serving

huggingface.co··r/LocalLLaMA

Can activation verbalizers surface an internal chain of thought?

🤖Transformers

lesswrong.com·

How to reduce capability degradation from off-model SFT

🤖Machine Learning

lesswrong.com·

A handy llama-server launcher with easy model and configuration customisation

🤖Transformers Code

github.com··r/LocalLLaMA

OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.

🤖Machine Learning Blog

huggingface.co··Hacker News, r/LocalLLaMA

Defeating Introspection Adapters (and Why Threat Models Matter)

🤖Machine Learning

lesswrong.com·
