💬 LLMs - yfff

Less-relevant results

[AINews] FrontierCode: Benchmarking for Code Quality over Slop

✍️Prompt Engineering News

latent.space

The Order Matters: Sequential Fine-Tuning of LLaMA for Coherent Automated Essay Scoring

🧠LLM Academic

arxiv.org·

Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech

🧠LLM Blog

huggingface.co·

See, Act, Correct: three levers for working with a code agent

🎮Reinforcement Learning Blog

blog.owulveryck.info··Hacker News

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

⌨️CLI Tools Blog

ziraph.com··Hacker News

Tokenminning: Because Tokenmaxxing Is a Bad Idea

✍️Prompt Engineering

tokenminning.com··Hacker News

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

⌨️CLI Tools

local-llm.utop.workers.dev··Hacker News

Testing MiniMax M3 on real tasks: repo refactor, screenshot debugging, and Spotify recommendations

🦀Rust Blog

andlukyane.com··Hacker News

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

🤖AI Academic

arxiv.org·

Stack Overflow didn't just help AI learn to code

🤖AI

zozo123.github.io··Hacker News

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

🤖AI News Blog

blog.google··Hacker News

SafeRun: Enabling Determinism in LLM Planning for Running

💡AI Reasoning Academic

arxiv.org·

[AINews] not much happened today

🤖AI News

latent.space

KaiFelixBennett/gemma4-turboquant-rdna4: Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.

💾Retro Computing Code

github.com··Hacker News

Analyzing the Correlation Between Hallucinations and Knowledge Conflicts in Large Language Models

🧠LLM Academic

arxiv.org·

Florian Brand, Prime Intellect research engineer, adopts Gemma 4 E4B 6-bit quantized as his primary local Mac LLM

🤖AI News

digg.com··Hacker News

How to Train Your Goblin

🎮Reinforcement Learning

goblins.mchen.workers.dev··Hacker News

A retrieval conditioned rebinding circuit for dynamic entity tracking in large language models

🤖Transformers Academic

arxiv.org·

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

Show HN: LLM memory without context bleed; 100% precision vs. <10% vector search
