🦙 Llama - SeanNg

🤖LLM News Blog

braddelong.substack.com · Substack

martidu4/honey-ai: 🍯 All-in-one AI honeypot powered by local LLMs. SSH, HTTP, FTP, Telnet, SMTP, MySQL, Redis, Git, VNC, RDP — with canary tokens, tarpits, GZIP bombs, and threat intel reporting.

🤖LLM Code

github.com · Hacker News

Using local LLMs for agentic coding

🤖LLM Blog

blog.alexewerlof.com

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

⚡Inference Optimization

local-llm.utop.workers.dev · Hacker News

Running LLM Inference on Kubernetes: What It Actually Takes

⚓Kubernetes Blog

fairwinds.com

How Small Can You Go? LoRA Fine-Tuning 270M-8B Models for Merchant Information Extraction in Financial Transactions

🎯Fine-tuning Academic

arxiv.org

Burning 2.1M Tokens Version of Misadventures in Vibe-Programming: LAUGH OF THE DAY

🤖LLM

substackcdn.com · Substack

Would a prepaid pass for a coding agent solve a real need or is it just my itch?

🤖Agent

codehamr.com · r/SideProject

vishal-dehurdle/state-harness: Runtime safety net for LLM agents. Detects token spirals, kills doomed tasks early, tells you exactly why. Rust core, Python SDK. pip install state-harness

🤖Agent Code

github.com · Hacker News

Unsloth Gemma 4 QAT

⚡Inference Optimization

unsloth.ai

How to Measure Time To First Token (TTFT) in AI Systems

🧠OpenAI

qainsights.com · Hacker News

Shared Latent Structures Enable Unified Backdoor Detection and Mitigation in LLMs

🤖LLM Academic

arxiv.org

Anthropic Oceanus leaks 🤖, ChatGPT Dreaming 💭, recursive self improvement 🚀

🤖Agent

tldr.tech

Creating ADK Agent using locally running Gemma 4

🤖LLM Blog

medium.com

KaiFelixBennett/gemma4-turboquant-rdna4: Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.

⚡Inference Optimization Code

github.com · Hacker News

When AI builds itself 👷, AI is not a line item 📝, local LLMs for agentic coding 🤖

🤖LLM

tldr.tech

How to Train Your Goblin

🎮Reinforcement Learning

goblins.mchen.workers.dev · Hacker News

I built an open-source persistent memory layer for AI coding agents

🧠OpenAI Code

github.com · r/GithubCopilot

The Amplifying Mirror: Locating and Steering the Partisan Direction inside a Large Language Model

🤖LLM Academic

arxiv.org

Running Ollama on a 15W CPU sounded ridiculous until I got it working with decent results

"AI" Is Eating Platform Monopolist Free Cash Flow, Not the World: CHART OF THE DAY
