joshwonghc's Feed

🧠LLMs Blog

cswithsanjay.blogspot.com·

Training the Model Was Only 20% of the Job: Lessons from Building an MLOps Platform

🔧MLOps Blog

medium.com

LLM Observability: What To Instrument and How To Act on It

🔍LLM Tracing Blog

blog.n8n.io·

A Small RAG Evaluation Harness for Production-Oriented LLM Systems

💻AI Engineering Blog

itstedpark.medium.com·

Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM

🌐Open Source AI

everylocalai.com··DEV

BeamWeaver - LangChain/LangGraph-style agents and workflows for Elixir

🤖AI Agents

elixirstatus.com·

Introducing the Third Generation of Apple’s Foundation Models

🧠LLMs 25

machinelearning.apple.com··Hacker News, r/apple·Cited by 25 articles

Building AI-Powered Applications with Spring Boot AI: A Practical Guide for Java Developers

💻AI Engineering Blog

medium.com

Detecting AI-specific threats in Claude Enterprise from the Compliance API: a prefilter + LLM-as-judge pipeline with Sigma rules

✍️Prompt Engineering

papermtn.co.uk··r/netsec

My prompt is better than your prompt – how to optimize your prompts in the age of agentic AI

🧠LLMs Blog

metrics.blogg.gu.se·

DiffusionGemma 26B A4B results on my 5090

🌐Open Source AI

huggingface.co··r/LocalLLaMA·Cited by 1 article

Guardian Runtime – Local firewall for AI coding agents and runaway costs

🤖AI Agents

pypi.org··Hacker News·Cited by 1 article

Quiz: Embeddings and Vector Databases With ChromaDB

📚RAG

realpython.com·

12B Gemma 4 QAT Deployment with NVIDIA L4, Cloud Run, MCP, and Antigravity CLI

🌐Open Source AI Blog

medium.com

The Era of Multi-Agent Imagined Experience

🤖AI Agents

odyssey.ml··Hacker News

Why LLMs (still) lack taste

🧠LLMs

beyondtheprior.com··Hacker News

microsoft/LLMLingua: [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.

💻AI Engineering Code

github.com··DEV

milvuslite-kit configuration over code for vector search and rag workflows

📚RAG Blog

elanthirayan.medium.com··DEV

high-performance classification API (beats GPT-5.4-mini)

🧠LLMs Discussion

classer.ai··Hacker News

AI Agent Security Guide: How to Prevent Prompt Injection Attack

How to Run an LLM Locally: Ultimate Guide to Local AI 2026
