🤖 LLMs - cyberpsych12

Boris Cherny, Claude Code creator, says he stopped manually prompting AI and now writes autonomous loops to orchestrate the model

✍️Prompt Engineering News

digg.com

Price Drop: Save 90% on ChatPlayground AI lifetime plan, and compare multiple AI models

✍️Prompt Engineering

neowin.net

Google's new open-weights model brings image-generation tricks to AI text generation

✍️Prompt Engineering News

theregister.com

The Anthropic leader who built Claude Code says he ditched prompting — now he just writes loops.

✍️Prompt Engineering

thenewstack.io

CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?

🎮GPU Computing

uccl-project.github.io · Hacker News

MLPerf and the rise of latency-aware LLM benchmarking

📈Performance Engineering

edn.com

harmansingh4163-ai/ESP-32-s3-Story-maker-LLM: 15M/42M-param Llama split across two ESP32-S3s over 3 wires — too big for either chip alone. INT4, flash mmap, bit-exact verified.

✍️Prompt Engineering Code

github.com · Hacker News

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

📈Performance Engineering Blog

blogs.nvidia.com

Research Proposal: Decoupled RISC-LLM Architectures via Circadian Synaptic Consolidation

✍️Prompt Engineering

aermia.com · Hacker News

LLM Cheat Sheet

📈Performance Engineering Blog

drkpxl.bearblog.dev

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

⚡LLM Inference

zozo123.github.io · Hacker News

Mi50 32GB / GFX906 - vLLM Qwen 3.5 Configuration for Qwen 3.5:9B AWQ-4bit

⚡LLM Inference

huggingface.co · r/LocalLLaMA

If LLMs are all persona, whose persona are they?

✍️Prompt Engineering

persona.earthpilot.ai · Hacker News

Report: GKE Inference Gateway delivers up to 92% faster AI responses

✍️Prompt Engineering Blog

cloud.google.com · Hacker News

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

🎮GPU Computing

phoronix.com · r/artificial

New comment by alroma90 in "Ask HN: Who wants to be hired? (June 2026)"

✍️Prompt Engineering Discussion

news.ycombinator.com · Hacker News

NVIDIA A100 vs RTX 4090 for AI Workloads: The Cost Per FLOP Reality

✍️Prompt Engineering Blog

fitservers.com

Everyone Was Searching for Better AI Prompts. Then One Markdown File Changed Everything

✍️Prompt Engineering Blog

medium.com

Claude Fable 5 is Mythos for the masses

high-performance classification API (beats GPT-5.4-mini)
