💬 LLMs - zhang

Less-relevant results

Machinic Psychopharmacology: Do LLMs Self-Medicate?

🔧AI Tools

lesswrong.com··Hacker News

The Sequence Knowledge #874: Transformers or Not?

🦾ROS

substackcdn.com··Substack

LLM Inference Engineering Room — Part 3: The Orchestration Layer

🔧AI Tools Blog

vimal-dwarampudi.medium.com·

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

🔧AI Tools News

newsletter.semianalysis.com··Hacker News

Neo-X7/Neo-AI: A fully offline AI assistant powered by Ollama. Stores and retrieves conversations using SQLite + LanceDB vector search. No cloud. No API keys. Runs entirely on your machine.

🔧AI Tools Code

github.com··DEV

Running LLM Inference on Kubernetes: What It Actually Takes

🔧AI Tools Blog

fairwinds.com·

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent

🔌Embedded Systems Blog

dnhkng.github.io·

Google's new open model DiffusionGemma generates text from noise instead of word by word

🔧AI Tools

the-decoder.com

Malicious Hugging Face Models Could Trigger Remote Code Execution

🔧AI Tools

techrepublic.com·

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

🔧AI Tools Academic

arxiv.org·

How I benchmarked a 100% local RAG pipeline to 9/9 (zero API keys)

🔧AI Tools

buy.polar.sh··DEV

Hugging Face Transformers flaw enables RCE via malicious model configs

🔧AI Tools

4sysops.com·

NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet

🔧AI Tools

huggingface.co··Hacker News

I got a Crush on this new Terminal-based AI coding tool

🔧AI Tools

xda-developers.com·

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

🔧AI Tools

local-llm.utop.workers.dev··Hacker News

WWDC 2026: Foundation Models (& Anarlog)

🔧AI Tools

skushagra.com·

Analyzing the geometric dependence of thermoelastic Q-factor in micro hemispherical resonators via a data-augmented CNN-transformer model

🔌Embedded Systems Academic

nature.com·

16bit.com Update Summary: June 09, 2026

🔌Embedded Systems

16bit.com·

1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM

Hugging Face Transformers RCE flaw enables stealthy compromise via AI model configs
