💬 LLMs - sk.naseerahmed

RATrain: A Resource-Aware Training Runtime for Large Language Models on Bandwidth-Constrained Heterogeneous Supercomputing Platforms

⚡Apache Spark Academic

arxiv.org

What Is Generative AI?

🧠AI Engineering Academic

excelsior.edu

Melanie Mitchell: What We Get Wrong About AI

🤖Machine Learning

yalereview.org · Substack, Hacker News

How J.A.R.V.I.S. Became the Smartest Mind on Earth — What is an LLM?

🧠AI Engineering Blog

medium.com

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent

🔁MLOps Blog

dnhkng.github.io

Claude vs GPT-4: Which AI API Is Better for Developers? (2026)

🧠AI Engineering

kalyna.pro · DEV

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss

🔗gRPC Code

github.com · r/LocalLLaMA

Alignment Defends LLMs from Property Inference Attacks

🔁MLOps Academic

arxiv.org

Why Shrinking an AI Model Often Makes It More Useful

🧠AI Engineering

siliconopera.com

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

🧠AI Engineering Blog

dnhkng.github.io

I built an open-source persistent memory layer for AI coding agents

🔗gRPC Code

github.com · r/GithubCopilot

REAL: A Reasoning-Enhanced Graph Framework for Long-Term Memory Management of LLMs

🔁MLOps Academic

arxiv.org

AI Agents Running Businesses: Andon Labs on Project Vend

🧠AI Engineering

startuphub.ai

I finally built the central AI hub I've been wanting, and Open WebUI made it stupidly simple

🧠AI Engineering

xda-developers.com

ashp15205/guardian-runtime: A zero-latency, local-first runtime firewall for LLMs. Intercept every prompt and response locally to stop data leaks and runaway token costs.

🧠AI Engineering Code

github.com · Hacker News

Deep Learning Weekly: Issue 458

🤖Machine Learning

deeplearningweekly.com

LLM-as-a-Discriminator: When Synthetic Tables Still Look Real

🤖Machine Learning Academic

arxiv.org

Context Engineering vs. Prompt Engineering: Why Your AI Agent Gets Dumber the Longer It Runs

🧠AI Engineering Blog

medium.com

A handy llama-server launcher with easy model and configuration customisation

Here's a llama.cpp CLI Command builder.
