📦 Batch Embeddings - emschwartz · Scour

BOute: Cost-Efficient LLM Serving with Heterogeneous LLMs and GPUs via Multi-Objective Bayesian Optimization

arxiv.org·14h

🏗️LLM Infrastructure

Area-Efficient In-Memory Computing for Mixture-of-Experts via Multiplexing and Caching

arxiv.org·14h

📱Edge AI Optimization

AI Inference Needs A Mix-And-Match Memory Strategy

semiengineering.com·10h

🏗️LLM Infrastructure

EyesOff: Why Some Models Quantize Better Than Others

ym2132.github.io·20h·Discuss: Hacker News

borodark/exmc: Probabilistic programming in BEAM

github.com·22h

🏗️LLM Infrastructure

Introducing Dedicated Container Inference: Delivering 2.6x faster inference for custom AI models

together.ai·19h

🏗️LLM Infrastructure

Show HN: A segmentation model client-side via WASM

qtoolkit.dev·5h·Discuss: Hacker News

🏗️LLM Infrastructure

New Ovis2.6-30B-A3B, a lil better than Qwen3-VL-30B-A3B

huggingface.co·6h·Discuss: r/LocalLLaMA

Supercharging Inference for AI Factories: KV Cache Offload as a Memory-Hierarchy Problem

blog.min.io·4h

🏗️LLM Infrastructure

Memgraph 3.8 is Out: Atomic GraphRAG + Vector Single Store With Major Performance Upgrades

memgraph.com·1h·Discuss: Hacker News

🏹Apache Arrow

UbiquitousLearning/mllm: Fast Multimodal LLM on Mobile Devices

github.com·9h

🏗️LLM Infrastructure

NVIDIA DGX Spark Powers Big Projects in Higher Education

blogs.nvidia.com·4h

Training Data from Real-World Sources

lightningrod.ai·20h

Generate AI Infographics: The Ultimate Prompt Guide

cybercorsairs.com·4h

Training-Free Real-Time Control for Autoregressive Video Generation

daydream.live·4h·Discuss: Hacker News

🎖Text Quality Models

AI learns to perform analog layout design

techxplore.com·21h

📱Edge AI Optimization

A training principle for drifting models

breno.bearblog.dev·7h

🧠LLM Inference

AI-built maps reveal causal gene regulation across Alzheimer's brain cell types

medicalxpress.com·1h

📊IVF Indexes

Ming-flash-omni-2.0: 100B MoE (6B active) omni-modal model - unified speech/SFX/music generation

huggingface.co·1h·Discuss: r/LocalLLaMA

LAI #114: The Real Work of Production AI

pub.towardsai.net·4h
