📦 Batch Embeddings - emschwartz · Scour

baidu/ERNIE-4.5-VL-28B-A3B-Thinking released. Curious case..

huggingface.co·7h

Discuss: r/LocalLLaMA

🏗️LLM Infrastructure

The Underwear Fixed Point

notes.hella.cheap·18h

Discuss: Lobsters, Hacker News

Fast and Affordable LLMs serving on Intel Arc Pro B-Series GPUs with vLLM

blog.vllm.ai·12h

🏗️LLM Infrastructure

Nested Learning: How Your Neural Network Already Learns at Multiple Timescales

rewire.it·18h

🧠LLM Inference

Meta’s Generative Ads Model (GEM): The Central Brain Accelerating Ads Recommendation AI Innovation

engineering.fb.com·19h

📊Feed Optimization

Think SMART: New NVIDIA Dynamo Integrations Simplify AI Inference at Data Center Scale

blogs.nvidia.com·22h

🏗️LLM Infrastructure

This is a wild use case!

threadreaderapp.com·21h

🏗️LLM Infrastructure

Exploring RTEB, a New Benchmark To Evaluate Embedding Models

thenewstack.io·18h

🌏BGE Embeddings

Scaling Laws: How to Allocate Compute for Training Language Models

pub.towardsai.net·20m

📱Edge AI Optimization

Show HN: Autogenerate efficient backward kernels for Triton

github.com·2h

Discuss: Hacker News

Nested Learning: The Illusion of Deep Learning Architectures

arxiviq.substack.com·19h

Discuss: Substack

🧠LLM Inference

AI Black&Blonde for a 230% boost on inference speed

reddit.com·10h

Discuss: r/LocalLLaMA

Lessons from the DeepChip Wars: What a Decade-old Debate Teaches Us About Tech Evolution

semiwiki.com·18h

GKE: From containers to agents, the unified platform for every modern workload

cloud.google.com·23m

🏗️LLM Infrastructure

Synth: The New Data Frontier

pleias.fr·6h

Discuss: Hacker News

🏗️LLM Infrastructure

Show HN: Charl – ML language with native tensors and autograd

charlbase.org·22h

Discuss: Hacker News

AI Memory: Enabling The Next Era Of High-Performance Computing

semiengineering.com·4h

Mojo: MLIR-Based Performance-Portable HPC Science Kernels on GPUs for the Python Ecosystem

arxiv.org·6h

Discuss: Lobsters

Raycore: GPU accelerated and modular ray intersections

makie.org·21h

Discuss: Hacker News

Lossless Compression with Asymmetric Numeral Systems (2020)

bjlkeng.io·23h

Discuss: Hacker News

🗜️Vector Compression
