LLM Inference

How we fight GPU scarcity without compromise

 ⚙️AI Infrastructure  Content type: Blog
equixly.com · Hacker News
Less-relevant results

Token4Token — pay-per-token inference on Gnosis + Swarm

 ⚙️AI Infrastructure

Making LLMs faster and more efficient across multiple languages

 👁️Multimodal LLMs
techxplore.com

Build a local voice agent with Red Hat OpenShift AI

 ⚙️AI Infrastructure
developers.redhat.com

Making Local LLM Go Brrr

 ⚙️AI Infrastructure

Breaking the Ice: Analyzing Cold Start Latency in vLLM

 ⚙️AI Infrastructure  Content type: Academic
arxiv.org · Hacker News

CoreML vs TFLite: iPhone 15 Pro GPU 2.3x Faster

 Inference Optimization  Content type: Blog  Content type: Discussion
tildalice.io

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

 Inference Optimization  Content type: Code
github.com · Hacker News

3x Faster Search: Parallel Test-Time Scaling with Instructed-Retriever-1

 Inference Optimization  Content type: Blog
databricks.com

OpenCV 5.0 Computer Vision Library Released with Rewritten DNN Engine

 👁️Multimodal LLMs
linuxiac.com

OpenCV 5 release - New DNN engine with enhanced ONNX and LLM/VLM support, Intel, Arm, and RISC-V hardware optimizations - CNX Software

 👁️Multimodal LLMs  Content type: News
cnx-software.com

A field journal on Ray Data and Daft for multimodal data lake (14 minute read)

 👁️Multimodal LLMs  Content type: Blog
mehulbatra.medium.com

Intro — Sehastrajit

 👁️Multimodal LLMs  Content type: Blog
medium.com

Where to Host Your Open-Source Model (Under 10B Parameters)

 ⚙️AI Infrastructure
digitalocean.com

not much happened today | AINews

 ⚙️AI Infrastructure
news.smol.ai

The 1-Second Timeout Hack: Running Infinite Parallel Workloads Natively on Google Apps Script

 ⚙️AI Infrastructure  Content type: Blog
medium.com

The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure

 🔍Retrieval-Augmented Generation
devops.com

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

 ⚙️AI Infrastructure  Content type: Academic
arxiv.org

[AINews] FrontierCode: Benchmarking for Code Quality over Slop

 ⚙️AI Infrastructure  Content type: News
latent.space

Ask HN: Is software engineering still a good career choice for new students?

 Inference Optimization  Content type: Discussion