Quantization

youyeetoo updates R1 SBC and lists K1 N100-based x86 computer

 ⚙️Zstandard
linuxgizmos.com·

harshuljain13/llm-inference-at-scale: A practitioner's handbook for production LLM serving.

 🎲Probabilistic Inference  Content type: Code
github.com··Hacker News, r/LLM

The Order Matters: Sequential Fine-Tuning of LLaMA for Coherent Automated Essay Scoring

 🗣️Natural Language Parsing  Content type: Academic
arxiv.org·

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

 🏷️Named Entity Recognition

Density Field State Space Models: 1-Bit Distillation, Efficient Inference, and Knowledge Organization in Mamba-2

 🎲Probabilistic Inference  Content type: Academic
arxiv.org·

Where to Host Your Open-Source Model (Under 10B Parameters)

 🗂️Hash Tables
digitalocean.com·

[AINews] not much happened today

 🏷️Named Entity Recognition  Content type: News
latent.space·

Less-relevant results

Day 8 of #100DaysOfClickHouse: Understanding ClickHouse® Data Types

 🗂️Columnar Storage
quantrail-data.com··DEV

Nvidia's RTX Spark is a developer's dream, but AMD's Ryzen AI Max+ is what most people actually need for local AI

 🗂️Columnar Storage
xda-developers.com·

CoreML vs TFLite: iPhone 15 Pro GPU 2.3x Faster

 🗜️Compression Algorithms  Content type: Blog, Discussion
tildalice.io·

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss

 🗂️Hash Tables  Content type: Code
github.com··r/LocalLLaMA

Deep X XM2 NPU: 80 TOPS Generative AI Accelerator at 5W

 🗜️Compression Algorithms
armdevices.net·

Correlation Is Not Enough: Embedding Human Metadata for Individual Causal Discovery

 🎲Probabilistic Inference  Content type: Academic
arxiv.org·

AMD's Frank Azor pushes back against claim that FSR 4.1 won't be ported to RDNA 3.5 GPUs — says 'no such decision' has been made

 🔀CRDTs
tomshardware.com·

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

 🌲Binary Search Trees  Content type: Academic
arxiv.org··Hacker News

Does anyone know what PCIe mode was used for these benchmarks?

 🗂️Hash Tables  Content type: Code
github.com··r/LocalLLaMA

Nvidia RTX Spark: The $2,900 Floor Tells You Everything

 🗂️Columnar Storage  Content type: Blog, Discussion
tildalice.io·

Thundercomm TurboX C7790 Android and Linux development kit features Qualcomm Dragonwing Q-7790 Edge AI SoC - CNX Software

 🗜️Compression Algorithms  Content type: News
cnx-software.com·

Beyond Generative Decoding: Discriminative Hidden-State Readout from a Native Omni-Modal LLM for Multimodal Sentiment Analysis

 🗜️Compression Algorithms  Content type: Academic
arxiv.org·

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

 🗂️Hash Tables  Content type: Code
github.com··Hacker News
