⚡ Quantization - Erdwig · Scour

youyeetoo updates R1 SBC and lists K1 N100-based x86 computer

⚙️Zstandard

linuxgizmos.com

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🎲Probabilistic Inference Code

github.com · Hacker News, r/LLM

The Order Matters: Sequential Fine-Tuning of LLaMA for Coherent Automated Essay Scoring

🗣️Natural Language Parsing Academic

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

🏷️Named Entity Recognition

local-llm.utop.workers.dev · Hacker News

Density Field State Space Models: 1-Bit Distillation, Efficient Inference, and Knowledge Organization in Mamba-2

🎲Probabilistic Inference Academic

Where to Host Your Open-Source Model (Under 10B Parameters)

🗂️Hash Tables

digitalocean.com

[AINews] not much happened today

🏷️Named Entity Recognition News

Less-relevant results

Day 8 of #100DaysOfClickHouse: Understanding ClickHouse® Data Types

🗂️Columnar Storage

quantrail-data.com · DEV

Nvidia's RTX Spark is a developer's dream, but AMD's Ryzen AI Max+ is what most people actually need for local AI

🗂️Columnar Storage

xda-developers.com

CoreML vs TFLite: iPhone 15 Pro GPU 2.3x Faster

🗜️Compression Algorithms Blog Discussion

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss

🗂️Hash Tables Code

github.com · r/LocalLLaMA

Deep X XM2 NPU: 80 TOPS Generative AI Accelerator at 5W

🗜️Compression Algorithms

armdevices.net

Correlation Is Not Enough: Embedding Human Metadata for Individual Causal Discovery

🎲Probabilistic Inference Academic

AMD's Frank Azor pushes back against claim that FSR 4.1 won't be ported to RDNA 3.5 GPUs — says 'no such decision' has been made

tomshardware.com

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

🌲Binary Search Trees Academic

arxiv.org · Hacker News

Does anyone know what PCIe mode was used for these benchmarks?

🗂️Hash Tables Code

github.com · r/LocalLLaMA

Nvidia RTX Spark: The $2,900 Floor Tells You Everything

🗂️Columnar Storage Blog Discussion

Thundercomm TurboX C7790 Android and Linux development kit features Qualcomm Dragonwing Q-7790 Edge AI SoC - CNX Software

🗜️Compression Algorithms News

cnx-software.com

Beyond Generative Decoding: Discriminative Hidden-State Readout from a Native Omni-Modal LLM for Multimodal Sentiment Analysis

🗜️Compression Algorithms Academic

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

🗂️Hash Tables Code

github.com · Hacker News
