📉 Model Quantization - miterion · Scour

Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization

arxiv.org·18h

🏎️TensorRT

OpenAI turns model compression into a talent hunt with its 16 MB "Parameter Golf" challenge

the-decoder.com·4h

🏎️TensorRT

Post Training Quantization for Efficient Dataset Condensation

arxiv.org·1d

🏎️TensorRT

Choosing the Right AI Model: Cost, Performance & Trade-offs

peggie7191.medium.com·12h

🎓Model Distillation

From Exact kNN to DiskANN: The Evolution of High-Performance Vector Search

hackernoon.com·16h

⚡ONNX Runtime

Divetoxx/Mandelbrot: True 24-bit BGR TrueColor. High-Precision Rendering (80-bit). Multi-threaded performance (OpenMP). True SSAA 8x8 (64 independent samples per pixel) direct RGB-space integration. G, B, R - The Red, Green, and Blue channels are calculated using sine and cosine waves

github.com·1d·Discuss: r/programming

From Reactive to Predictive: AI-Driven Optimization for ATE Performance & Reliability

semiengineering.com·15h

⏱️CUDA Events

Phonological complexity, speech style, and individual differences influence ASR performance for Tarifit

nature.com·1d

📊Gradient Accumulation

`quantized_matmul` performance degrades significantly with `group_size=32` vs `group_size=128` · Issue #3251

github.com·2d·Discuss: r/LocalLLaMA

50x Faster Post-Training

workshoplabs.ai·5d·Discuss: Hacker News, r/LocalLLaMA

🏎️TensorRT

Quantization Explained: Q4_K_M vs AWQ vs FP16 for Local LLMs

sitepoint.com·5d

🎯Tensor Cores

Less-relevant results

Analyzing the Performance of the K-Nearest Neighbors (KNN) Algorithm with Different Values of k

medium.com·4d

🔗Kernel Fusion

How post-training shapes legal representations: probing SCOTUS opinions across model families

lesswrong.com·3d

Show HN: We Built Private Post-Training and Inference for Frontier Models

workshoplabs.ai·2d·Discuss: Hacker News

⚡ONNX Runtime

MolmoPoint: Better pointing architecture for vision-language models

allenai.org·7h

👁️Attention Optimization

AI on HPC Workshop 2026

ai-on-hpc.github.io·5h

⚡ONNX Runtime

Fine-Tuning Phi-3 & Gemma 2: The Budget Path to GPT-4 Performance at a Fraction of the Cost

dev.to·5d·Discuss: DEV

⚡ONNX Runtime

Explainable artificial intelligence for early Alzheimer’s diagnosis using enhanced grey relational features and multimodal data

nature.com·1d

👁️Attention Optimization

How we optimized Dash's relevance judge with DSPy

dropbox.tech·18h·Discuss: Hacker News

👁️Attention Optimization

The Performance and Architecture of LeanStack AI

builder.aws.com·6d·Discuss: DEV

⚡ONNX Runtime