bugrakadirhan's Feed

Writing GPU shaders in Rust (gpn24)

Rust-GPU compiles (embedded) Rust to GPU shaders. You can then use these shaders in the Bevy game engine, your custom wgpu renderer, or anything else that needs shaders. We'll look at how it compares to DSLs for GPU programming such as GLSL, WGSL, or burn; why we target SPIR-V yet can still compile to WGSL and run on the web; and, if we really can compile ordinary Rust, what we could run on graphics cards.

⚡ML Inference digitalocean.com

Multi-Model API Cost Governance with the Inference Router

Learn how to use DigitalOcean’s Inference Router to govern multi-model API costs, route requests by task complexity, and reduce LLM inference spend.

Covers GitHub - vllm-project/semantic-router: Intelligent Router for Mixture-of-Models

🤖Machine Learning medium.com

Understanding the Learning Rate: How Step Size Affects Neural Network Training

This is Day 11 of building a neural network from scratch. Yesterday we went over gradient descent: read the slope of the loss at your…
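The teaser describes gradient descent as reading the slope of the loss and stepping against it; the learning rate is what scales that step. A minimal sketch, assuming a hypothetical one-parameter toy loss L(w) = (w - 3)^2 rather than anything taken from the article, shows how the step size changes the outcome:

```python
def gradient_descent(lr: float, steps: int = 50, w0: float = 0.0) -> float:
    """Minimize the hypothetical toy loss L(w) = (w - 3)^2 with plain gradient descent."""
    w = w0
    for _ in range(steps):
        grad = 2 * (w - 3)  # slope of the loss at the current w
        w -= lr * grad      # step against the slope, scaled by the learning rate
    return w


if __name__ == "__main__":
    # Compare a too-small, a moderate, a large-but-stable, and a too-large rate.
    for lr in (0.01, 0.1, 0.9, 1.1):
        print(f"lr={lr}: w after 50 steps = {gradient_descent(lr):.4f}")
```

With this toy loss each step multiplies the distance to the minimum by (1 - 2 * lr), so a small rate converges slowly, a moderate one converges quickly, and a rate above 1.0 makes the iterates oscillate with growing amplitude and diverge.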

🦀Rust medium.com

Axum: Building Rust Web Apps Has Never Been Easier

Learn Axum fundamentals to build Rust web apps without sacrificing development speed.

🛠️ML Frameworks medium.com

Train Neural Networks without Draining your Pocket: Can PyTorch use XLA for GPUs?

Learn whether PyTorch models can leverage XLA to speed up training on GPUs, and how XLA integration works in PyTorch for GPUs and TPUs.

🔄MLOps TildAlice

MLflow Quickstart 2026: Track Your First Experiment in 10 Minutes

Track ML experiments with MLflow in under 10 minutes: log params, metrics, and models in 3 lines of Python. Includes real benchmarks on sklearn and PyTorch.

📐Model Architecture Ai2

Which tokens does a hybrid model predict better?

New token-level analyses of Olmo 3 and Olmo Hybrid show that hybrid models predict meaning-bearing, context-dependent tokens better than transformers, while transformers retain an edge on verbatim copying.

⚙️Model Training rodrigo-arenas.github.io

Show HN: Sklearn-genetic-opt – evolutionary optimization for scikit-learn

Evolutionary hyperparameter tuning and feature selection for scikit-learn.

Discussed on Hacker News

🗜️Quantization arXiv

An Empirical Study of OpenPangu Quantization on Ascend NPUs

OpenPangu models are attractive targets for private and domestic large-language-model deployment, yet their robustness under aggressive post-training quantization on Ascend NPUs has not been systematically characterized. This paper conducts a controlled empirical study of OpenPangu 1B and 7B models on Huawei Ascend 910B1 NPUs. We evaluate representative weight-only and weight-activation post-training quantization methods, including RTN, GPTQ, ...

🖥️Systems ML AWS

Optimize model training on Amazon SageMaker AI with NVIDIA Blackwell

This post shows you how to configure training jobs on Amazon SageMaker AI to get the most out of Blackwell’s architecture on AWS. You learn how to select batch sizes and sequence lengths that take advantage of Blackwell’s expanded memory, choose the right precision format for your model size (1B to 64B parameters), and apply activation checkpointing strategically. By the end, you have a practical framework for tuning your training configuration and launching distributed training jobs on P6-B2...

Covers NVIDIA Blackwell Architecture

⚙️Systems Programming cascade-router.github.io

Show HN: Cascade – A bare-metal C++ proxy that cuts LLM API bills by 70%

A bare-metal C++ AI proxy that predicts prompt complexity in 4.59 milliseconds and dynamically routes traffic to the most cost-effective LLM.

Discussed on Hacker News

🕸️Neural Networks medium.com

HOW DOES AI CREATE IMAGINARY PHOTOS?

One way is by pitting two convolutional neural networks (CNNs) against each other in a “contest” called a generative adversarial network…

🧠Deep Learning foojay

BoxLang 1.14.0 : Query Transformers – Take Full Control of Your Query Results

BoxLang 1.14.0 ships many exciting features (Dynamic Sets, Ranges, Inner Classes, JSONPath navigation), but one quietly powerful addition will change the way you think about every database call in your application: Query Transformers, and this is ...

🔧MLIR LLVM Weekly

#651, June 22nd 2026

Welcome to the six hundred and fifty-first issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by Alex Bradbury. Subscribe to future issues at and pass it on to anyone else you think may be interested. Please send any tips or feedback via email: asb@asbradbury.org, or Mastodon: @llvmweekly@fosstodon.org / @asb@fosstodon.org, or Bluesky: @llvmweekly.org / @asbradbury.org.

🎮GPU Programming easternherald.com

Qualcomm Acquires Modular for $3.9 Billion to Challenge Nvidia’s CUDA Software Lock-In

Qualcomm is acquiring Modular, the AI software startup behind the Mojo programming language and MAX inference engine, for $3.9 billion in stock. The deal gives Qualcomm the software layer to run AI on its own chips, challenging Nvidia's CUDA ecosystem that has locked in developers for 17 years.

🔗Distributed Training linaro.org

Introducing Linaro Forge 26.0

Linaro Forge 26.0 introduces NCCL collective profiling in MAP and Performance Reports, giving full visibility into GPU-to-GPU communication at scale. We put it to the test on a multi-node cluster with zero code changes required; read on to see what we found.

⚡ML Inference GitHub

Show HN: mlx-chronos - benchmark MLX inference engines on Apple Silicon

Community-driven benchmark suite for MLX inference engines on Apple Silicon - igurss/mlx-chronos

Discussed on Hacker News

📐Model Architecture arXiv

Full-resolution MLPs Empower Medical Dense Prediction

arXiv:2311.16707v2. Dense prediction is a fundamental requirement for many medical vision tasks such as medical image restoration, registration, and segmentation. The most popular vision models, Convolutional Neural Networks (CNNs), have reached bottlenecks due to the intrinsic locality of convolution operations. Recently, transformers have been widely adopted for dense prediction for their capability to capture long-range visual dependence. However, due to...

🦀Rust blog.rust-lang.org

The many journeys of learning Rust

Empowering everyone to build reliable and efficient software.

🕸️Neural Networks medium.com

Deep Learning (Part-02): Basics of Deep Learning & Neural Networks

Understanding Neurons, Neural Networks, Neural Connections, Activation Functions & More