bugrakadirhan's Feed

Reading AI Model Compilation in MLIR Through the Lens of Formal Theories

Compiler infrastructures such as MLIR rest on a set of design principles: IR abstractions, interfaces, match-and-rewrite, flow analysis, type conversion, staged lowering, and so on. These concepts have proven themselves in practice. Good designs typically emerge from engineering knowledge, intuition, and experience. Many of them, however, have counterparts in formal theory. MLIR's match-and-rewrite engine corresponds to a term-... Read more ›

🛠️ML Frameworks idlemachines.co.uk·

The annotated PyTorch training loop

LeetCode for Machine Learning. Practice ML coding problems with a real Python execution environment. Read more ›

Discussed on Hacker News

🧠Deep Learning astledsa.substack.com·

Tree Transformers

A step towards generalizing the transformer architecture Read more ›

Discussed on Substack

🗜️Quantization David Noel Ng·

2x GH200 for LLM inference, Part 3: GLM-5.2, expert offload, and the CPU question

Part 1 measured the dual GH200 workstation as a memory system. Part 2 used those measurements to explain why DeepSeek V4 Flash can be fast in vLLM when the model layout fits the hardware: keep hot weights in HBM, avoid unnecessary Hopper-to-Hopper traffic, and use MTP only where the acceptance rate pays for the draft work. GLM-5.2 starts at 2.39 output tok/s on this machine and a... Read more ›

⚡ML Inference Phoronix·

AMD Contributes ONNX Runtime Backend To FFmpeg DNN Filter

An AMD engineer has contributed an ONNX Runtime back-end for the DNN filter to the upstream FFmpeg library. The FFmpeg Deep Neural Network (DNN) filters allow for running AI models natively inside the video processing pipeline for upscaling, object detection, background segmentation, and more. This ONNX Runtime back-end support is notable in that it expands the GPU and NPU capabilities with FFmpeg... Read more ›

⚙️Model Training arXiv·

Statistically Valid Hyperparameter Selection: From Tuning to Guarantees

Hyperparameter selection is a critical step in the deployment of modern artificial intelligence systems, given the need to tune degrees of freedom such as inference-time parameters, implementation-level settings, and thresholds driving decision rules. Despite its practical importance, hyperparameter selection is typically performed using best-effort empirical methods such as grid search or Bayesian optimization, which provide no formal statistic... Read more ›

🦀WGPU ludion.ai·

WebGPU feature detection was not enough to run small LLMs on phones

Four browser environments that exposed WebGPU, and what the measurements say about whether a small LLM run completes. Read more ›

Discussed on Hacker News

🦀Rust GitHub·

v0.8.19

A patch release, mostly of bugfixes. Note: one of these includes a behavior change, which is that the primary server function encodings now respect the Axum/Actix request body size limits, rather t... Read more ›

🔄MLOps ostif.org·

Kubeflow Audit Complete

The Open Source Technology Improvement Fund is proud to share the results of our security audit of Kubeflow. Kubeflow is used for building and deploying customizable machine learning workflows on Kubernetes, and has many subprojects that can be deployed individually or in combination. Thanks to ADA Logics and the Cloud Native Computing Foundation, Kubeflow underwent a custom security engagement that audited 6 projects in the Kubeflow ecosystem. Read more ›

Covers Cloud Native Computing Foundation

Discussed on Hacker News

🖥️Systems ML medium.com

Train Neural Networks without Draining your Pocket: Distribution Strategy Concepts in TensorFlow

Learn about Distributed Training in TensorFlow. Explore the basics of parallel computing and distributed strategies for training… Read more ›

🧠Deep Learning arXiv·

RoFormer: Enhanced Transformer with Rotary Position Embedding

Position encoding has recently been shown to be effective in the transformer architecture. It enables valuable supervision for dependency modeling between elements at different positions of the sequence. In this paper, we first investigate various methods to integrate positional information into the learning process of transformer-based language models. Then, we propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage the positional information. Specifically, the proposed RoP... Read more ›

Covered by 13 sources including pathtostaff.com, DEV Community

⚙️Systems Programming Cybersecurity and Infrastructure Security Agency CISA·

Impact of Linux Kernel vulnerabilities on B&R products

apeterson, Jun 23, 2026. Release date: June 23, 2026. Summary: B&R is aware of publicly reported vulnerabilities affecting the Linux kernel versions shipped with the products listed as affected in the advisory. Successful local exploitation of these vulnerabilities could allow an attacker to escalate privileges on the affected system. Public proof-of-concept exploits are available for the vulnerabilities described herein. At the time ... Read more ›

📐Model Architecture arXiv·

Performance and Interpretability of Convolutional, Transformer, and Hybrid Deep Learning Models in Colorectal Histology Classification

Deep learning has become an important tool in computational pathology, enabling automated analysis of histopathological images. While convolutional neural networks (CNNs) have traditionally dominated this field, transformer-based and hybrid architectures have recently demonstrated promising performance. However, comprehensive comparisons of these approaches for colorectal histopathology remain limited. This study evaluated twelve ImageNet-pretra... Read more ›

🕸️Neural Networks arXiv·

Shifting-based Optimizable Linear Relaxations for General Activation Functions

The use of neural networks (NNs) is rapidly increasing, including in safety- and security-critical domains. To provide formal guarantees about NN behavior, many verification methods rely on optimizable linear relaxations of activation functions. However, existing techniques depend on hand-crafted relaxations for each activation function. Extension to state-of-the-art activation functions therefore requires substantial manual effort. In contrast,... Read more ›

🎮GPU Programming storagereview.com·

Dell PowerEdge XE8812 Brings NVIDIA Vera Rubin NVL4 to HPC, Up to 144 GPUs Per Rack

Dell Technologies has introduced the PowerEdge XE8812, a new liquid-cooled server platform designed for large-scale inference and high-performance computing workloads. The system joins the Dell AI Factory with NVIDIA portfolio. It is built around the NVIDIA Vera Rubin NVL4 architecture, offering up to 144 GPUs per rack in a dense rack-scale configuration. The announcement... Read more ›

🤖Machine Learning medium.com

Deep Learning Inference: PyTorch, ONNX, and TensorRT Explained

If you are learning Machine Learning, you have probably lived this exact scenario: You spend hours cleaning a dataset, you build a PyTorch… Read more ›

🛠️ML Frameworks medium.com

The Pragmatic Shift: How to Actually Become an AI Engineer in 2026

If you told someone you were an “AI Engineer” a few years ago, they probably assumed you were elbow-deep in PyTorch, wrangling massive… Read more ›

🔄MLOps medium.com

RocoMart: Building an End-to-End MLOps Pipeline Orchestration for E-Commerce

Architect a robust MLOps pipeline from scratch using Python, Prefect, MLflow, and Flask to power real-time e-commerce tech. Read more ›

⚡ML Inference Modular Blog·

Modular: Inference from Kernel to Cloud

The unified AI inference stack - from custom GPU kernels to production cloud serving on NVIDIA and AMD. 2x performance. Top open models. Open source stack. Read more ›

Covered by 5 sources including GitHub, Tech Funding News

⚙️Model Training arXiv·

Towards Robust Training in NNGPT AutoML Pipeline: A Loss-Optimizer Pairing Selection Study

The choice of loss function and optimizer is an important decision that shapes further model training. Yet automated architecture search pipelines (AutoML) benefit significantly more from the optimal pairing selection and vice versa. This paper investigates whether a single recipe is sufficient for heterogeneous architecture pools, or whether the optimal pairing varies across structurally diverse models. We conduct a systematic empirical study... Read more ›