gautam6599123's Feed

Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch

GPT-2-style LLM built from scratch in C/CUDA with hand-written backprop, BPE tokenizer, FlashAttention, pretraining, and SFT. - JustVugg/nanoeuler Read more ›

Discussed on Hacker News

🗣️Large Language Models tokoscope.com·

Automatic LLM token compression and cost monitoring in 2 lines

Audit, compress and monitor your LLM token usage. Most teams cut API costs 40-60% with two lines of code. Read more ›

Discussed on Hacker News

🔢Numerical Methods arxiv.org·

fOGA: An Orthogonal Greedy Algorithm for Fractional Laplacian Problems

In this paper, we propose a numerical method for fractional Laplace equations that combines finite difference discretization with shallow neural network approximation. The fractional Laplace operator is discretized using a directional representation of Riemann-Liouville type, which leads to a finite difference approximation of the nonlocal operator. In two dimensions, the angular integral is approximated by a quadrature rule, and auxiliary ... Read more ›

🤖Transformers astledsa.substack.com·

Tree Transformers

A step towards generalizing the transformer architecture Read more ›

Discussed on Substack

🔢Mathematics Hacker News·

Ask HN: Will we start seeing tools for LLM use?

Many projects and add-ons exist to reduce the verbosity of standard bash, git, npm, and similar commands that agents regularly pass as tools to LLMs (e.g. rtk, headroom, lean-ctx). Tool output compression does yield good token savings, though it can lead to more turns, effectively nullifying the per-turn token savings. This is the current topology. Are we going to see a class of products and libraries that structure output the way models want to see it? Read more ›

Discussed on Hacker News

🔢Combinatorics arxiv.org·

The Genesis Sequence, Tree Records and Endofunctions

We present bijections connecting tree records, the girth of a connected endofunction, and the genesis sequence (the first sequence in OEIS). Using these, we derive generating functions for tree and forest record numbers in terms of Cayley's tree function and give a new proof of Cayley's forest formula. Read more ›

📡Information Theory lesswrong.com·

Two Classical Answers to "What do Two Variables Share?"

First post in a planned cluster on exact results for natural latents. Here, I connect some established results in classical information theory to natural latents. Suppose Alice observes ... Read more ›

📊Data Science GitHub·

Show HN: Dbgsom: A scikit-learn compatible Self-organizing Map

A scikit-learn compatible Python implementation of the Directed Batch Growing Self-Organizing Map - SandroMartens/DBGSOM Read more ›

Discussed on Hacker News

🧠Deep Learning scalingintelligence.stanford.edu·

Toward Better HIP Kernel Generation for AMD GPUs

TLDR: In this work, we explore how to make language models better at generating high-performance HIP kernels for AMD GPUs. We present the following: (1) A synthetic dataset of 500 new PyTorch reference tasks using mutation, composition, and constraint-based generation to cover a broader range of workloads. (2) A multi-agent optimization pipeline for HIP kernel generation. Instead of relying on single-shot prompting, we used specialized agents for tas... Read more ›

Covers KernelBench: Can LLMs Write Efficient GPU Kernels?

Discussed on Hacker News

⏱️Time Series Analysis KDnuggets·

Building Time-Series Machine Learning Models with Sktime in Python

In this article, we’ll build time-series machine learning models in Python using sktime and explore its core data structures for forecasting workflows. Read more ›

Discussed on Hacker News

🔢Discrete Mathematics arxiv.org·

k-Convex Polyominoes by Semi-perimeter

We give the conjectured solution for the generating function of k-convex polyominoes, enumerated by semi-perimeter. The solution was obtained from the analysis of enumeration data that we generated. Read more ›

🧠Neural Networks Nature·

Deep learning reveals antimicrobial peptides within prions

Prion and prion-like proteins are classically associated with protein misfolding, but amyloidogenic sequences can also participate in host defence. Here, using deep learning, we screened 19.3 million fragments from 2,897 curated prion-related proteins and identified 1,179 candidate antimicrobial peptides, which we term prionins. Among 75 synthesized prionins, 59 inhibited bacterial pathogens, 53 perturbed membranes and 2 reduced Acinetobacter baumannii infection burden in mice. Deep learning ... Read more ›

Covered by Phys.org

Discussed on Hacker News

🔷Abstract Algebra arxiv.org·

The algebra of Krom logic programs

This paper investigates the algebraic structure of Krom logic programs, consisting only of facts and rules with at most one body atom. We show that sequential composition endows the class of Krom programs with a natural monoid structure and that this structure admits rich algebraic extensions to Krom seminearrings, Krom quemirings, Krom-Conway seminearrings, and Krom-Conway omegaseminearrings. Furthermore, we establish explicit generating sets a... Read more ›

🐍Python GitHub·

JEP: Embed Python in Java, the Polished Way

Embed Python in Java. Contribute to ninia/jep development by creating an account on GitHub. Read more ›

Discussed on Hacker News

🌐Distributed Systems theconsensus.dev·

Pierre Zemb from Clever Cloud

Pierre Zemb is a staff engineer at Clever Cloud where he's building data layers API-compatible with services like Redis, PostgreSQL, and etcd on top of FoundationDB. Read more ›

Discussed on Hacker News

🤖AI mstar.stanford.edu·

M* (M-Star): A Modular, Extensible, Serving System for Multimodal Models

Composite models broke the single-loop assumption behind LLM serving. The Walk Graph fixes it. Read more ›

Discussed on Hacker News

💻Computer Science en.algorithmica.org·

Complexity Models

If you ever opened a computer science textbook, it probably introduced computational complexity somewhere in the very beginning. Simply put, it is the total count of elementary operations (additions, multiplications, reads, writes…) that are executed during a computation, optionally weighted by their costs. Read more ›

Discussed on Hacker News

🗣️Large Language Models ourtoken.ai·

You're probably paying GPT prices for tasks that don't need GPT

Use one unified API to access OpenAI, Claude, GLM, MiniMax and other LLMs. Compare models, prices, and capabilities to find the best fit for your prompts. Read more ›

Discussed on Hacker News

💬Natural Language Processing The headless browser·

Introducing Lightpanda Agent and PandaScript: LLM at buildtime not runtime

Lightpanda now has a built-in agent. Talk to it in natural language, get a reliable script back. Run that script forever only calling an LLM when it needs fixing. Read more ›

Discussed on Hacker News

📈Statistical Learning arxiv.org·

A General Framework for Decision Trees via Bregman Divergences

Decision trees are one of the fundamental tools in statistical learning due to their interpretability, flexibility, and their ability to adapt to nonlinear structures. Among them, the Classification and Regression Trees, introduced by Breiman, Friedman, Olshen, and Stone in 1984, became one of the most influential algorithms and remains one of the most widely used methods for classification and regression problems. On the other hand, Bregman d... Read more ›