📊 Model Evaluation - tarokuriyama · Scour

Testing 80 LLMs on spatial reasoning on grids

mihai.page·1d·

Discuss: Hacker News

Show HN: Multi-attribute decision frameworks for tech purchases

news.ycombinator.com·2h·

Discuss: Hacker News

AI is a High Pass Filter for Software Delivery

bryanfinster.substack.com·8h·

Discuss: Substack

Statistical-Based Metric Threshold Setting Method for Software Fault Prediction in Firmware Projects: An Industrial Experience

arxiv.org·1d

🔧Functional Programming

Guide: Getting started with choosing a Machine Learning CLIP Model for Smart Search · immich-app/immich

github.com·7h

Testing software in the era of coding agents

garymm.org·16h·

Discuss: Hacker News

AI & ML Mobile App Development Services | Smarter Apps for Every Industry

cizotech.com·1h·

Discuss: DEV

SAE Feature Matchmaking (Layer-to-Layer) by Mitali M

greaterwrong.com·3h

Securing GenAI: Vol 5 — Model deployment and change management

pub.towardsai.net·3h

Impressions on the Book “Tidy First? A Personal Exercise in Empirical Software Design” by Kent Beck

dev.to·6h·

Discuss: DEV

🔧Functional Programming

Vibe Coding for Scientists

vibe-coding-101-iota.vercel.app·3h

Custom AI Tool Development in Regulated Industries: Why Off-The-Shelf LLM Solutions Fall Short

analyticsvidhya.com·19h

Groups of diverse problem solvers can outperform groups of high-ability problem solvers

pnas.org·6h

🔧Functional Programming

keygen.sh·1d

Data Modeling for the Agentic Era: Semantics, Speed, and Stewardship

rilldata.com·15h·

Discuss: Hacker News

Study: Platforms that rank the latest LLMs can be unreliable

news.mit.edu·1d

Manufacturing QMS Software

samrian.com·15h·

Discuss: Hacker News

Performance Tip of the Week #94: Decision making in a data-imperfect world

abseil.io·2d

Measuring Model Overconfidence: When AI Thinks It Knows

dev.to·2d·

Discuss: DEV

AI Coding Is a Framework—Use It Like a Library

piglei.com·1h·

Discuss: Hacker News, r/programming
