Show HN: CodeLens.AI – Community benchmark comparing 6 LLMs on real code tasks
codelens.ai·2h·
Discuss: Hacker News
⚙️Compression Benchmarking
How Much Should You Tell Your AI Agent?
raymondyxu.com·2d·
Discuss: Hacker News
Proof Automation
Atomic and Saturated Models
functor.network·4d·
Discuss: Hacker News
🔢Denotational Semantics
MaNGO - Adaptable Graph Network Simulators via Meta-Learning
arxiv.org·12h
🕸️Tensor Networks
Large Language Models Achieve Gold Medal Performance at International Astronomy & Astrophysics Olympiad
arxiv.org·1d
🚀SIMD Text Processing
LATTA: Langevin-Anchored Test-Time Adaptation for Enhanced Robustness and Stability
arxiv.org·12h
📊Quantization
SigmaEval – statistical evaluation for GenAI apps
github.com·3h·
Discuss: Hacker News
🔍Concolic Testing
Curriculum-Augmented GFlowNets For mRNA Sequence Generation
arxiv.org·1d
🔨Compilers
The fragility of "cultural tendencies" in LLMs
arxiv.org·12h
🧮Theoretical Computer Science
A tiny recursive reasoning model achieves 45% on ARC-AGI-1 and 8% on ARC-AGI-2
alexiajm.github.io·20h
🧠Intelligence Compression
Python 3.14 Released with Template String Literals, Deferred Annotations, and
socket.dev·18h·
Discuss: Hacker News
💧Liquid Types
On The Fragility of Benchmark Contamination Detection in Reasoning Models
arxiv.org·2d
🧪Hardware Fuzzing
Distribution Preference Optimization: A Fine-grained Perspective for LLM Unlearning
arxiv.org·1d
💻Local LLMs
How to Train an LLM to Do Proofs: Beyond Verifiable Rewards
tobysimonds.com·3d·
Discuss: Hacker News
🎯Interactive Provers
DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding
arxiv.org·2d
🎙️Whisper
Cactus Language • Semantics 1
inquiryintoinquiry.com·2d
🔢Denotational Semantics
You Don't Know RAG. You Know Simple RAG.
dev.to·1d·
Discuss: DEV
🌀Brotli Internals
Improving Metacognition and Uncertainty Communication in Language Models
arxiv.org·12h
🧠Intelligence Compression
Precision Irrigation Optimization via Reinforcement Learning and Real-Time Soil Moisture Mapping
dev.to·2h·
Discuss: DEV
Precision Brewing
Automating construction safety inspections using a multi-modal vision-language RAG framework
arxiv.org·1d
🤖Advanced OCR