Your Guide to LLM Evaluation tools
pub.towardsai.net · 10h
LLM Infrastructure
AI Model Growth Outpaces Hardware Improvements
spectrum.ieee.org · 14h
ANN Benchmarks
The Backbone Breaker Benchmark: Testing the Real Security of AI Agents
lakera.ai · 11h · Discuss: Hacker News
AI Security
The Epistemic Suite: A Post-Foundational Diagnostic Methodology for Assessing AI Knowledge Claims
arxiv.org · 23h
AI Security
Thought Engineering
pranavc28.github.io · 10m · Discuss: Hacker News
LLM Infrastructure
Advances In Formal Verification Technology
semiengineering.com · 20h
SMT Solvers
AI #140: Trying To Hold The Line
lesswrong.com · 9h
AI
Raising the Bar on ML Model Deployment Safety
uber.com · 13h
LLM Infrastructure
Context engineering
chrisloy.dev · 17h · Discuss: Hacker News
Prompt Engineering
How Well Does RL Scale?
tobyord.com · 9h · Discuss: Hacker News
LLM Infrastructure
Any advice on what I should be doing?
reddit.com · 16h · Discuss: r/LocalLLaMA
LLM Infrastructure
What Is an AI PaaS? A Guide to the Future of AI Development
thenewstack.io · 7h
LLM Infrastructure
Enabling Publishers to Express Preferences for AI Crawlers: An Update on the AIPREF Working Group
ietf.org · 17h
Web Standards
From Generative to Agentic AI
databricks.com · 11h
New AI
Testing Cross-Lingual Text Comprehension In LLMs Using Next Sentence Prediction
arxiv.org · 23h
BGE Embeddings
Opportunistically Parallel Lambda Calculus
dl.acm.org · 5h · Discuss: Hacker News
Programming languages
New trend: programming by kicking off parallel AI agents
blog.pragmaticengineer.com · 9h
New AI
Mistake-filled legal briefs show the limits of relying on AI tools at work
techxplore.com · 9h
AI Safety