⚖️ A/B Testing
Experimentation, Statistical Significance, Model Comparison
Stress-Testing Alignment Audits With Prompt-Level Strategic Deception
arxiv.org
·
16h
📈
Model Evaluation
System-Level Error Propagation and Tail-Risk Amplification in Reference-Based Robotic Navigation
arxiv.org
·
16h
👁️
Computer Vision
Why AI Agents Make Different Decisions When They Think It's Real
dev.to
·
2d
·
Discuss:
DEV
📈
Model Evaluation
When Policies Collide
dev.to
·
1d
·
Discuss:
DEV
⛓️
LangChain
Testing can be fun, actually
giacomocavalieri.me
·
4d
·
Discuss:
Lobsters
,
Hacker News
🚀
MLOps
p-values are good actually
lesswrong.com
·
5d
📈
Model Evaluation
OpenAI and Ginkgo Bioworks build an autonomous lab where GPT-5 calls the shots
the-decoder.com
·
4d
📝
Natural Language Processing
Pydantic Performance: 4 Tips on How to Validate Large Amounts of Data Efficiently
towardsdatascience.com
·
4d
📈
Model Evaluation
Microsoft unveils method to detect sleeper agent backdoors
artificialintelligence-news.com
·
5d
🧠
Machine Learning
Performance Tip of the Week #88: Measurement methodology: Avoid the jelly beans trap
abseil.io
·
2d
📈
Model Evaluation
A Horrible Conclusion
addisoncrump.info
·
3d
·
Discuss:
Lobsters
,
Hacker News
🚀
MLOps
*Early‑Relapse Prediction and Adaptive Intervention Scheduling for Major Depressive Disorder Using Continuous EEG and Reinforcement‑Learning‑Based Digital Therapeutics*
freederia.com
·
4d
🧠
Machine Learning
Sidestepping Evaluation Awareness and Anticipating Misalignment with Production Evaluations
alignment.openai.com
·
5d
·
Discuss:
Hacker News
📈
Model Evaluation
LLM Inference Benchmarking - Measure What Matters
digitalocean.com
·
4d
📈
Model Evaluation
Narrative-Driven Development: BDD + TDD + Living Documentation in One Workflow
test2doc.com
·
4d
·
Discuss:
Hacker News
📈
Model Evaluation
StatLLM: A Dataset for Evaluating the Performance of Large Language Models in Statistical Analysis
nature.com
·
4d
⚙️
Model Fine-tuning
Quash: A mobile QA agent that runs tests without scripts
producthunt.com
·
4d
📈
Model Evaluation
A Large-Scale Peripheral Blood Cell Dataset for Automated Hematological Analysis
nature.com
·
4d
🧠
Machine Learning
Build Better Strategies, Part 6: Evaluation [Financial Hacker]
financial-hacker.com
·
4d
📈
Model Evaluation
Performance Tip of the Week #75: How to microbenchmark
abseil.io
·
2d
📈
Model Evaluation