Most model debates start with accuracy, AUC, or F1. However, real systems do not run on metrics. They run on money, risk, and capacity.
In fraud alerts, credit approvals, and claims processing, your model is usually not making the final decision. It produces a score. You then choose a threshold (or a top-K policy) to decide what to review, block, approve, or pay.
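To make those two policies concrete, here is a minimal sketch. The scores, cutoff, and K below are made-up illustrative values, not output from any real model:

```python
import numpy as np

# Hypothetical model scores for six cases (higher = more suspicious).
scores = np.array([0.91, 0.15, 0.62, 0.08, 0.77, 0.34])

# Policy 1: fixed threshold -- review everything at or above the cutoff.
threshold = 0.5
review_threshold = scores >= threshold

# Policy 2: top-K -- review only the K highest-scoring cases,
# which caps review workload no matter how scores are distributed.
K = 2
order = np.argsort(scores)[::-1]              # indices sorted by score, descending
review_top_k = np.zeros(len(scores), dtype=bool)
review_top_k[order[:K]] = True
```

The threshold policy lets review volume float with the score distribution; the top-K policy pins it to your team's capacity.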
A model can look “better” on AUC and still lose money if the threshold policy ignores expected value, costs, and team capacity. This article shows a practical way to pick thresholds using profit, not accuracy.
The Core Idea: Expected Value Beats Accuracy
For each case, there are four possible outcomes, each with money attached:
- True Positive (TP): you correctly catch fraud → you save money, but you pay review cost
- False Positive (FP): you flag a clean case → you pay review cost and hurt customer experience
- False Negative (FN): you miss fraud → you lose money
- True Negative (TN): nothing happens
As a result, the “best threshold” is the one that maximizes:

Expected Profit = Benefit(TP) − Cost(FP) − Cost(Reviews) − Loss(FN)
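Putting that together, here is a minimal sketch of profit-based threshold selection. The cost values, labels, and scores are placeholder assumptions rather than numbers from a real system; the point is the sweep itself: compute expected profit at each candidate threshold on a labeled validation set, then keep the maximizer.

```python
import numpy as np

def expected_profit(y_true, scores, threshold,
                    benefit_tp=500.0,   # assumed avg loss prevented per caught fraud
                    cost_review=20.0,   # assumed cost per manual review (TP and FP)
                    cost_fp_cx=15.0,    # assumed customer-experience cost per FP
                    cost_fn=500.0):     # assumed avg loss per missed fraud
    """Profit of a fixed-threshold policy under an assumed cost matrix."""
    flagged = scores >= threshold
    tp = np.sum(flagged & (y_true == 1))
    fp = np.sum(flagged & (y_true == 0))
    fn = np.sum(~flagged & (y_true == 1))
    # True negatives contribute nothing, matching the outcome list above.
    return (tp * benefit_tp
            - (tp + fp) * cost_review
            - fp * cost_fp_cx
            - fn * cost_fn)

# Synthetic labeled validation data, purely for illustration.
rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.05, size=10_000)                        # ~5% fraud rate
scores = np.clip(0.6 * y_true + rng.normal(0.3, 0.2, 10_000), 0, 1)

# Sweep candidate thresholds and keep the most profitable one.
thresholds = np.linspace(0.0, 1.0, 101)
profits = [expected_profit(y_true, scores, t) for t in thresholds]
best = thresholds[int(np.argmax(profits))]
print(f"Most profitable threshold: {best:.2f} (profit ~ {max(profits):,.0f})")
```

Note that the profit-maximizing threshold depends entirely on the cost matrix you plug in: change the review cost or the loss per missed fraud, and the best cutoff moves, even though the model and its AUC stay exactly the same.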