Guest post: Balancing your dataset? Mind the privacy leaks!
desfontain.es

This blog post is written by Georgi Ganev; I helped with editing and am delighted to host it here as a guest post. If you’d also like to contribute a post about your research to this blog, don’t hesitate to get in touch!


Imagine that you’re building a machine learning classifier detecting a rare disease or financial fraud. Most of your data comes from healthy patients or legitimate transactions, while positive cases (i.e., the presence of disease or fraud) are few and far between. When you evaluate your model, you notice something worrying: performance is much worse on these rare cases. The model struggles precisely where mistakes are most costly. This is a common problem in real-world datasets, where some classes (or combinations of featu…