arXiv:2602.00412v1 Announce Type: cross Abstract: The goal of automated machine learning (AutoML) is to reduce trial and error when doing machine learning (ML). Although AutoML methods for classification are able to deal with data imperfections, such as outliers, multiple scales and missing data, their behavior is less known on dirty categorical datasets. These datasets often have several categorical features with high cardinality arising from issues such as lack of curation and automated collection. Recent research has shown that ML models can benefit from morphological encoders for dirty categorical data, leading to significantly superior predictive performance. However the effects of using such encoders in AutoML methods are not known at the moment. In this paper, we propose a pipeline…

Similar Posts

Loading similar posts...

Keyboard Shortcuts

Navigation
Next / previous item
j/k
Open post
oorEnter
Preview post
v
Post Actions
Love post
a
Like post
l
Dislike post
d
Undo reaction
u
Recommendations
Add interest / feed
Enter
Not interested
x
Go to
Home
gh
Interests
gi
Feeds
gf
Likes
gl
History
gy
Changelog
gc
Settings
gs
Browse
gb
Search
/
General
Show this help
?
Submit feedback
!
Close modal / unfocus
Esc

Press ? anytime to show this help