MaskedFOP: Polyglot Speaker Identification under Missing Visual Modality via Cascaded Graph Label Propagation (opens in new tab)

We present MaskedFOP, a system for closed-set polyglot speaker identification under two simultaneous challenges: the face modality is entirely absent at test time, and speech comes from Urdu, a language unseen during face-supervised training. The system integrates three complementary mechanisms. First, a modality-dropout dual-head network built on the Fusion and Orthogonal Projection (FOP) backbone forces the audio branch to develop independent ...

Read the original article