word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method
paperium.net·2d·

Clarifying Goals and Conceptual Framing

What the method tries to capture

At first glance, the work aims to turn co-occurrence patterns into compact vector representations, but the promised simplicity masks technical subtleties. One detail that stood out to me is that the model frames prediction as maximizing the likelihood of observed contexts rather than reconstructing co-occurrence counts. This is the heart of the skip-gram formulation: it optimizes conditional probabilities such as p(c|w), and it does so while learning two sets of vectors, one for words and one for contexts. I find this framing promising because it ties the embeddings directly to predictive performance, though it also raises practical questions about optimization.
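To make the p(c|w) framing concrete, here is a minimal sketch in plain Python. It assumes the usual softmax parameterization of skip-gram, where the score of a (word, context) pair is the dot product of a word vector and a context vector. The toy vocabulary, the two-dimensional vectors, and the function names `context_probabilities` and `log_likelihood` are all hypothetical, chosen only for illustration.

```python
import math


def context_probabilities(word_vec, context_vecs):
    """Return p(c | w) for every context c as a softmax over dot products."""
    scores = {
        c: sum(a * b for a, b in zip(vec, word_vec))
        for c, vec in context_vecs.items()
    }
    # Subtract the max score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exp_scores = {c: math.exp(s - max_score) for c, s in scores.items()}
    total = sum(exp_scores.values())
    return {c: e / total for c, e in exp_scores.items()}


def log_likelihood(pairs, word_vectors, context_vectors):
    """Sum of log p(c | w) over observed (word, context) pairs."""
    return sum(
        math.log(context_probabilities(word_vectors[w], context_vectors)[c])
        for w, c in pairs
    )


# Hypothetical toy parameters: one table for words, a separate one for contexts.
word_vectors = {"cat": [0.9, 0.1]}
context_vectors = {
    "purrs": [1.0, 0.0],
    "barks": [-1.0, 0.0],
    "the": [0.0, 0.0],
}

probs = context_probabilities(word_vectors["cat"], context_vectors)
ll = log_likelihood([("cat", "purrs")], word_vectors, context_vectors)
```

Training would adjust both tables to raise `ll` on observed pairs. Note that the normalizing sum runs over every context in the vocabulary, which is one source of the practical optimization questions mentioned above.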

Methodological Clarifications

From softmax …
