mHC [Paper Quick-Reading]
dev.to

Before discussing the mHC paper, we need to understand the residual connection paradigm and Hyper-Connections (HC).

ResNets And Hyper-Connections

The structure of a single residual layer can be formulated as follows:

$$x_{l+1} = x_l + f(x_l, w_l)$$

Here $f$ is the layer's residual function with parameters $w_l$. The term *identity mapping* refers to the $x_l$ term itself: it emphasizes that the signal from shallower layers passes directly to deeper layers without any modification.
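To make the identity mapping concrete, here is a minimal Python sketch of the single-layer formula. The residual function `toy_f` (an elementwise `tanh(w * x)`) and the helper names are hypothetical stand-ins, not anything from the paper; any `f` whose output has the same width as its input would work. Unrolling the recursion gives $x_L = x_0 + \sum_{i=0}^{L-1} f(x_i, w_i)$, so the shallow signal $x_0$ reaches the deepest layer unchanged, and `run_stack` returns both sides of that identity.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
ResidualFn = Callable[[Vector, Vector], Vector]


def residual_layer(x: Vector, f: ResidualFn, w: Vector) -> Vector:
    """One residual layer: x_{l+1} = x_l + f(x_l, w_l)."""
    fx = f(x, w)
    # The identity term x_l is added back unchanged, element by element.
    return [xi + fi for xi, fi in zip(x, fx)]


def toy_f(x: Vector, w: Vector) -> Vector:
    """Hypothetical residual function: elementwise tanh(w * x)."""
    return [math.tanh(wi * xi) for wi, xi in zip(w, x)]


def run_stack(
    x0: Vector, weights: Sequence[Vector], f: ResidualFn = toy_f
) -> Tuple[Vector, Vector]:
    """Apply one residual layer per weight vector.

    Returns the final state x_L and the sum of all residual-branch outputs,
    so that x_L == x_0 + branch_sum (up to floating-point rounding).
    """
    x = list(x0)
    branch_sum = [0.0] * len(x0)
    for w in weights:
        fx = f(x, w)  # residual branch f(x_l, w_l)
        branch_sum = [s + v for s, v in zip(branch_sum, fx)]
        x = [xi + fi for xi, fi in zip(x, fx)]  # same update as residual_layer
    return x, branch_sum
```

Setting every weight to zero makes `toy_f` return zeros, so each layer passes $x_l$ through untouched, which is exactly the identity-mapping property.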

Hyper-Connections introduce a new dimension to the residual connection. The improvements are as follows:

  1. Decoupled Information Capacity. HC decouples the width of the residual stream from the layer's input dimension. This lets the model carry more information between layers than a standard residual connection, whose stream is only as wide as the layer input.
  2. Expanding the width of res…
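A minimal sketch of the decoupling idea follows. It assumes the static form commonly used to describe HC, $H_{l+1} = \mathcal{H}^{res}_l H_l + (\mathcal{H}^{post}_l)^\top f(\mathcal{H}^{pre}_l H_l)$, where the hidden state $H_l$ is widened into `n` parallel streams of width `d`. The names `h_pre` (read-in weights, length `n`), `h_post` (write-out weights, length `n`) and `h_res` (an `n × n` stream-mixing matrix) are hypothetical; the weights here are fixed inputs rather than learned, the layer's own parameters are folded into `f`, and any input-dependent (dynamic) variant is omitted.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]  # rows = residual streams, columns = hidden size d


def hyper_connection_layer(
    h: Matrix,                       # n streams, each of width d
    f: Callable[[Vector], Vector],   # layer function on width-d vectors
    h_pre: Vector,                   # length n: how streams are read into the layer
    h_post: Vector,                  # length n: how the output is written to each stream
    h_res: Matrix,                   # n x n: stream-to-stream mixing on the residual path
) -> Matrix:
    n, d = len(h), len(h[0])
    # Read-in: the layer still sees an ordinary width-d input (weighted sum of streams).
    layer_in = [sum(h_pre[i] * h[i][j] for i in range(n)) for j in range(d)]
    layer_out = f(layer_in)
    # Residual path: mix the streams with h_res, then add the layer output
    # scaled by each stream's write-out weight.
    return [
        [
            sum(h_res[i][k] * h[k][j] for k in range(n)) + h_post[i] * layer_out[j]
            for j in range(d)
        ]
        for i in range(n)
    ]
```

With `n = 1`, `h_pre = h_post = [1.0]` and `h_res = [[1.0]]`, the update collapses to $x_{l+1} = x_l + f(x_l)$, the ordinary residual connection. With `n > 1`, the state carried between layers is `n × d` wide while `f` still operates on width `d`, which is the decoupling described in point 1.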
