TawPipe: Topology-Aware Weight Pipeline Parallelism for Accelerating Long-Context Large Models Training
arxiv.org·16h

Abstract: Training large language models (LLMs) is fundamentally constrained by limited device memory and costly inter-device communication. Although pipeline parallelism alleviates memory pressure by partitioning models across devices, it incurs activation communication overhead that scales linearly with sequence length, limiting efficiency in long-context training. Recent weight-passing approaches (e.g., WeiPipe) mitigate this by transmitting model weights instead of activations, but suffer from redundant peer-to-peer (P2P) transfers and underutilized intra-node bandwidth. We propose TawPipe (topology-aware weight pipeline parallelism), which exploits hierarchical bandwid…
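The trade-off described in the abstract, activation traffic growing linearly with sequence length while weight traffic does not depend on it, can be illustrated with a rough back-of-envelope sketch. This is not the cost model of WeiPipe or TawPipe. The function names are hypothetical, and the sketch assumes a GPT-style layer with roughly 12 * hidden^2 parameters (attention plus a 4x MLP, ignoring biases and layer norms) and 2-byte (fp16/bf16) elements.

```python
def activation_bytes(micro_batch: int, seq_len: int, hidden: int,
                     bytes_per_elem: int = 2) -> int:
    """Bytes of activations crossing one pipeline-stage boundary per micro-batch.

    Assumes a single [micro_batch, seq_len, hidden] tensor is sent.
    """
    return micro_batch * seq_len * hidden * bytes_per_elem


def layer_weight_bytes(hidden: int, bytes_per_elem: int = 2) -> int:
    """Approximate bytes of one transformer layer's weights.

    Assumption: about 12 * hidden^2 parameters per layer
    (4 * hidden^2 for attention, 8 * hidden^2 for a 4x MLP).
    """
    return 12 * hidden * hidden * bytes_per_elem


def crossover_seq_len(micro_batch: int, hidden: int) -> int:
    """Sequence length at which activation bytes equal one layer's weight bytes.

    Solving micro_batch * seq_len * hidden == 12 * hidden^2 for seq_len.
    """
    return 12 * hidden // micro_batch


if __name__ == "__main__":
    hidden, micro_batch = 4096, 1
    for seq_len in (4096, 16384, 65536, 262144):
        act = activation_bytes(micro_batch, seq_len, hidden)
        wts = layer_weight_bytes(hidden)
        print(f"seq_len={seq_len:>7}  activations={act / 2**20:8.0f} MiB  "
              f"layer weights={wts / 2**20:8.0f} MiB")
    print("crossover seq_len:", crossover_seq_len(micro_batch, hidden))
```

Under these assumptions, the per-layer weight volume stays fixed while the activation volume keeps growing with context length, which is the motivation the abstract gives for passing weights rather than activations in long-context training.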
