Breaking chains with trees: Deep learning with $\mathcal{O}(\log N)$ parallel time complexity
Modern deep neural network architectures are trained via backpropagation, which requires errors to be sequentially propagated through all layers before parameters can be updated. This introduces two limitations: locking, where layer-wise updates are strictly interdependent and cannot proceed in parallel, and the weight transport problem, which requires symmetric forward and backward pathways for exact gradient computation. These constraints re...