GoLongRL: Capability-Oriented Long Context RL with Multitask Alignment
GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment - xiaoxuanNLP/GoLongRL