Algebraic Effects, Delimited Continuations, Computational Effects, Control Abstraction
Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training
arxiv.org·2d
Learning Instruction-Following Policies through Open-Ended Instruction Relabeling with Large Language Models
arxiv.org·54m
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
arxiv.org·1d
Why Your Next LLM Might Not Have A Tokenizer
towardsdatascience.com·1d