1 link tagged with all of: language-models + reinforcement-learning + next-token-prediction
Links
Reinforcement Pre-Training (RPT) is introduced as a new approach that brings reinforcement learning to large language model pre-training by reframing next-token prediction as a reasoning task rewarded for correctness. Because the reward comes directly from the text corpus, RPT can leverage vast amounts of unannotated data; it improves language-modeling accuracy, provides a strong foundation for subsequent reinforcement fine-tuning, and shows consistent gains in prediction accuracy as training compute increases.
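The core mechanism described above — rewarding the model for a correct next-token prediction drawn from the corpus itself — can be sketched as a simple verifiable reward. This is a minimal illustration, not code from the paper; the function name and binary-reward simplification are assumptions:

```python
def rpt_reward(predicted_token: str, ground_truth_token: str) -> float:
    """Hypothetical RPT-style reward: the model reasons, emits a
    next-token prediction, and is scored against the corpus ground
    truth. Here simplified to a binary exact-match reward."""
    return 1.0 if predicted_token == ground_truth_token else 0.0


# For a text "the cat sat", predicting "sat" after "the cat" is rewarded:
print(rpt_reward("sat", "sat"))   # correct prediction -> 1.0
print(rpt_reward("ran", "sat"))   # incorrect prediction -> 0.0
```

Because every position in ordinary text supplies a ground-truth token, this turns unannotated data into an unbounded source of reinforcement-learning signal.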