7 links tagged with all of: reinforcement-learning + large-language-models
Related tags: machine-learning (2), inference-scaling (1), distributed-systems (1), weaver (1), torchforge (1), exploration (1), problem-solving (1), uniqueness (1), research-survey (1), agentic-llm (1), generative-models (1), scaling-methodologies (1), reward-modeling (1), agentic-rl (1), libraries (1)
Links
The article discusses how the torchforge library simplifies large-scale reinforcement learning for large language models (LLMs). It highlights the collaboration with Stanford and CoreWeave, showcasing the use of Weaver as a verifier to enhance training efficiency and accuracy without relying on extensive human annotations.
This article introduces a new approach to reinforcement learning called Uniqueness-Aware Reinforcement Learning, aimed at improving how large language models (LLMs) solve complex reasoning tasks. By rewarding rare and effective solution strategies rather than common ones, the method enhances diversity and performance in problem-solving without sacrificing accuracy. The authors demonstrate its effectiveness across multiple benchmarks in mathematics, physics, and medical reasoning.
The repository serves as a comprehensive resource for the survey paper "The Landscape of Agentic Reinforcement Learning for LLMs: A Survey," detailing various reinforcement learning methods and their applications to large language models (LLMs). It includes tables summarizing methodologies, objectives, and key mechanisms, alongside links to relevant papers and resources in the field of AI.
The paper explores the enhancement of reward modeling in reinforcement learning for large language models, focusing on inference-time scalability. It introduces Self-Principled Critique Tuning (SPCT) to improve generative reward modeling and proposes a meta reward model to optimize performance during inference. Empirical results demonstrate that SPCT significantly enhances the quality and scalability of reward models compared to existing methods.
Tags: reinforcement-learning, reward-modeling, large-language-models, inference-scaling, generative-models
Reinforcement learning (RL) is becoming essential in developing large language models (LLMs), particularly for aligning them with human preferences and enhancing their capabilities through multi-turn interactions. This article reviews various open-source RL libraries, analyzing their designs and trade-offs to assist researchers in selecting the appropriate tools for specific applications. Key libraries discussed include TRL, Verl, OpenRLHF, and several others, each catering to different RL needs and architectures.
JudgeLRM introduces a novel approach to using Large Language Models (LLMs) as evaluators, particularly in complex reasoning tasks. By employing reinforcement learning with judge-wise rewards, JudgeLRM models significantly outperform traditional Supervised Fine-Tuning methods and current leading models, demonstrating superior performance in tasks that require deep reasoning.
Reinforcement learning (RL) is essential for training large language models (LLMs), but principled scaling methodologies for RL remain underdeveloped. This study presents a framework for analyzing RL scaling, demonstrating through extensive experimentation that certain design choices can optimize compute efficiency while maintaining performance. The authors propose a best-practice recipe, ScaleRL, whose validation performance they successfully predict in advance at a substantially larger compute budget.
Tags: reinforcement-learning, large-language-models, scaling-methodologies, compute-efficiency, best-practices