7 links tagged with all of: reinforcement-learning + large-language-models
Related tags: machine-learning (2), inference-scaling (1), distributed-systems (1), weaver (1), torchforge (1), exploration (1), problem-solving (1), uniqueness (1), research-survey (1), agentic-llm (1), generative-models (1), scaling-methodologies (1), reward-modeling (1), agentic-rl (1), libraries (1)
Links
The article discusses how the torchforge library simplifies large-scale reinforcement learning for large language models (LLMs). It highlights the collaboration with Stanford and CoreWeave, showcasing the use of Weaver as a verifier to enhance training efficiency and accuracy without relying on extensive human annotations.
This article introduces a new approach to reinforcement learning called Uniqueness-Aware Reinforcement Learning, aimed at improving how large language models (LLMs) solve complex reasoning tasks. By rewarding rare and effective solution strategies rather than common ones, the method enhances diversity and performance in problem-solving without sacrificing accuracy. The authors demonstrate its effectiveness across multiple benchmarks in mathematics, physics, and medical reasoning.
The repository serves as a comprehensive resource for the survey paper "The Landscape of Agentic Reinforcement Learning for LLMs: A Survey," detailing various reinforcement learning methods and their applications to large language models (LLMs). It includes tables summarizing methodologies, objectives, and key mechanisms, alongside links to relevant papers and resources in the field of AI.
The paper explores the enhancement of reward modeling in reinforcement learning for large language models, focusing on inference-time scalability. It introduces Self-Principled Critique Tuning (SPCT) to improve generative reward modeling and proposes a meta reward model to optimize performance during inference. Empirical results demonstrate that SPCT significantly enhances the quality and scalability of reward models compared to existing methods.
Tags: reinforcement-learning, reward-modeling, large-language-models, inference-scaling, generative-models
Reinforcement learning (RL) is becoming essential in developing large language models (LLMs), particularly for aligning them with human preferences and enhancing their capabilities through multi-turn interactions. This article reviews various open-source RL libraries, analyzing their designs and trade-offs to assist researchers in selecting the appropriate tools for specific applications. Key libraries discussed include TRL, Verl, OpenRLHF, and several others, each catering to different RL needs and architectures.
JudgeLRM introduces a novel approach to using Large Language Models (LLMs) as evaluators, particularly in complex reasoning tasks. By employing reinforcement learning with judge-wise rewards, JudgeLRM models significantly outperform traditional Supervised Fine-Tuning methods and current leading models, demonstrating superior performance in tasks that require deep reasoning.
Reinforcement learning (RL) is essential for training large language models (LLMs), but principled scaling methodologies for RL remain underdeveloped. This study presents a framework for analyzing RL scaling, demonstrating through extensive experimentation that certain design choices can optimize compute efficiency while maintaining performance. The authors propose a best-practice recipe, ScaleRL, whose validation performance they successfully predict in advance at a substantially larger compute budget.
Tags: reinforcement-learning, large-language-models, scaling-methodologies, compute-efficiency, best-practices