Sleep-time compute is introduced as a method for improving the efficiency of large language models: during idle time, before any query arrives, the model anticipates likely questions and pre-computes useful inferences about its context, reducing the reasoning needed at test time. The study reports that this approach can lower test-time compute requirements by approximately 5x and improve accuracy by up to 18% on specific reasoning tasks. A multi-query extension is also proposed that amortizes the sleep-time computation across related queries about the same context, further reducing the average compute cost per query.
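A minimal sketch of the idea, assuming a generic `llm()` helper and hypothetical function names (these are illustrative, not the paper's actual interfaces): an offline "sleep-time" pass turns the raw context into pre-computed notes with a generous token budget, and the online pass answers each query against those notes with a much smaller budget.

```python
def llm(prompt: str, max_tokens: int) -> str:
    """Stand-in for a call to any chat/completions model."""
    raise NotImplementedError("wire up your own model client here")


def sleep_time_compute(context: str) -> str:
    """Offline phase: anticipate likely questions and pre-compute useful
    inferences about the context before any query arrives."""
    prompt = (
        "Study the context below. Derive facts, intermediate results, and "
        "likely answers that would help respond to future questions.\n\n"
        f"Context:\n{context}"
    )
    # Large budget is acceptable here: no user is waiting on this call.
    return llm(prompt, max_tokens=2048)


def answer_query(context: str, notes: str, query: str) -> str:
    """Online phase: answer with a reduced test-time budget by reusing
    the pre-computed notes."""
    prompt = (
        f"Context:\n{context}\n\n"
        f"Pre-computed notes:\n{notes}\n\n"
        f"Question: {query}\nAnswer concisely."
    )
    return llm(prompt, max_tokens=256)


def answer_many(context: str, queries: list[str]) -> list[str]:
    """Multi-query amortization: one sleep-time pass serves many related queries."""
    notes = sleep_time_compute(context)  # paid once per context
    return [answer_query(context, notes, q) for q in queries]
```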
The paper examines reward modeling for reinforcement learning with large language models, focusing on how reward quality can be scaled at inference time. It introduces Self-Principled Critique Tuning (SPCT), which trains a generative reward model to produce its own evaluation principles and critiques, and adds a meta reward model that guides the aggregation of sampled judgments during inference. Empirical results show that SPCT markedly improves both the quality and the inference-time scalability of reward models compared with existing methods.
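A minimal sketch of the inference-time scaling loop described above, under stated assumptions: `generate_judgment` and `meta_rm_score` are hypothetical callables standing in for the generative reward model and the meta reward model, and the sample/vote procedure is a simplified illustration rather than the paper's exact algorithm.

```python
from collections import Counter
from typing import Callable, Optional


def scaled_reward(
    query: str,
    responses: list[str],
    generate_judgment: Callable[[str, list[str]], dict[int, int]],
    meta_rm_score: Optional[Callable[[dict[int, int]], float]] = None,
    num_samples: int = 8,
    keep_top: int = 4,
) -> dict[int, int]:
    """Score candidate responses by sampling multiple generative judgments
    and aggregating them by voting, optionally filtered by a meta reward model."""
    # 1. Sample several independent judgments; each maps response index -> score.
    judgments = [generate_judgment(query, responses) for _ in range(num_samples)]

    # 2. Optionally keep only the judgments the meta reward model rates highest.
    if meta_rm_score is not None:
        judgments.sort(key=meta_rm_score, reverse=True)
        judgments = judgments[:keep_top]

    # 3. Vote: sum the scores across the retained judgments.
    totals: Counter = Counter()
    for judgment in judgments:
        totals.update(judgment)
    return dict(totals)
```

Increasing `num_samples` is the scaling knob: more sampled judgments cost more inference compute but, per the paper's findings, yield a more reliable aggregate reward signal.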