Quit Emailing Yourself


1 link tagged with all of: large-language-models + sleep-time-compute + inference-scaling + query-predictability

Sleep-time Compute: Beyond Inference Scaling at Test-time

Sleep-time compute lets a large language model work on a context offline, before any query arrives: the model anticipates likely user queries and pre-computes useful results, which reduces the compute needed at test time. The study reports that this approach lowers test-time compute needs by approximately 5x and improves accuracy by up to 18% on specific reasoning tasks. A Multi-Query extension is also proposed, which spreads the pre-computation cost across related queries to reduce compute costs further.
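The cost argument in the summary can be illustrated with a toy calculation. This is a minimal sketch, not the paper's cost model: the function name `avg_cost_per_query` and all token counts are hypothetical, chosen only to show how one offline pass shared by several related queries lowers the average cost per query.

```python
def avg_cost_per_query(sleep_tokens, test_tokens, num_queries):
    """Average token cost per query when a single sleep-time pass
    over the context is shared by `num_queries` related queries.

    Hypothetical cost model: every token costs the same, whether it
    is spent offline (sleep time) or while the user waits (test time).
    """
    if num_queries < 1:
        raise ValueError("num_queries must be at least 1")
    # The offline pass is paid once; each query still pays its own test-time cost.
    return sleep_tokens / num_queries + test_tokens


# Illustrative numbers only; none of these come from the study.
baseline = avg_cost_per_query(sleep_tokens=0, test_tokens=5000, num_queries=1)
single = avg_cost_per_query(sleep_tokens=4000, test_tokens=1000, num_queries=1)
multi = avg_cost_per_query(sleep_tokens=4000, test_tokens=1000, num_queries=10)

print(baseline, single, multi)
```

With one query, the pre-computation only moves cost from test time to sleep time. Under these assumed numbers the saving in average cost appears once several queries reuse the same pre-computed context.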

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

Tags: sleep-time-compute, large-language-models, inference-scaling, reasoning-tasks, query-predictability
