Authors: Ruizhong Qiu*, Gaotang Li*, Tianxin Wei, Jingrui He, Hanghang Tong
University of Illinois Urbana-Champaign
More details coming soon... Star our GitHub repo to stay tuned!
SAFFRON-1 introduces the first inference-time scaling paradigm tailored to LLM safety assurance, addressing the shortcomings of existing methods such as Best-of-N, Beam Search, and MCTS under adversarial settings. Our method replaces expensive process reward models (PRMs) with a novel multifurcation reward model (MRM) that reduces the number of reward evaluations while improving robustness and efficiency.
While inference-time scaling has greatly advanced reasoning tasks, it fails to scale efficiently in safety settings. Existing techniques suffer from what we identify as the exploration-efficiency dilemma: broader exploration requires more frequent PRM calls, driving up computational cost and limiting scaling efficiency.
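To make the dilemma concrete, here is a toy accounting of reward-model calls per search step. The function names are hypothetical illustrations, not the SAFFRON-1 API: the point is only that a PRM scores one candidate continuation per call, whereas a multifurcation reward model scores all candidates branching from a shared prefix in a single forward pass.

```python
# Toy comparison of reward-model call counts per tree-search step
# (illustrative sketch only; not the released implementation).

def prm_calls_per_step(num_candidates: int) -> int:
    # A process reward model scores one candidate continuation per call.
    return num_candidates

def mrm_calls_per_step(num_candidates: int) -> int:
    # A multifurcation reward model scores all candidate continuations
    # branching from a shared prefix in a single forward pass.
    return 1

# Example: beam width 8 over 32 generation steps.
steps, width = 32, 8
print(prm_calls_per_step(width) * steps)  # 256 reward evaluations
print(mrm_calls_per_step(width) * steps)  # 32 reward evaluations
```

Under this accounting, widening the search multiplies PRM cost but leaves the per-step MRM cost constant, which is why exploration no longer trades off against efficiency.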
Figure 1: Comparison between PRM-based and MRM-based tree search procedures.
Figure 2: Comparison with existing inference scaling methods.
We propose Safe Multifurcation (SAFFRON), an inference scaling paradigm with the following key innovations:
Figure 3: Trie-based cache sharing across sequences with common prefixes.
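The idea behind Figure 3 can be sketched with a small trie: sequences that share a token prefix reuse whatever cache state (e.g., KV entries) is stored along the shared path, so common prefixes are computed once. This is a minimal hypothetical illustration of the data structure, not the released implementation.

```python
# Minimal sketch of trie-based cache sharing across sequences with
# common prefixes (illustrative only; class and method names are
# assumptions, not the SAFFRON-1 codebase).

class TrieNode:
    def __init__(self):
        self.children = {}   # token -> TrieNode
        self.cache = None    # e.g., a cached KV state for this prefix

class PrefixCache:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, tokens, cache_value):
        # Walk/create the path for this token sequence and attach its cache.
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, TrieNode())
        node.cache = cache_value

    def longest_cached_prefix(self, tokens):
        # Return (prefix_length, cache) for the longest cached prefix,
        # so generation can resume from that point instead of recomputing.
        node, best = self.root, (0, None)
        for i, t in enumerate(tokens):
            if t not in node.children:
                break
            node = node.children[t]
            if node.cache is not None:
                best = (i + 1, node.cache)
        return best

cache = PrefixCache()
cache.insert([1, 2, 3], "kv(1,2,3)")
print(cache.longest_cached_prefix([1, 2, 3, 4]))  # (3, 'kv(1,2,3)')
```

A new sequence `[1, 2, 3, 4]` recovers the cached state of its three-token prefix and only pays for the unshared suffix.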
To facilitate future research on LLM safety, we release our trained MRM SAFFRON-1 and the training dataset Safety4M: