Guide for adding a custom reward function in slime and wiring it through --custom-rm-path (with optional reward post-processing). Use when the user wants new reward logic, remote/service reward integration, or task-specific reward shaping.
Implement custom reward logic and connect it to slime rollout/training safely.
Use this skill when:
Pick one of these:
With --group-rm disabled: the custom function gets one Sample.
With --group-rm enabled: the custom function gets list[Sample].
slime/rollout/rm_hub/__init__.py calls your function via --custom-rm-path.
Create slime/rollout/rm_hub/<your_rm>.py.
Supported signatures:
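# --group-rm disabled: one Sample in, one reward out
async def custom_rm(args, sample):
    return float_reward_or_reward_dict

# --group-rm enabled: list[Sample] in, list of rewards out
async def custom_rm(args, samples):
    return list_of_rewards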
If using group mode, return one reward per sample in input order.
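The two signatures above can be sketched as an exact-match reward. This is a minimal illustration, not slime's own code: the Sample dataclass below is a hypothetical stand-in that assumes only response and label fields, and the name custom_group_rm is chosen here just to keep both variants in one file.

```python
import asyncio
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class Sample:
    """Hypothetical stand-in for slime's Sample; only the fields used here."""
    response: str
    label: str


def _exact_match(sample: Sample) -> float:
    # 1.0 when the stripped response equals the stripped label, else 0.0.
    return 1.0 if sample.response.strip() == sample.label.strip() else 0.0


async def custom_rm(args, sample):
    # Single-sample mode: one Sample in, one float reward out.
    return _exact_match(sample)


async def custom_group_rm(args, samples):
    # Group mode: list of Samples in, one reward per sample, same order.
    return [_exact_match(s) for s in samples]


if __name__ == "__main__":
    args = SimpleNamespace()  # placeholder for the parsed training args
    batch = [Sample("42", "42"), Sample(" 41", "42")]
    print(asyncio.run(custom_rm(args, batch[0])))
    print(asyncio.run(custom_group_rm(args, batch)))
```

Building the group variant from the same per-sample helper keeps the output length and order tied to the input list, which is the property the group-mode contract depends on.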
If the function returns a reward dict, make sure reward_key / eval_reward_key is configured.
To customize normalization/shaping before advantage computation, add:
def post_process_rewards(args, samples):
    # return (raw_rewards, processed_rewards)
    ...
Wire with:
--custom-reward-post-process-path <module>.post_process_rewards
This hook is consumed in slime/ray/rollout.py.
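As one possible shape for such a hook, the sketch below mean-centers rewards within each prompt group. It rests on assumptions not stated in this guide: that each sample already carries a float reward attribute, that samples for the same prompt are contiguous, and that the group size is available as args.n_samples_per_prompt. The Sample class is a hypothetical stand-in.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class Sample:
    """Hypothetical stand-in; assumes the reward is already a float."""
    reward: float


def post_process_rewards(args, samples):
    # Raw rewards, in input order.
    raw_rewards = [float(s.reward) for s in samples]

    # Assumption: samples of one prompt are contiguous, in groups of
    # args.n_samples_per_prompt.
    group_size = args.n_samples_per_prompt
    processed_rewards = []
    for start in range(0, len(raw_rewards), group_size):
        group = raw_rewards[start:start + group_size]
        mean = sum(group) / len(group)
        # Subtract the group mean so rewards are relative within a prompt.
        processed_rewards.extend(r - mean for r in group)

    return raw_rewards, processed_rewards


if __name__ == "__main__":
    args = SimpleNamespace(n_samples_per_prompt=2)
    samples = [Sample(1.0), Sample(0.0), Sample(1.0), Sample(1.0)]
    print(post_process_rewards(args, samples))
```

Returning the raw rewards alongside the processed ones matches the (raw_rewards, processed_rewards) tuple described above, so logging can still report unshaped values.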
Use:
--custom-rm-path slime.rollout.rm_hub.<your_rm>.custom_rm
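Before launching a run, it can help to confirm that a dotted path of this form actually imports. The helper below is a hypothetical sanity check written for this guide, not slime's own loader; it assumes the path is a module path followed by a single attribute name.

```python
import importlib


def resolve_dotted_path(path: str):
    """Import `pkg.module.attr` and return `attr` (hypothetical helper)."""
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected '<module>.<function>', got {path!r}")
    module = importlib.import_module(module_name)
    # Raises AttributeError if the function name is misspelled.
    return getattr(module, attr_name)


if __name__ == "__main__":
    # A stdlib path is used here so the check runs without slime installed.
    fn = resolve_dotted_path("math.sqrt")
    print(callable(fn))
```

Running the same check against your own path surfaces import errors and typos early, rather than partway through rollout.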
reward_key config
slime/rollout/rm_hub/__init__.py
slime/ray/rollout.py
docs/en/get_started/customization.md