Learning from Failures: Error-Driven Reinforcement Learning for Tool Use

Ziyi Wang1 · Yuxuan Lu1 · Dakuo Wang1 · et al.
1 Northeastern University
TL;DR

We propose RAFT (Reinforcement from Agent Failure Tasks), an error-driven RL pipeline that turns a tool-use agent's failed trajectories into targeted executable tasks, achieving 82.5 Pass^1 on Tau2-Bench Retail.

Method

We present RAFT (Reinforcement from Agent Failure Tasks), an error-driven RL pipeline that converts failed tool-use trajectories into targeted executable tasks and trains the agent to correct its own mistakes.

Figure 1: The RAFT pipeline. Failed agent trajectories are analyzed, decomposed into executable subtasks, and fed back as RL training signal.
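
In pseudocode, one RAFT round looks roughly like the sketch below. This is a minimal illustration of the loop in Figure 1, not the paper's actual implementation; the agent/environment interface and all helper names (Trajectory, diagnose_failure, build_subtask, rl_update) are hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    """One agent rollout on a task."""
    task: str
    steps: list   # e.g. (tool_call, observation) pairs
    success: bool

def diagnose_failure(traj: Trajectory) -> int:
    """Locate the step where the rollout went wrong (stub: blame the last step)."""
    return len(traj.steps) - 1

def build_subtask(traj: Trajectory, error_step: int) -> str:
    """Derive a targeted, executable task that isolates the diagnosed error."""
    return f"{traj.task} [retry from step {error_step}]"

def raft_round(agent, env, tasks):
    """One error-driven round: roll out, mine failures, train on derived subtasks."""
    rollouts = [env.rollout(agent, task) for task in tasks]
    failures = [t for t in rollouts if not t.success]
    derived = [build_subtask(t, diagnose_failure(t)) for t in failures]
    agent.rl_update(derived, env)  # failures become fresh RL training signal
    return agent
```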

Results

Error-driven RL consistently improves over both the base model and the SFT baseline across all Pass^k metrics on Tau2-Bench Retail, with the largest gains visible at stricter consistency thresholds (Pass^3, Pass^4).

Pass^1: 82.5 (+6.6 vs SFT)
Pass^2: 73.0 (+8.5 vs SFT)
Pass^3: 66.2 (+8.7 vs SFT)
Pass^4: 61.4 (+8.8 vs SFT)

Pass^k results on Tau2-Bench Retail (Qwen3-30B-A3B-Thinking-2507). RAFT (SFT + RL) consistently outperforms both the base model and the SFT baseline. Higher is better; Pass^k is the probability that all k independent trials succeed.
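
For a single task, Pass^k can be estimated from n independent trials with c successes via the standard combinatorial estimator C(c, k) / C(n, k), averaged over tasks. A minimal sketch, assuming the benchmark scores it this way (its own scoring code may differ):

```python
from math import comb

def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate Pass^k (all k independent trials succeed) from n trials
    of which c succeeded. Note comb(c, k) is 0 when c < k."""
    if k > n:
        raise ValueError("need at least k trials to estimate Pass^k")
    return comb(c, k) / comb(n, k)

# Worked example: a task solved in 3 of 4 trials.
print(pass_hat_k(4, 3, 1))  # 0.75 -> Pass^1
print(pass_hat_k(4, 3, 2))  # 0.5  -> Pass^2: C(3,2)/C(4,2) = 3/6
```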