DPO for Tool-use Preference
A framework for turning tool-use bad cases into preference data and post-training experiment interfaces.
Lab
Built reproducible Agent regression evaluation from production traces for prompt, tool, model, and workflow changes.