Lab Notes

DPO

DPO for Tool-use Preference

A framework for converting tool-use failure cases into preference data and into interfaces for post-training experiments.

  • DPO
  • Tool-use
  • Preference Data
Replay Eval

Agent Replay Evaluation Harness

A reproducible regression evaluation harness for agents, built from production traces, for testing prompt, tool, model, and workflow changes.

  • Evaluation
  • Replay
  • Regression