Lab

Lab Notes

DPO

A framework for turning tool-use bad cases into preference data and post-training experiment interfaces.
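As a rough illustration of the conversion step, the sketch below maps one failed tool call to a preference record. The `BadCase` fields, the `to_preference_pair` helper, and the example tool name are hypothetical and not taken from the framework itself; the `prompt` / `chosen` / `rejected` keys follow a layout commonly used for DPO datasets.

```python
import json
from dataclasses import dataclass


@dataclass
class BadCase:
    """Hypothetical record of one failed tool-use turn (assumed shape)."""
    prompt: str           # conversation context shown to the model
    bad_call: dict        # the tool call the model actually made
    corrected_call: dict  # the tool call a reviewer marked as correct


def to_preference_pair(case: BadCase) -> dict:
    """Map a bad case to a prompt/chosen/rejected record."""
    if case.bad_call == case.corrected_call:
        # A pair with identical sides carries no preference signal.
        raise ValueError("corrected call must differ from the bad call")
    return {
        "prompt": case.prompt,
        "chosen": json.dumps(case.corrected_call, sort_keys=True),
        "rejected": json.dumps(case.bad_call, sort_keys=True),
    }


case = BadCase(
    prompt="What is the weather in Paris tomorrow?",
    bad_call={"name": "get_weather", "arguments": {"city": "Paris"}},
    corrected_call={
        "name": "get_weather",
        "arguments": {"city": "Paris", "day": "tomorrow"},
    },
)
pair = to_preference_pair(case)
print(pair["chosen"])
```

Serializing both calls with sorted keys keeps the chosen and rejected strings comparable, so a pair differs only where the tool call itself differs.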

Replay Eval

Built reproducible agent regression evaluation from production traces, covering prompt, tool, model, and workflow changes.
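One way to picture a replay-based regression check is sketched below: recorded inputs are fed to a changed agent, and the resulting tool-call sequence is compared with the recording. The trace shape, the `replay` function, and `candidate_agent` are assumptions made for illustration, not the project's actual interface.

```python
from typing import Callable

# Hypothetical trace shape: {"id": str, "input": str, "tool_calls": [str, ...]}
Trace = dict


def replay(traces: list[Trace], agent: Callable[[str], list[str]]) -> list[str]:
    """Return ids of traces whose replayed tool calls differ from the recording."""
    regressions = []
    for trace in traces:
        if agent(trace["input"]) != trace["tool_calls"]:
            regressions.append(trace["id"])
    return regressions


def candidate_agent(user_input: str) -> list[str]:
    """Stand-in for the changed agent; deterministic so replays are repeatable."""
    if "refund" in user_input:
        return ["lookup_order"]  # this version drops the issue_refund step
    return ["search_docs"]


traces = [
    {"id": "t1", "input": "refund order 17",
     "tool_calls": ["lookup_order", "issue_refund"]},
    {"id": "t2", "input": "how do I reset my password",
     "tool_calls": ["search_docs"]},
]
regressions = replay(traces, candidate_agent)
print(regressions)  # ['t1']
```

Exact sequence equality is the strictest possible comparison; a real evaluation would likely need looser matching for arguments or free-text output.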