AI / Agent / LLM Expert

Building intelligent systems and delightful experiences with AI.

Featured Work

Deep case studies covering architecture, collaboration, evaluation, and lessons.

2026

Agent Post-training Data Flywheel

Turned production Agent failure cases into a preference-data loop that can be evaluated, labeled, and used for training.

  • Preference Data
  • DPO
  • Evaluation

2025

Enterprise Agent Platform

Built an evaluable, auditable, scalable Agent runtime and tool-use layer for enterprise workflows.

  • Agent Runtime
  • Tool-use
  • LLMOps
  • Evaluation

Lab Notes

Lab notes on evaluation, data loops, and applied algorithms.

DPO

DPO for Tool-use Preference

Used chosen/rejected preference data to improve enterprise Agent tool-use decisions.

  • DPO
  • Tool-use
  • Preference Data

Replay Eval

Agent Replay Evaluation Harness

Built a reproducible Agent regression evaluation harness from production traces, covering prompt, tool, and model changes.

  • Evaluation
  • Replay
  • Regression

Proof

DossierKit favors proof through work, experiments, metrics, and lessons rather than keyword density.

Can decompose Agent platforms into runtime, tool-use, evaluation, and permission audit modules.

Experienced in turning production failure cases into preference data and post-training experiments.

A fit for roles requiring architecture, implementation, and business collaboration.

Writing

Method notes, retrospectives, and structured thinking.

2026-06-26

Enterprise Agent Evaluation Playbook

Turning Agent evaluation from subjective trial and error into replayable, layered, release-gating engineering.

Contact

Open to high-quality AI platform, Agent product, and applied algorithm opportunities

Best for conversations around Agent platforms, LLM application architecture, post-training data loops, and AI product delivery.