LLM post-training harness

A small, explicit experiment harness for SFT/DPO and inference-time scaling (best-of-N, verifiers).

Status: active · GitHub


What it is

A didactic harness that separates post-training from inference-time scaling. SFT, DPO, and best-of-N selection with a verifier are each kept small and runnable.
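The following minimal, self-contained Python sketch illustrates that split. It is not the harness's own code, and every name in it (`sft_loss`, `dpo_loss`, `best_of_n`, `toy_generate`, `toy_verifier`) is hypothetical. It makes three assumptions. First, per-token and per-sequence log-probabilities have already been computed by some model. Second, it uses the standard DPO objective with `beta = 0.1`, a default chosen only for illustration. Third, a toy digit-guessing generator and a distance-based verifier stand in for a real LLM and a real verifier.

```python
import math
import random
from typing import Callable, Sequence


def sft_loss(target_token_logprobs: Sequence[float]) -> float:
    """SFT objective: mean negative log-likelihood of the target tokens."""
    return -sum(target_token_logprobs) / len(target_token_logprobs)


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Per-pair DPO loss from summed sequence log-probs (assumed precomputed).

    loss = -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(m)) == softplus(-m); branch so exp() never overflows
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def best_of_n(prompt: str,
              generate: Callable[[str, random.Random], str],
              verifier: Callable[[str, str], float],
              n: int, seed: int = 0) -> str:
    """Inference-time scaling: sample n candidates from a frozen generator and
    return the one the verifier scores highest (the first one wins ties)."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda c: verifier(prompt, c))


# Hypothetical toy stand-ins: a "model" that guesses a digit, and a
# verifier that rewards guesses close to the true answer of 2 + 3.
def toy_generate(prompt: str, rng: random.Random) -> str:
    return str(rng.randint(0, 9))


def toy_verifier(prompt: str, answer: str) -> float:
    return -abs(int(answer) - 5)


if __name__ == "__main__":
    print("SFT loss:", sft_loss([-0.1, -0.5, -0.2]))
    print("DPO loss at zero margin:", dpo_loss(-10.0, -12.0, -10.0, -12.0))
    for n in (1, 4, 16):
        pick = best_of_n("2 + 3 = ?", toy_generate, toy_verifier, n)
        print(f"best-of-{n}:", pick, "score", toy_verifier("2 + 3 = ?", pick))
```

Here `sft_loss` and `dpo_loss` would drive weight updates, while `best_of_n` only spends extra samples from an unchanged generator. The seed is fixed, so the first candidate is identical for every `n`. As a result, the verifier score of the pick can only stay level or improve as `n` grows.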
