LLM post-training harness
A small, explicit experiment harness for SFT/DPO and inference-time scaling (best-of-N, verifiers).
What it is
A didactic harness that keeps post-training (SFT, DPO) separate from inference-time scaling (best-of-N selection with a verifier). Each piece is kept small and runnable.
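To make the inference-time side concrete, here is a minimal sketch of best-of-N selection with a verifier. It is not taken from this harness: `best_of_n`, `make_toy_generator`, and `toy_verifier` are hypothetical names, and the generator and verifier are toy stand-ins for a sampled model and a learned or rule-based scorer. The point it illustrates is that best-of-N leaves the model weights untouched, whereas SFT and DPO update them.

```python
import itertools
from typing import Callable, Iterable, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 4,
) -> Tuple[str, float]:
    """Sample n candidates and return the (candidate, score) pair the verifier ranks highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    scored = [(c, score(prompt, c)) for c in candidates]
    # max() keeps the first candidate on ties, so earlier samples win equal scores.
    return max(scored, key=lambda pair: pair[1])


def make_toy_generator(answers: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a sampled model: cycles through fixed answers."""
    cycle = itertools.cycle(answers)

    def generate(prompt: str) -> str:
        return next(cycle)

    return generate


def toy_verifier(prompt: str, candidate: str) -> float:
    """Hypothetical rule-based verifier for prompts of the form 'a+b'."""
    left, right = prompt.split("+")
    expected = str(int(left) + int(right))
    return 1.0 if candidate.strip() == expected else 0.0


if __name__ == "__main__":
    generate = make_toy_generator(["4", "6", "5", "7"])
    print(best_of_n("2+3", generate, toy_verifier, n=4))  # ('5', 1.0)
```

With `n=2` the same generator only produces "4" and "6", so the verifier has no correct candidate to pick. That is the basic trade-off best-of-N exposes: more samples cost more compute but give the verifier more chances to find a good answer.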