Now
May 2026.
Working
- Post-training harnesses for SFT, DPO, verifier scoring, and best-of-N selection.
- Telemetry retention for ML systems: logging budgets, summary interfaces, and failure analysis.
- ALife experiments: replicators, program soups, mutation search, and population traces.
- Tabular models: foundation-model baselines, uncertainty, runtime, and training dynamics.
Testing
- Evaluation signals under distribution shift.
- Benchmark design that separates model behavior from dataset fit.
- Which project repos need hosted demos, benchmark results, or just a clean README.
Open To
- Research engineering roles across data, models, evaluation, and production feedback.
- Applied research work with a concrete dataset, experiment, or system to ship.