Projects
Research tools, production-shaped ML work, and app experiments.
Test-time training lab
A compact PyTorch lab for fast-weight and test-time-training ideas, with toy equivalence demos and a small language-model training harness.
Consensus SFT
A summarization modeling project with data preparation, fine-tuning artifacts, and loss-curve diagnostics.
Second brain knowledge pipeline
A local-first knowledge system for turning papers and notes into reviewed article nodes, concept graphs, and publishable vault output.
Random Neighbors
Random-forest-style feature bagging for high-dimensional clustering experiments.
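The blurb does not say how the bagged clusterings are combined. One common choice, assumed here, is an evidence-accumulation ensemble. It runs a small clusterer on each random feature subset, counts how often each pair of points lands in the same cluster, and then takes connected components of pairs that co-cluster in most bags. Everything below is a hypothetical illustration, not the project's code: the 2-means base clusterer, the function names, the 0.5 threshold, and the toy blobs.

```python
import random


def sq_dist(a, b, features):
    """Squared Euclidean distance restricted to the chosen feature indices."""
    return sum((a[f] - b[f]) ** 2 for f in features)


def two_means(points, features, iters=10):
    """Tiny 2-means on a feature subset (hypothetical base clusterer).

    Deterministic init: the first point and the point farthest from it.
    """
    first = points[0]
    far = max(points, key=lambda p: sq_dist(p, first, features))
    centers = [[first[f] for f in features], [far[f] for f in features]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to the nearer center in the bagged subspace.
        labels = [
            min((0, 1), key=lambda k: sum((p[f] - c) ** 2 for f, c in zip(features, centers[k])))
            for p in points
        ]
        # Recompute centers; keep the old center if a cluster empties out.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(p[f] for p in members) / len(members) for f in features]
    return labels


def feature_bagged_coassociation(points, n_bags=20, bag_size=3, seed=0):
    """Fraction of bags in which each pair of points shares a cluster label."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    co = [[0.0] * n for _ in range(n)]
    for _ in range(n_bags):
        features = rng.sample(range(d), bag_size)  # random feature subset
        labels = two_means(points, features)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1 / n_bags
    return co


def consensus_clusters(co, threshold=0.5):
    """Connected components of the graph linking pairs with co-association > threshold."""
    n = len(co)
    cluster = [-1] * n
    next_id = 0
    for start in range(n):
        if cluster[start] != -1:
            continue
        cluster[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if cluster[j] == -1 and co[i][j] > threshold:
                    cluster[j] = next_id
                    stack.append(j)
        next_id += 1
    return cluster


# Hypothetical toy data: two well-separated Gaussian blobs in 10 dimensions.
data_rng = random.Random(42)
blob_a = [[data_rng.gauss(0, 1) for _ in range(10)] for _ in range(15)]
blob_b = [[data_rng.gauss(10, 1) for _ in range(10)] for _ in range(15)]
points = blob_a + blob_b

co = feature_bagged_coassociation(points, n_bags=20, bag_size=3, seed=0)
clusters = consensus_clusters(co)
print(clusters)
```

The co-association matrix decouples the base clusterer from the final partition, so label permutations across bags do not matter.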
Reliability modeling templates
Production-style baseline templates for classification, survival, anomaly, and time-to-event modeling.
Field-to-test reliability modeling
Reliability modeling patterns for translating messy field behavior into test plans and failure-risk estimates.
Media ingest pipeline
A local-first ingest pipeline for transcripts, audio features, retrieval artifacts, and repeatable experiment runs.
Industrial anomaly detection
A defect-detection lab for reconstruction, segmentation, and evaluation on industrial visual anomaly data.
BirdCLEF audio modeling
Audio classification experiments around spectrograms, augmentation, validation discipline, and competition constraints.
Tabular foundation model lab
A practical lab notebook for TabPFN, TabICL, uncertainty, conformal prediction, and tabular baselines.
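Split conformal prediction, one of the techniques the lab covers, fits in a few lines. This is a minimal illustration under stated assumptions, not code from the lab. The toy model `2 * x`, the Gaussian noise, and the helper names `split_conformal_halfwidth`, `interval`, `sample`, and `model` are hypothetical. The sketch takes absolute residuals on a held-out calibration set and reads off the residual at rank ⌈(n + 1)(1 − α)⌉.

```python
import math
import random


def split_conformal_halfwidth(abs_residuals, alpha=0.1):
    """Half-width from absolute calibration residuals for a 1 - alpha interval.

    Uses the finite-sample rank ceil((n + 1) * (1 - alpha)); returns inf when
    there are too few calibration points for the requested alpha.
    """
    n = len(abs_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(abs_residuals)[rank - 1]


def interval(prediction, halfwidth):
    """Symmetric prediction interval around a point prediction."""
    return prediction - halfwidth, prediction + halfwidth


# Hypothetical toy setup: the "fitted model" predicts 2x and the data have
# Gaussian noise with standard deviation 1.
rng = random.Random(0)


def sample(n):
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 1)) for x in xs]


def model(x):
    return 2 * x


calibration = sample(500)
q = split_conformal_halfwidth([abs(y - model(x)) for x, y in calibration], alpha=0.1)

test = sample(5000)
covered = 0
for x, y in test:
    lo, hi = interval(model(x), q)
    covered += lo <= y <= hi
coverage = covered / len(test)
print(f"half-width={q:.3f}  empirical coverage={coverage:.3f}")
```

With α = 0.1, empirical coverage on fresh data should land near 90%. The guarantee is marginal over draws of calibration and test data, not conditional on x.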
Agentic research template
A reusable repo shape for research agents: plans, artifacts, eval notes, and reproducible handoff state.
Local embeddings
A local embedding workspace for indexing, comparing, and inspecting text representations without external services.
Audio embeddings
Audio representation experiments across waveforms, spectra, model embeddings, and retrieval-ready artifacts.
Mixture-of-experts deep dive
A small lab for MoE routing, load balancing, expert specialization, and failure modes.
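The following sketch illustrates the load-balancing side. It computes a Switch-Transformer-style auxiliary loss for top-1 routing, N · Σᵢ fᵢ · Pᵢ. Whether the lab uses this exact loss is an assumption, and the function names and toy logits are hypothetical. In training, this loss is usually multiplied by a small coefficient before being added to the task loss.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits):
    """N * sum_i f_i * P_i for top-1 routing.

    f_i: fraction of tokens whose argmax expert is i.
    P_i: mean router probability assigned to expert i.
    Equals 1.0 when tokens are spread evenly and grows as routing collapses.
    """
    n_tokens = len(router_logits)
    n_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]
    counts = [0] * n_experts
    for p in probs:
        counts[max(range(n_experts), key=lambda e: p[e])] += 1
    f = [c / n_tokens for c in counts]
    P = [sum(p[e] for p in probs) / n_tokens for e in range(n_experts)]
    return n_experts * sum(fi * Pi for fi, Pi in zip(f, P))


# Hypothetical router logits: 4 tokens, 4 experts.
balanced = [[5.0 if e == t else 0.0 for e in range(4)] for t in range(4)]
collapsed = [[5.0, 0.0, 0.0, 0.0] for _ in range(4)]
print(load_balancing_loss(balanced), load_balancing_loss(collapsed))
```

Because fᵢ comes from a hard argmax, gradients flow only through Pᵢ. This pushes router probability away from overloaded experts.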
ResNet identity mini
A compact experiment around residual identity mappings, trainability, and small-model diagnostics.
RL context compaction
Experiments around using reinforcement learning to choose what context to retain, compress, or drop.
RL compaction
A smaller reinforcement-learning lab for compaction policies, rewards, and evaluation traces.
RL gym from Sutton
Small reinforcement-learning environments and experiments grounded in Sutton-style examples.
Streaming train demo
A demo of online training traces, incremental metrics, and model behavior as data arrives.
YouTube embedding pipeline
A local-first pipeline for audio download, Whisper transcription, text embeddings, and audio embeddings.
Complexity-aware program evolution
Program evolution experiments that track fitness, complexity, and the tradeoff between improvement and bloat.
Erdős concentration
An interactive app for concentration phenomena, tails, and the geometry of probability bounds.
Random matrix visualizer
An interactive visualization app for eigenvalue clouds, spectra, and random-matrix intuition.
Paxos explore
An interactive systems app for stepping through consensus messages, timing, and agreement behavior.
Tierra web
A browser playground for Tierra-style program evolution, mutation, replication, and population dynamics.
Blindwatchmaker
An interactive evolutionary-art playground for selection, mutation, and visual search.
LLM post-training harness
A small, explicit experiment harness for SFT/DPO and inference-time scaling (best-of-N, verifiers).
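Best-of-N with a verifier amounts to sampling several candidates and keeping the one the verifier scores highest. The following is a minimal sketch that assumes a hypothetical toy task, with a random integer generator standing in for model sampling. The names `best_of_n`, `sample_candidates`, `noisy_generator`, and `verifier` are illustrative and are not the harness's API.

```python
import random


def best_of_n(candidates, score):
    """Return the highest-scoring candidate (the first one wins ties)."""
    return max(candidates, key=score)


def sample_candidates(generate, n, rng):
    """Draw n independent candidates from a generator."""
    return [generate(rng) for _ in range(n)]


# Hypothetical toy task: find a non-negative integer x with x * x == 144.
def noisy_generator(rng):
    return rng.randint(0, 20)  # stands in for sampling one model completion


def verifier(x):
    return -abs(x * x - 144)  # 0 means verified correct; more negative is worse


rng = random.Random(0)
for n in (1, 4, 16, 64):
    answer = best_of_n(sample_candidates(noisy_generator, n, rng), verifier)
    print(n, answer, verifier(answer))
```

Raising N is the inference-time scaling knob here. The quality of the result is bounded by how well the verifier ranks candidates.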
Vienna EP UI
A UI experiment around energy, inference, and controllable visual exploration.
Computational Life (ALife program soup)
A reproduction and extension of program-soup ALife experiments: BFF tapes, replicators, and phase-transition-like dynamics.
Event recommender system
A recommender-system project around event affinity, user behavior signals, and retrieval-ready recommendations.
Opioid prescribing residuals
An applied modeling project for finding high-residual prescribing patterns after accounting for expected variation.
Bayesian marketing attribution
Credible intervals and Bayesian regression for deciding which marketing channels are signal versus noise.
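One simple way to make the signal-versus-noise call is a conjugate Bayesian linear regression with a known noise scale. Each channel's coefficient gets a posterior, and a channel is flagged as signal when its 95% credible interval excludes zero. This is a hedged sketch, not the project's model. It assumes a known noise standard deviation, independent N(0, prior_sd²) priors, no intercept, and synthetic spend data; all function names are hypothetical.

```python
import math
import random


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting; fine for small dense matrices."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def bayes_linreg(X, y, noise_sd, prior_sd):
    """Posterior N(mean, cov) for beta in y = X beta + eps, eps ~ N(0, noise_sd^2),
    with independent N(0, prior_sd^2) priors on each coefficient."""
    k = len(X[0])
    precision = [
        [sum(row[i] * row[j] for row in X) / noise_sd**2 + (1 / prior_sd**2 if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]
    cov = invert(precision)
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) / noise_sd**2 for i in range(k)]
    mean = [sum(cov[i][j] * xty[j] for j in range(k)) for i in range(k)]
    return mean, cov


def credible_interval(mean, cov, i, z=1.96):
    """Central ~95% interval for coefficient i under the normal posterior."""
    sd = math.sqrt(cov[i][i])
    return mean[i] - z * sd, mean[i] + z * sd


# Hypothetical synthetic data: spend on three channels with known true effects.
rng = random.Random(1)
true_beta = [0.8, 0.0, 0.3]
X = [[rng.uniform(0, 10) for _ in true_beta] for _ in range(200)]
y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 2) for row in X]

mean, cov = bayes_linreg(X, y, noise_sd=2.0, prior_sd=10.0)
for ch in range(len(mean)):
    lo, hi = credible_interval(mean, cov, ch)
    verdict = "signal" if lo > 0 or hi < 0 else "noise"
    print(f"channel {ch}: mean={mean[ch]:.3f}  95% CI=({lo:.3f}, {hi:.3f})  -> {verdict}")
```

The closed form used here is Σ = (XᵀX/σ² + I/τ²)⁻¹ and μ = Σ Xᵀy/σ². A real attribution model would typically also need an intercept and an unknown noise scale.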
Lead scoring ML system
An end-to-end lead scoring project across data cleaning, feature engineering, model validation, and deployment shape.