YouTube embedding pipeline
A local-first pipeline for audio download, Whisper transcription, text embeddings, and audio embeddings.
What it is
A local YouTube-to-embedding pipeline with two paths. Talks are transcribed with Whisper and the transcripts are embedded as text. Music is cut into sampled audio chunks, and each chunk is embedded as an AST-style audio fingerprint.
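As an illustration of the music path, here is a minimal sketch of how sampled chunk windows might be chosen for a track. The `Chunk` type, the `sample_chunks` function, and the default chunk length and count are assumptions made for this example, not the pipeline's actual names or settings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A time window inside a track, in seconds (hypothetical type)."""
    start_s: float
    end_s: float


def sample_chunks(duration_s: float, chunk_s: float = 10.0, max_chunks: int = 6) -> list[Chunk]:
    """Pick up to max_chunks evenly spaced, non-overlapping windows.

    The defaults are illustrative assumptions, not the pipeline's values.
    """
    if duration_s <= 0 or chunk_s <= 0 or max_chunks <= 0:
        return []
    if duration_s <= chunk_s:
        # Track is shorter than one chunk: use the whole track.
        return [Chunk(0.0, duration_s)]
    # Never request more windows than fit without overlapping.
    n = min(max_chunks, int(duration_s // chunk_s))
    if n == 1:
        return [Chunk(0.0, chunk_s)]
    # Spread windows so the first starts at 0 and the last ends at duration_s.
    step = (duration_s - chunk_s) / (n - 1)
    return [Chunk(i * step, i * step + chunk_s) for i in range(n)]
```

Sampling a fixed number of windows keeps the cost per track bounded regardless of track length. Each window would then be cut from the downloaded audio and passed to the audio embedding model.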
Outputs are written to Parquet together with run metadata, giving later search and retrieval work a durable store to build on.
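A rough sketch of what attaching run metadata to each output row could look like before writing Parquet. The column names (`run_id`, `created_at`, `model_name`, `video_id`, `chunk_index`, `embedding`) and all function names are hypothetical. The write step assumes `pyarrow` is installed and is imported lazily so the rest of the example runs without it.

```python
from __future__ import annotations

import uuid
from datetime import datetime, timezone


def new_run_metadata(model_name: str) -> dict[str, str]:
    """Create metadata shared by every row of one run (hypothetical fields)."""
    return {
        "run_id": uuid.uuid4().hex,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "model_name": model_name,
    }


def build_rows(video_id: str, embeddings: list[list[float]], run: dict[str, str]) -> list[dict]:
    """Produce one row per embedding, each stamped with the run metadata."""
    return [
        {"video_id": video_id, "chunk_index": i, "embedding": list(vec), **run}
        for i, vec in enumerate(embeddings)
    ]


def write_parquet(rows: list[dict], path: str) -> None:
    """Write rows to a Parquet file; assumes pyarrow is available."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.Table.from_pylist(rows), path)
```

Stamping every row with the same `run_id` means a later search index could filter or rebuild by run without needing a separate lookup table.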