The TartanAviation ATC Collection: Audio, ADS-B, and 531k Labels
Jun 5, 2026
Three Hugging Face datasets from CMU's TartanAviation air traffic control corpus: paired audio and ADS-B clips, a half-million-utterance VAD split, and a full ASR label set. Why decode-time biasing and LLM correction lost to a plain ROVER vote, and the one signal that moved label quality.