ScrAIbe: Research-Grade Transcription

ScrAIbe started as an internal tool for converting laboratory interviews and sensor briefings into searchable text. Off-the-shelf speech-to-text services struggled with German-English code switching and overlapping speakers, so I built a pipeline that combines state-of-the-art ASR with probabilistic diarisation and confidence scoring.

Key pieces include:

  • A modular inference stack (Whisper + Pyannote) orchestrated through containerised workers so labs can run the service on-premises.
  • A calibration layer that flags low-confidence passages and surfaces timestamps for quick review.
  • Automated QC reports that let researchers jump straight to the segments that need manual corrections.

The project is open source because reproducible infrastructure should not be a black box. You can read the documentation, run the containers locally, or extend the diarisation modules for your own corpora.
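
To make the calibration and QC ideas above concrete, here is a minimal sketch of how low-confidence passages could be flagged and listed with timestamps for review. It is not taken from the ScrAIbe code base: the `Segment` class, its field names, the `0.6` threshold, and the helper functions are all hypothetical, and it assumes each segment already carries a confidence score between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span. All field names here are hypothetical."""
    start: float       # seconds from the beginning of the recording
    end: float         # seconds from the beginning of the recording
    speaker: str       # diarisation label, e.g. "SPEAKER_00"
    text: str
    confidence: float  # assumed to be a score in [0, 1]


def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def flag_low_confidence(segments, threshold=0.6):
    """Return the segments whose confidence falls below the threshold."""
    return [s for s in segments if s.confidence < threshold]


def review_lines(segments, threshold=0.6):
    """Build one plain-text line per flagged segment, ordered by start time."""
    flagged = sorted(flag_low_confidence(segments, threshold),
                     key=lambda s: s.start)
    return [
        f"{format_timestamp(s.start)}-{format_timestamp(s.end)} "
        f"[{s.speaker}] conf={s.confidence:.2f}: {s.text}"
        for s in flagged
    ]
```

Calling `review_lines(segments)` on a list of `Segment` objects yields one line per passage that falls under the threshold, so a reviewer can jump to those timestamps first. The threshold is only a placeholder; in practice it would need tuning against manually corrected transcripts.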