ScrAIbe began as an internal research tool for turning lab interviews, field recordings, and technical briefings into searchable, citable text. Standard speech-to-text services struggled with noisy environments, German/English code-switching, and overlapping speakers, so I built a pipeline tailored to research-grade audio.
ScrAIbe is a modular, multilingual pipeline for transcription and speaker analysis:
- Whisper-based ASR for high-accuracy transcription and optional translation of segments.
- Speaker diarisation and recognition via Pyannote, using VoxCeleb-trained speaker embeddings for robust speaker separation.
- Automatic language identification using VoxLingua to handle mixed-language recordings, such as German/English code-switching.
- Multiple entry points: a Python API for full control, a CLI for batch jobs, and an optional lightweight Gradio app for quick local runs.
- Server-friendly deployment through Docker when you want consistent lab/on-prem setups.
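How ScrAIbe combines its ASR output with its diarisation output internally isn't spelled out here. What follows is a minimal, hypothetical sketch of one common approach, assuming you already have Whisper-style timestamped transcript segments and Pyannote-style speaker turns: each transcript segment is labelled with the speaker whose turns overlap it for the longest total time. The `Segment` type and the `assign_speakers` / `to_transcript` helpers are illustrative names, not part of ScrAIbe's API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A time span in seconds with a label (transcribed text or a speaker ID).

    Hypothetical container for illustration; not a ScrAIbe type.
    """
    start: float
    end: float
    label: str


def overlap(a: Segment, b: Segment) -> float:
    """Length of the time overlap between two segments, 0.0 if they are disjoint."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_speakers(asr: list[Segment], diar: list[Segment],
                    unknown: str = "UNKNOWN") -> list[tuple[str, float, float, str]]:
    """Label each ASR segment with the speaker whose turns overlap it the most.

    Returns (speaker, start, end, text) tuples. Segments with no overlapping
    speaker turn get the `unknown` label.
    """
    result = []
    for seg in asr:
        # Sum overlap per speaker, since one speaker may have several short turns.
        totals: dict[str, float] = {}
        for turn in diar:
            ov = overlap(seg, turn)
            if ov > 0:
                totals[turn.label] = totals.get(turn.label, 0.0) + ov
        speaker = max(totals, key=totals.get) if totals else unknown
        result.append((speaker, seg.start, seg.end, seg.label))
    return result


def to_transcript(assigned: list[tuple[str, float, float, str]]) -> str:
    """Render labelled segments as one '[start-end] SPEAKER: text' line each."""
    return "\n".join(
        f"[{start:.2f}-{end:.2f}] {speaker}: {text}"
        for speaker, start, end, text in assigned
    )


if __name__ == "__main__":
    # Toy data standing in for real ASR and diarisation output.
    asr = [Segment(0.0, 4.0, "Guten Morgen, let's start."),
           Segment(4.0, 9.0, "Sure, the sensor data is ready.")]
    diar = [Segment(0.0, 4.5, "SPEAKER_00"),
            Segment(4.5, 9.5, "SPEAKER_01")]
    print(to_transcript(assign_speakers(asr, diar)))
```

Summing overlap per speaker, rather than taking the single longest turn, keeps the assignment stable when the diarisation output splits one speaker into several short turns. Overlapping speech, where two speakers talk at once within one transcript segment, still collapses to a single label in this sketch.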
ScrAIbe is open source because research infrastructure shouldn’t be a black box. If you want a fully no-code experience for teams, the companion project ScrAIbe-WebUI wraps this backend into an easy Docker-deployable web service.