emDia - Speaker diariser
About the tool
What is it good for? What does it do?
When analysing a recording with multiple speakers, the program emDia can tell "who is speaking when". This is called speaker diarisation. It distinguishes speech from non-speech and recognises when speakers take turns.
What is the input?
A sound file (e.g. in .wav or .mp3 format).
What is the output?
A text file conforming to the standard used in this field, the RTTM (Rich Transcription Time Marked) format, which lists, row by row, which speaker is talking in a given section of the recording. The algorithm only detects speaker changes, not speaker identities.
An example output file fragment (a speaker change around second 47 of the recording, with a new speaker taking a turn):
SPEAKER SpeechNonSpeech 1 46.670 0.300 <NA> <NA> SPK01 <NA>
SPKR-INFO SpeechNonSpeech 1 <NA> <NA> <NA> unknown SPK16 <NA>
SPEAKER SpeechNonSpeech 1 46.970 2.220 <NA> <NA> SPK16 <NA>
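Each SPEAKER row carries the channel, the onset time and the duration in seconds, and a speaker label. As a minimal sketch (assuming the field order shown in the example above; the function name is our own, not part of emDia), such rows can be read back like this:

```python
def parse_rttm(lines):
    """Collect (speaker, onset, duration) tuples from the SPEAKER rows of an RTTM file."""
    segments = []
    for line in lines:
        fields = line.split()
        # Skip empty lines and non-SPEAKER rows (e.g. SPKR-INFO)
        if not fields or fields[0] != "SPEAKER":
            continue
        # Field layout: type, recording id, channel, onset, duration, ..., speaker label
        onset, duration = float(fields[3]), float(fields[4])
        speaker = fields[7]
        segments.append((speaker, onset, duration))
    return segments


example = [
    "SPEAKER SpeechNonSpeech 1 46.670 0.300 <NA> <NA> SPK01 <NA>",
    "SPKR-INFO SpeechNonSpeech 1 <NA> <NA> <NA> unknown SPK16 <NA>",
    "SPEAKER SpeechNonSpeech 1 46.970 2.220 <NA> <NA> SPK16 <NA>",
]
print(parse_rttm(example))
# → [('SPK01', 46.67, 0.3), ('SPK16', 46.97, 2.22)]
```

In the fragment above, the first segment (SPK01) ends at 46.670 + 0.300 = 46.970 seconds, exactly where the SPK16 segment begins: this boundary is the detected speaker change.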
For developers:
Source | https://github.com/juditacs/hunspeech/blob/master/speaker_diarization/em-dia.py |
Source language | Python |
Input | .wav, .mp3, or any other audio format supported by the SoX (Sound eXchange) tool. |
Output | Two RTTM-compatible files produced via the SHOUT tool: one with speech/silence/noise information, and one with the audio segments assigned to the respective speakers. |
Execution |
python em-dia.py [-h] [-m SHOUT_MODEL] [-s SAD_FN] input_fn output_dir shout_dir
The meaning of each argument is listed by the em-dia.py --help command. |
Licence | GPL |
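To drive the tool from another Python script, one can assemble the command line shown in the Execution row and pass it to subprocess. This is only a sketch under the documented interface; the file names below are illustrative placeholders, not files shipped with the tool:

```python
def build_emdia_cmd(input_fn, output_dir, shout_dir, shout_model=None, sad_fn=None):
    """Assemble the em-dia.py command line from the documented arguments."""
    cmd = ["python", "em-dia.py"]
    if shout_model is not None:
        cmd += ["-m", shout_model]  # optional SHOUT model
    if sad_fn is not None:
        cmd += ["-s", sad_fn]       # optional speech activity detection file
    cmd += [input_fn, output_dir, shout_dir]
    return cmd


# e.g. subprocess.run(build_emdia_cmd("interview.wav", "out", "/opt/shout"), check=True)
print(build_emdia_cmd("interview.wav", "out", "/opt/shout"))
# → ['python', 'em-dia.py', 'interview.wav', 'out', '/opt/shout']
```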