emSad - Speech activity detector
About the tool
What is it good for? What does it do?
The Speech Activity Detection (SAD) module segments audio files into three kinds of segments: speech, silence and noise. Speech activity detection is the first step preceding any further speech processing.
What is the input?
An audio file in .wav, .mp3 or .raw format. For a .raw file, the appropriate parameters have to be provided (16 kHz, 16-bit little-endian).
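If supplying the .raw parameters is inconvenient, one option is to wrap the raw samples in a WAV header first. The following is a minimal sketch using Python's standard wave module, not part of the tool itself. It assumes signed 16 kHz, 16-bit little-endian PCM as stated above; the single (mono) channel is an assumption, and pcm_to_wav_bytes is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes,
                     sample_rate: int = 16000,
                     sample_width: int = 2,
                     channels: int = 1) -> bytes:
    """Wrap headerless PCM samples in a WAV container.

    Defaults follow the parameters quoted above (16 kHz, 16 bit).
    channels=1 is an assumption: the description does not state it.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample: 2 = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)             # WAV stores 16-bit samples little-endian
    return buf.getvalue()


# One second of digital silence at 16 kHz, 16 bit, mono.
wav_data = pcm_to_wav_bytes(b"\x00\x00" * 16000)
```

The returned bytes can be written to a .wav file and passed to the module like any other WAV input.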
What is the output?
The module can create three kinds of output: a segment file in SHOUT format (listing the segments and their lengths), the audio file cut into segments, and three files that merge the segments by type: a merged speech file, a merged noise file and a merged silence file.
An example.
Input: radio broadcast
Output:
SPEAKER SpeechNonSpeech 5 1.220 1.040 <NA> <NA> SPEECH <NA>
SPEAKER SpeechNonSpeech 5 2.260 3.950 <NA> <NA> SOUND <NA>
SPEAKER SpeechNonSpeech 5 6.210 0.750 <NA> <NA> SPEECH <NA>
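The segment records in the example output can be read with a few lines of Python. This is a sketch only: it assumes whitespace-separated fields laid out as in the example above, with the start time in the fourth field, the length in the fifth and the segment type in the eighth. Segment, parse_segments and total_duration_by_label are hypothetical names, not part of the tool.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Segment(NamedTuple):
    start: float      # start time in seconds
    duration: float   # segment length in seconds
    label: str        # segment type as it appears in the record, e.g. SPEECH


def parse_segments(text: str) -> List[Segment]:
    """Parse segment records, assuming the field layout of the example."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that does not look like a segment record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        segments.append(Segment(float(fields[3]), float(fields[4]), fields[7]))
    return segments


def total_duration_by_label(segments: List[Segment]) -> Dict[str, float]:
    """Sum the segment lengths for each segment type."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.label] += seg.duration
    return dict(totals)


EXAMPLE = """\
SPEAKER SpeechNonSpeech 5 1.220 1.040 <NA> <NA> SPEECH <NA>
SPEAKER SpeechNonSpeech 5 2.260 3.950 <NA> <NA> SOUND <NA>
SPEAKER SpeechNonSpeech 5 6.210 0.750 <NA> <NA> SPEECH <NA>
"""

segments = parse_segments(EXAMPLE)
totals = total_duration_by_label(segments)
```

For the three example records, totals holds the summed length of the two SPEECH segments and the length of the single SOUND segment.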
For developers
Source | https://github.com/juditacs/hunspeech/blob/master/speech_activity_detection/sad.py |
Source language | Python 3 |
Input | .wav, .mp3, or any other audio format supported by the SoX (Sound Exchange) tool. |
Output | Two RTTM-compatible (Rich Transcription Time Marked) files created as the output of the SHOUT tool, and/or one audio file (.wav) per segment, and/or one merged audio file per segment type (.wav). |
Execution | python3 sad.py -i input.wav -m shout.sad (see also --help) |
Licence | GPL |
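
The execution line above can also be driven from another Python script. The sketch below only assembles the same command shown in the Execution row and passes it to subprocess; it assumes sad.py is in the current working directory, and the meaning of the -m value is not described here, so it is passed through unchanged (see --help). build_sad_command and run_sad are hypothetical helper names.

```python
import subprocess
from typing import List


def build_sad_command(input_path: str,
                      m_value: str = "shout.sad",
                      script: str = "sad.py") -> List[str]:
    """Assemble the command from the Execution row.

    The -i and -m flags are taken from that row; the script location
    is an assumption and may need to be an absolute path.
    """
    return ["python3", script, "-i", input_path, "-m", m_value]


def run_sad(input_path: str, m_value: str = "shout.sad") -> None:
    """Run the detector on one file; raises CalledProcessError on failure."""
    subprocess.run(build_sad_command(input_path, m_value), check=True)


command = build_sad_command("input.wav")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.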