emPros - Prosody Parser

About the tool

What is it good for? What does it do?

The program analyses and transcribes the intonation of verbal utterances in spontaneous communication. It extracts acoustic parameters from the archived recordings, processes and stylizes them per speaker, and then labels the intonation of each utterance relative to the speaker's individual vocal range. The program was designed mainly with the analysis of multi-party interactions in mind, but it can also be used to process texts read out loud, as well as monologues.

What is the input?

An audio file of the conversation, together with a matching annotation that marks the temporal position of each utterance on a separate tier (annotation level) per speaker. There are no restrictions on the number of speakers, the segmentation of the utterances, or the content of the labels. Empty labels are interpreted as silence on the part of the speaker in question. If no speaker-change annotation is available, the script automatically detects speech pauses and assigns the entire audio content of the file to a single speaker.
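For reference, a minimal annotation in Praat's long TextGrid text format might look like the fragment below. The tier name, times, and label text are placeholder examples, and only a single speaker tier is shown; a real input would have one tier per speaker.

```
File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 10
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "speakerA"
        xmin = 0
        xmax = 10
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 3.2
            text = ""
        intervals [2]:
            xmin = 3.2
            xmax = 10
            text = "first utterance"
```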

What is the output?

The annotation returned as output describes the utterances on four levels for each speaker. The first level segments the utterances into melody segments, assigning one of five categories to each: "rise", "fall", "descending", "ascending", or "level". The second level places the start and end points of the melody segments within the speaker's individual voice range, divided into five bands (L2 < L1 < M < H1 < H2). The third level gives the original values measured in Hertz corresponding to the relative values of the previous level. The fourth level separates the voiced ("V") and voiceless ("U") segments of the speech.
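The idea of the first two output levels can be sketched as follows. This is an illustrative Python sketch only, not the tool's actual algorithm: the equal-width division of the range and the slope thresholds below are assumptions chosen for the example.

```python
import math

def f0_band(hz, f0_min, f0_max):
    """Map an F0 value (Hz) to one of five equal-width bands of the
    speaker's individual range, as on the second output level."""
    bands = ["L2", "L1", "M", "H1", "H2"]
    step = (f0_max - f0_min) / 5
    idx = min(int((hz - f0_min) / step), 4)
    return bands[max(idx, 0)]

def melody_label(start_hz, end_hz, duration_s):
    """Label a stylized segment by its F0 slope in semitones per second.
    The thresholds (1 and 8 st/s) are arbitrary illustrative values."""
    semitones = 12 * math.log2(end_hz / start_hz)
    slope = semitones / duration_s
    if abs(slope) < 1:       # nearly flat
        return "level"
    if slope >= 8:           # steep upward movement
        return "rise"
    if slope <= -8:          # steep downward movement
        return "fall"
    return "ascending" if slope > 0 else "descending"

# Example: a speaker whose range is 80-280 Hz
print(f0_band(100, 80, 280))        # -> "L2"
print(melody_label(220, 120, 0.4))  # -> "fall"
```

Dividing the range in semitones rather than Hertz, or using speaker-specific thresholds, would be equally plausible designs; the sketch only shows the overall shape of the labelling step.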

An example:

The sample in the source code’s input folder (sample.wav, sample.TextGrid) contains the archived recording of a dialogue (to preserve the speakers’ anonymity, only their intonation can be heard on the recording) as well as the utterances’ intervals per speaker ("speakerA" and "speakerB"). The output is the sample_pitch.TextGrid file in the output folder, containing the intonation of the utterances on 8 (4+4) annotation levels. Overlapping speech is not analysed.

For developers

Source https://github.com/szekrenyesi/prosotool
Source code Praat (6.0.13) script
Input At least one sound file in .wav format, together with a Praat TextGrid file under the same name. Any number of such pairs (sound + annotation) can be placed in the input folder.
Output A text file in Praat TextGrid format, placed in the output folder. Naming convention: [input name] + "_pitch". Several subfolders (named after the original input) are created in the output folder, containing partial results. The folder "sp_sep" contains each speaker's speech isolated and cleaned of overlaps, as generated during preprocessing.
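The naming convention can be made concrete with a small sketch; the helper function below is hypothetical, not part of the tool.

```python
from pathlib import Path

def output_name(input_wav):
    """Derive the output TextGrid name: [input name] + "_pitch"."""
    p = Path(input_wav)
    return str(p.with_name(p.stem + "_pitch.TextGrid"))

print(output_name("input/sample.wav"))  # ends with "sample_pitch.TextGrid"
```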
Execution Praat prosotool.praat <input_path> <stylization> <smoothing> <pitch_extraction> <operating_system>
Meaning and value range of the options:
  • INPUT_PATH : Directory path of the input folder. Value range: string
  • STYLIZATION : Resolution of the stylization of the F0 curve. A higher value results in a stronger stylization (coarser resolution). Value range: integers
  • SMOOTHING : The parameter determining the degree of F0 curve smoothing. A lower value results in stronger smoothing. Value range: real numbers
  • PITCH_EXTRACTION : The method for determining the parameters of the F0 measurement. Value range: standard | dynamic
  • OPERATING_SYSTEM : The running environment; must be specified because folder operations differ between platforms. Value range: windows | unix
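Putting the options together, an invocation can be assembled as below. This is a hypothetical wrapper sketch: the parameter values (10, 2.5) are arbitrary examples, not recommended settings, and actually running the command requires Praat to be installed.

```python
def build_command(input_path, stylization, smoothing,
                  pitch_extraction, operating_system):
    """Assemble the prosotool command line in the documented order."""
    return ["Praat", "prosotool.praat", input_path,
            str(stylization), str(smoothing),
            pitch_extraction, operating_system]

cmd = build_command("./input/", 10, 2.5, "standard", "unix")
print(" ".join(cmd))
# e.g. pass cmd to subprocess.run(cmd) to execute (Praat must be on PATH)
```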
Licence GPL