class: center, middle, inverse, title-slide

# A Pan-Australian Model for MAUS

### Hywel Stoakes<sup>1</sup> and Florian Schiel<sup>2</sup><br/>The University of Melbourne<sup>1</sup>, Institut für Phonetik, LMU, Germany<sup>2</sup>

### 2017/12/07

---
class: inverse

# Presentation Outline

Presentation in two parts

???

Many people here are familiar with the Munich Automatic Segmentation tool, have used it in their research, and will know that it is a powerful time-saving tool for time-aligning annotations to audio data. This talk is for those who are not, and it is in two parts.

--

- Part 1. Briefly introduce the Pan-Australian model.

- Part 2. A how-to for quickly applying the Pan-Australian phonetic model to annotate Australian languages.
  - called **"Aboriginal Languages (AU)"** on BASWebServices.

???

Then I will follow this up with a little information on how to apply this to speech from first-language speakers of Australian languages that you may have access to.

--

- The BAS Web Services can be accessed at: [http://clarin.phonetik.uni-muenchen.de/BASWebServices/#!/services/WebMAUSGeneral](http://clarin.phonetik.uni-muenchen.de/BASWebServices/#!/services/WebMAUSGeneral)<sup>[1]</sup>

.footnote[
[1] MAUS prefers Google Chrome.
]

???

You can access the MAUS model here. Unfortunately, it is not that easy to use without some induction.

---
class:inverse, centre, middle

- Thanks to Andy Butcher, Jonathan Harrington, Steve Cassidy, Raphael Winkelmann, Janet Fletcher, Judith Bishop, Simon Hammond

- Thanks to all who have transcribed/corrected the word lists: Debbie Loakes, Rosey Billington, Katie Jepson + many others

- Funded by: Transdisciplinary Innovation Grant from The ARC Centre of Excellence for the Dynamics of Language. Partially funded by The Alveo Virtual Laboratory (Phase II). Thanks to The Research Unit for Indigenous Language.

???

Before I go on, I would like to acknowledge my co-author Florian Schiel and all those at IPS Munich, and particularly thank Andy Butcher for his generosity.
I'd also like to say that this talk would not have been possible without funding from CoEDL and the tireless hours of work from many linguists and phoneticians from around Australia and the world.

---

# Access slides:

To access the slides for this presentation and some supplementary material go to:

[https://github.com/PhonLab/PanAUS/wiki](https://github.com/PhonLab/PanAUS/wiki)

---

# A Pan-Australian Phonetic Alignment Model

- The basis of the acoustic model is drawn from a database of phonetic word lists recorded by Andy Butcher: henceforth the **Butcher Corpus**

.footnote[
• Hamilton, P. J. (1996). Phonetic Constraints and Markedness in the Phonotactics of Australian Aboriginal Languages. PhD Thesis, University of Toronto.<br/>
• Fletcher, J. M. and Butcher, A. R. (2014). Sound patterns of Australian languages. In Nordlinger, R. and Koch, H., editors, The Languages and Linguistics of Australia: A Comprehensive Guide, chapter 3. De Gruyter Mouton.
]

--

- The Pan-Australian model is based on the untested assumption that there is a superset of Australian phones upon which each language draws (cf. Hamilton, 1996)

--

- The Butcher Corpus is a phonetically balanced corpus of speech from languages across Australia, in which all phonemes are recorded in word-initial, word-medial and word-final position

--

- Butcher’s phonological hypotheses (Butcher 1992, 1999, 2004, 2006; Fletcher and Butcher 2014) are used as a basis for grapheme to phoneme rules

---
class: inverse

# How is the Pan-Australian model different to existing models?

- In MAUS there are language-independent models available (Language Independent SAMPA).

--

- These too are based on the presupposition that there is a superset of phonetic building blocks from which all languages are constructed (encoded by the IPA).

--

- These existing models didn't work out of the box with Australian language recordings due to gaps in the phonemic models.
--

- Some specific phonemes which are common in Australian languages were missing from current automatic segmentation algorithms.
<!-- - Specifically models for: -->
  - retroflexes
  - lamino-dentals and laminal articulations
  - prestopped nasals

---
class: inverse

# Constructing the model

- Work started almost 10 years ago on segmenting word lists, at first using automatic segmentation.

--

- Some files were labelled automatically with hand corrections and some were labelled by phoneticians from scratch.

--

- Included audio data from various modes, with Butcher's word lists at the core

--

- The addition of narratives and conversations will build this into a representative synchronic sample of modern Australian languages.

---

# Languages in the Database

|language            |classification   |
|:-------------------|:----------------|
|Eastern Arrernte    |Pama-Nyungan     |
|Burarra             |Non-Pama-Nyungan |
|Kunwinjku           |Non-Pama-Nyungan |
|Mawng               |Non-Pama-Nyungan |
|Nyangumarta         |Pama-Nyungan     |
|Wik-Mungkan         |Pama-Nyungan     |
|Warlpiri            |Pama-Nyungan     |
|Gupapuyngu (YM)     |Pama-Nyungan     |
|Djambarrpuyngu (YM) |Pama-Nyungan     |
|Gumatj (YM)         |Pama-Nyungan     |

.footnote[
[2] YM = Yolŋu Matha.
]

---
class: inverse, center, middle

# Mapping the languages in the database

![](presentation_files/figure-html/unnamed-chunk-2-1.png)<!-- -->

---
class: inverse

# Mapping the languages in the database

The languages included don't cover the whole country so far - they are concentrated in the northern latitudes (locations from Chirila; Bowern, 2016).
.footnote[
• Bowern, C. (2016). Chirila: Contemporary and Historical Resources for the Indigenous Languages of Australia. Language Documentation and Conservation, Vol. 10. http://nflrc.hawaii.edu/ldc/
]

---
<!-- background-image: ![AusMap](images/AustraliaMap.png) -->
class: inverse

# Database

- Eight languages (10 varieties)
- 37 speakers (male and female)
- 33248 words
- 137951 phones
- Over 40,000 vowels (/ɐ/ is most common)

---

# Database Structure

- To contribute to the database as it stands, annotations need to be in the following format.

- 3 tiers
  - ORT (Orthographic Word Level)
  - SAP (SAMPA phonetic tier)
  - PHO (IPA Phonetic tier)

--

- An optional utterance tier (UTT) at the top level

--

- These are ideally Praat TextGrids, but could also be ELAN *.eaf files or files from any other software that can read and write compliant TextGrids

---

# What is MAUS?

- The **M**unich **AU**tomatic **S**egmentation tool is a forced aligner that uses Hidden Markov Models and the Viterbi algorithm to assign arbitrary labels to stretches of speech.

--

- Developed by the second author (Schiel, 1999) who has constructed models for many languages (mainly European)

--

- MAUS is only part of a suite of tools developed at IPS in Munich that assist in segmenting and analysing language phonetically.

- Time-aligned labels are also essential metadata for making speech recordings searchable.

.footnote[
• Schiel, F. (1999). Automatic Phonetic Transcription of Non-Prompted Speech. In Proc. of the ICPhS (pp. 607-610).<br/>
• Kisler, T., Reichel, U. D. and Schiel, F. (2017). Multilingual processing of speech via web services. Computer Speech & Language, Volume 45, September 2017, pages 326–347.
]

---

# Using the model via the web interface

- This model was trained in August 2017 in Munich.
- It is designed to be accessed from the web:<br/> [http://clarin.phonetik.uni-muenchen.de/BASWebServices/#!/services/WebMAUSGeneral](http://clarin.phonetik.uni-muenchen.de/BASWebServices/#!/services/WebMAUSGeneral)

--

- It can be slow to upload large amounts of data.

--

- The minimal requirements to use the Pan-Australian model are:
  - an audio file (ideally *.wav mono 16bit)
  - a text file containing an orthographic transcription (\*.txt or \*.par)
  - a grapheme to phoneme mapping file (mapping every grapheme to a phoneme available in the model)
  - a rule set to show regular pronunciation alternations (optional)

--

- It is possible to use the model locally although it is not recommended.

---
class: inverse, center, middle

# Three steps

![image](images/screenshots/ThreeSteps.png)

---

# Step 1: Grapheme to Phoneme

- A G2P mapping is a text file that is used to convert the orthographic text file into something that MAUS understands

- Developed by Uwe Reichel and Thomas Kisler

.footnote[
[4] • Reichel, U.D., Kisler, T. (2014). Language-independent grapheme-phoneme conversion and word stress assignment as a web service. In: Hoffmann, R. (Ed.): Elektronische Sprachverarbeitung. Studientexte zur Sprachkommunikation 71, pp 42-49, TUDpress, Dresden.
]

---
class: inverse, center, middle

![image](images/screenshots/G2P.png)

---
class: inverse, center, middle

![image](images/screenshots/G2P_Param.png)

---

# Step 1: G2P

You must build this set of language-specific rules (this is a subset for Kunwinjku):

```
a;6
e;E
i;I
o;O
u;U
p;pp
b;b
bb;pp
k;kk
k;g
kk;kk
t;tt
d;d
dd;tt
rt;t`t`
rd;d`
rdd;t`t`
rdrd;t`t`
djdj;cc
dj;c
y;j
w;w
```

---

# Step 1: G2P

Positional rules can be specified:

- word initial [+voice] -> [-voice]

```
#b;p
#g;k
#d;t
```

- word final b -> p

```
#b;p
```

- intervocalic word medial lenition

```
set VOC a;6
set VOC e;E
set VOC i;I
set VOC o;O
set VOC u;U
$VOC$d$VOC$;. 4 .
```

- apico-alveolar stop to tap (a tap [ɾ] is "4" in X-SAMPA)

---

# Step 1: G2P

- The most important orthographic question that needs to be answered before moving further is what do the graphemes <j> and <y> denote?
Are they properly mapped?

--

- is <j> a lamino-palatal stop [c]/[ɟ] or a lamino-palatal glide [j]?

- is <y> a lamino-palatal glide [j]?

- What are the IPA values of citation vowels?

- Are there any positional considerations?
  - i.e. allophonic variation?

<!-- --- -->
<!-- # Step 1: G2P -->
<!-- - In the G2P you can actually use a TextGrid as input and if so you need to specify the sample-rate of the associated audio file. -->

---

# Consonant Inventory for Bininj Kunwok

<!-- ![image](images/BiningKunWokPhonemes.png) -->

| |bilabial| alveolar | retroflex | palatal | velar |glottal |
|:---|:-------:|:--------:|:---------:|:-------:|:-------:|:-------:|
|long stop| pː | tː | ʈː | cː | kː | |
|stop| b / p | d / t | ɖ / ʈ | c / ɟ | k / ɡ | ʔ |
|nasal| m | n | ɳ | ɲ | ŋ ||
|lateral| | l | ɭ | | ||
|rhotic| | r | (ɻ) | | ||
|glide| w | ɹ | | j | ||

## IPA

---

# Consonant Inventory for Bininj Kunwok

<!-- ![image](images/BiningKunWokPhonemes.png) -->

| |bilabial| alveolar | retroflex | palatal | velar |glottal |
|:---|:-------:|:--------:|:---------:|:-------:|:-------:|:-------:|
|long stop| bb | dd | rdd | djdj | kk | |
|stop| b | d | rd | dj | k | h |
|nasal| m | n | rn | nj | ng ||
|lateral| | l | rl | | ||
|rhotic| | rr | r | | ||
|glide| w | r | | y | ||

## orthography

---

# Consonant Inventory for Bininj Kunwok

<!-- ![image](images/BiningKunWokPhonemes.png) -->

| |bilabial| alveolar | retroflex | palatal | velar |glottal |
|:---|:-------:|:--------:|:---------:|:-------:|:-------:|:-------:|
|long stop| pp | tt | t`t` | cc | kk | |
|stop| p/b | t | t` | c | k/g | ? |
|nasal| m | n | n` | J | N ||
|lateral| | l | l` | | ||
|rhotic| | r | r- | | ||
|glide| w | r- | | j | ||

## [X-SAMPA](https://en.wikipedia.org/wiki/X-SAMPA) modified

---

# Step 1: G2P

- The output from the G2P stage will be a \*.par file

--

- After you have the par file that is output from G2P
  - You can progress to the next step.

.footnote[
A 'par' file, short for 'a file with extension par' (e.g. 'Signal01.par'), denotes a BAS Partitur Format (BPF) annotation file.
BPF is based on the SAM annotation standard, and is a simple standard for storing hierarchical and time-aligned annotations. More info [here](http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html#Partitur)
]

---
class: inverse

# Step 2: Chunking

- Long audio files don't work well with MAUS

- Errors can compound over time.

- Chunking provides a way of feeding MAUS small amounts of data, so it is not overloaded.

- An alternative is splitting sound files into subfiles around a minute long.

---

# Step 3: MAUS

- Input into MAUS (wav file and par file output from the Chunker in step 2).

- Which parameters should I use/change?
  - WebMAUS General
  - WebMAUS Basic will not give good results for these languages

- Rule files (similar to G2P in implementation)
  - These are an optional way to give simple alternation rules and could become quite complex for some languages
  - I'm looking at you Arrernte!

---

# Step 3: MAUS, Parameters

Use the defaults apart from the **Language** and optionally the **Rule set file**.

![image](images/screenshots/MAUS.png)

---
class: inverse

# Optional: Pipelining

- We don't recommend this option until the model has been fully tested, but depending on the language this is an easy first step to see if you are getting good results

--

- However, you get no control over the G2P input or the rule files.

---

# Alternate Forced Aligners: The Future?

- Only listing one that can be trained on small amounts of data:
  - [Montreal Forced Aligner](http://montreal-forced-aligner.readthedocs.io/en/stable/introduction.html)

--

## Next Steps:

- The Australian language models have now been added to the language-independent model, giving access to retroflex phonemes, Australian-specific cluster environments, laminal articulations and prestopped nasals

--

## We want more data

- There are not yet sufficient resources to prepare the data for inclusion.
We will provide some guidelines for adding your data to the website

- [https://github.com/PhonLab/PanAUS/wiki](https://github.com/PhonLab/PanAUS/wiki)

---

## Some Examples of Output.

![image](images/screenshots/Kunwinjku_Example.png)

---

![image](images/screenshots/Kunwinjku_Example2.png)

---

# Sounds of Bininj Kunwok

<audio controls>
<source src="audio/JillMimihStory.mp3">
<p>Your browser does not support audio playback, download the file:</p>
<a href="audio/JillMimihStory.mp3">MP3</a></audio>

![](images/JillMihMihStory.png)

<small>
birriyawam, birri-yawam na-kudji bi-ngalkeng <h6>they looked for him, one person found him</h6>
yimeng, "naneh nga-ngalkeng, ngurri-mray karri-bun" <h6>he said, "I've found him, all of you come here, we'll kill him"</h6>
birri-kadjuy, birri-bebkeng kabbal <h6>they followed him, they made him come out onto the flood plain</h6>
birri-kadjuy munguy, birri-djalwam, birri-djalwam, birri-bom <h6>they chased him for a long time, they just kept going and going, and they killed him</h6>
birri-bom, kunak birri-worrhmeng <h6>they killed him, they made a fire</h6>
birri-mey, birri-kinjeng <h6>they took him, they burned him</h6>
</small>

<div class="notes">
A rather gruesome story but one that elicits some good intonation examples
</div>

---
class:inverse, centre, middle

- Thanks to Andy Butcher, Jonathan Harrington, Steve Cassidy, Raphael Winkelmann, Janet Fletcher, Judith Bishop, Simon Hammond

- Thanks to all who have transcribed: Debbie Loakes, Rosey Billington, Katie Jepson + many others

- Funded by: Transdisciplinary Innovation Grant from The ARC Centre of Excellence for the Dynamics of Language. Partially funded by The Alveo Virtual Laboratory (Phase II). Thanks to The Research Unit for Indigenous Language.

???

First I would like to acknowledge my co-author Florian Schiel and also say that this talk would not have been possible without funding from CoEDL

---
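
# Supplementary: checking a G2P mapping offline

Before uploading a mapping file for Step 1, it can help to check locally that every grapheme in a transcription is covered. The sketch below is only an illustration and is not the BAS G2P implementation: it assumes a greedy longest-match strategy (so `rdd` wins over `rd` + `d`), the names `parse_mapping` and `to_phonemes` are hypothetical, and the input strings are made-up grapheme sequences rather than real Kunwinjku words. It uses a subset of the `grapheme;phoneme` pairs from the Kunwinjku slide.

```python
# Subset of the Kunwinjku grapheme;phoneme pairs shown on the Step 1 slide.
MAPPING_TEXT = """\
a;6
e;E
i;I
o;O
u;U
b;b
bb;pp
d;d
dd;tt
rd;d`
rdd;t`t`
dj;c
djdj;cc
y;j
w;w
"""


def parse_mapping(text):
    """Read 'grapheme;phoneme' lines into a dict (later lines overwrite earlier ones)."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        grapheme, phoneme = line.split(";", 1)
        table[grapheme] = phoneme
    return table


def to_phonemes(word, table):
    """Greedy longest-match conversion (an assumed strategy, for illustration only)."""
    longest = max(len(grapheme) for grapheme in table)
    phones = []
    i = 0
    while i < len(word):
        # Try the longest candidate grapheme first, then shorter ones.
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in table:
                phones.append(table[chunk])
                i += size
                break
        else:
            # No grapheme matched: the mapping file has a gap.
            raise ValueError(f"no mapping for {word[i]!r} at position {i} in {word!r}")
    return phones


TABLE = parse_mapping(MAPPING_TEXT)
print(" ".join(to_phonemes("djdjabbo", TABLE)))  # made-up test string
```

A `ValueError` from `to_phonemes` points at the first grapheme that the mapping file does not yet cover, which is the gap to fix before running the web service.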