About the corpus

The NCCU Corpus of Spoken Taiwan Mandarin, formerly the NCCU Corpus of Spoken Mandarin, is a language documentation project that provides open access to its data at no charge for research and teaching purposes. The project has been collecting spoken data from daily face-to-face Mandarin conversations in Taiwan since 2006. Written consent was obtained from the participants for the publication of the spoken data. Personal names are represented by a single character or an English letter. The speech is given a broad transcription that marks some interactional features, such as turn transitions, overlaps and code-switching. The spoken data may be revised from time to time for completeness and consistency.

Part of the corpus data is also available at TalkBank:
http://ca.talkbank.org/access/TaiwanMandarin.html

Funding for this language documentation project:

The Aim for the Top University and Elite Research Center Development Plan, National Chengchi University (2006 – 2008)
The Humanities Research Center of the National Science Council (2006, 2008)
The Office of Research and Development, National Chengchi University (2008)
Research projects, the Ministry of Science and Technology (2009 – present)

Open Access

This corpus is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format for non-commercial purposes, as long as you give appropriate credit to the source and indicate if changes were made. The images and other third-party material in this corpus are included in the corpus's Creative Commons license. If your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the corpus holder, using the contact information available on the website.