DigiTala: L2 Finnish data from upper secondary schools and university, autumn 2021

Description

This resource is available via Kielipankki – The Language Bank of Finland. It includes speech samples from L2 Finnish speakers, transcripts, human ratings, the learners' responses to post-test surveys and the raters' responses to post-rating surveys. The data was collected in the DigiTala research project (2019–2023) during autumn 2021 from upper secondary school students and university students learning Finnish as a second language. The main goal of the DigiTala research project is to develop a digital tool that uses automatic speech recognition and automatic scoring to assess the oral skills of L2 Finnish and Swedish learners. The tool also provides automated feedback on learners' speaking performances. Its purpose is to make the assessment of oral language skills possible in high-stakes language tests. Furthermore, it lets students practice their pronunciation and speech production in foreign languages independently, either outside school or in language classes without the teacher's guidance. In addition to the resource described here and the data from spring 2021 (http://urn.fi/urn:nbn:fi:lb-2023012621), the project made use of speech material collected in a previous DigiTala project (by Svenska folkskolans vänner in 2015–2017) from upper secondary school students learning Swedish (see http://urn.fi/urn:nbn:fi:lb-2017081502) and speech from Finnish and Swedish tests (see http://urn.fi/urn:nbn:fi:lb-2023012629). Part of the speech material was transcribed during the project. Ratings were organized in four rounds in which human raters evaluated a number of speech samples using the rating criteria developed in the project.
The project is funded by the Academy of Finland 2019–2023, and combines expertise in speech and language processing, language education and phonetics at the University of Helsinki (grant number 322619), Aalto University (grant number 322625) and the University of Jyväskylä (grant number 322965). The current project builds on lessons learned during a pilot project (DigiTala 2015–2017).

Authors of the resource: Anna von Zansen, Yaroslav Getman, Milla Sneck, Heini Kallio, Ragheb Al-Ghezi, Ekaterina Voskoboinik, Maria Kautonen, Ari Huhta, Mikko Kuronen, Mikko Kurimo, Raili Hildén

Size information about this dataset:
Finnish Lukio autumn 2021 (freeform + readaloud; including the recordings rejected by the transcriber): 2324 recordings, mean duration 16.69 sec, total duration 10.78 h, 162 unique speakers
Finnish Aalto autumn 2021 (freeform + readaloud; including the recordings rejected by the transcriber): 1965 recordings, mean duration 11.13 sec, total duration 6.07 h, 116 unique speakers

The tasks, the surveys and the rating criteria are available via https://zenodo.org/communities/digitala/.

For details about the Moodle plugin that was developed by IT students during the project, see von Zansen, A., Alanen, T., Al-Ghezi, R., Erkkilä, J., Harjunpää, T., Heijala, M., Kallio, H. (2022). DigiTala Moodle plugin. https://github.com/aalto-speech/moodle-mod_digitala

More information: Kautonen, M. & von Zansen, A. (2020). DigiTala research project: Automatic speech recognition in assessing L2 speaking. Kieli, koulutus ja yhteiskunta, 11(4). https://www.kieliverkosto.fi/fi/journals/kieli-koulutus-ja-yhteiskunta-kesakuu-2020/digitala-research-project-automatic-speech-recognition-in-assessing-l2-speaking
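The size figures reported for the two subsets can be cross-checked with simple arithmetic: the number of recordings multiplied by the mean duration should roughly reproduce the stated total duration. The following is a minimal sketch of that check, not part of the resource itself; the dictionary layout and the function name `estimated_hours` are illustrative assumptions, and only the numbers are taken from the size information above.

```python
# Figures copied from the size information of this dataset description.
# The structure and names below are illustrative, not part of the resource.
subsets = {
    "Finnish Lukio autumn 2021": {"recordings": 2324, "mean_sec": 16.69, "total_h": 10.78},
    "Finnish Aalto autumn 2021": {"recordings": 1965, "mean_sec": 11.13, "total_h": 6.07},
}


def estimated_hours(recordings: int, mean_sec: float) -> float:
    """Estimate total duration in hours from recording count and mean duration in seconds."""
    return recordings * mean_sec / 3600


for name, s in subsets.items():
    est = estimated_hours(s["recordings"], s["mean_sec"])
    # Small differences are expected because the published mean is rounded.
    print(f"{name}: estimated {est:.2f} h, reported {s['total_h']:.2f} h")
```

Both estimates fall within about 0.01 h of the reported totals, which is consistent with rounding of the published mean durations.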

Publication year

2023

Data type

Creators

Aalto University - Creator

University of Helsinki - Creator

Anna von Zansen - Curator

University of Jyväskylä - Creator

Project

Other information

Fields of science

Linguistics

Language

Finnish

Open access

Restricted access

License

CLARIN RES (Restricted) End User License 1.0

Keywords

Subject headings

Temporal coverage


Related to this research data