DigiTala: L2 Finnish data from upper secondary schools, spring 2021

Description

This resource is available via Kielipankki – The Language Bank of Finland. It includes speech samples from L2 Finnish speakers, transcripts, human ratings, the learners' responses to post-test surveys and the raters' responses to post-rating surveys. The data was collected by the DigiTala research project (2019–2023) during spring 2021 from upper secondary school students learning Finnish as a second language.
The main goal of the DigiTala research project (2019–2023) is to develop a digital tool that uses automatic speech recognition and automatic scoring to assess the oral skills of L2 Finnish and Swedish learners. The tool also provides automated feedback on learners' speaking performances. Its purpose is to make the assessment of oral language skills possible in high-stakes language tests. Furthermore, students can use it to practice their pronunciation and speech production in foreign languages independently, either outside school or without the teacher's guidance in language classes.
During the project, material was collected from upper secondary school students and university students learning Finnish as a second language. In addition to the resource described here and the data from spring 2021 (http://urn.fi/urn:nbn:fi:lb-2023012621), the project made use of the speech material collected in a previous DigiTala project (by Svenska folkskolans vänner in 2015–2017) from upper secondary school students learning Swedish (see http://urn.fi/urn:nbn:fi:lb-2017081502) and speech from Finnish and Swedish tests (see http://urn.fi/urn:nbn:fi:lb-2023012629). Part of the speech material was transcribed during the project. Ratings were organized in four rounds, in which human raters evaluated a number of speech samples using the rating criteria developed in the project.
The project is funded by the Academy of Finland 2019–2023, and combines expertise in speech and language processing, language education and phonetics at the University of Helsinki (grant number 322619), Aalto University (grant number 322625) and the University of Jyväskylä (grant number 322965). The current project builds on lessons learned during a pilot project, see DigiTala (2015–2017).
Authors of this resource: Anna von Zansen, Yaroslav Getman, Milla Sneck, Heini Kallio, Ragheb Al-Ghezi, Ekaterina Voskoboinik, Maria Kautonen, Ari Huhta, Mikko Kuronen, Mikko Kurimo, Raili Hildén.
The tasks, the surveys and the rating criteria are available via https://zenodo.org/communities/digitala/.
For information about the Moodle plugin that was developed by IT students during the project, see von Zansen, A., Alanen, T., Al-Ghezi, R., Erkkilä, J., Harjunpää, T., Heijala, M., Kallio, H. (2022). DigiTala Moodle plugin. https://github.com/aalto-speech/moodle-mod_digitala
More information: Kautonen, M. & von Zansen, A. (2020). DigiTala research project: Automatic speech recognition in assessing L2 speaking. Kieli, koulutus ja yhteiskunta, 11(4). https://www.kieliverkosto.fi/fi/journals/kieli-koulutus-ja-yhteiskunta-kesakuu-2020/digitala-research-project-automatic-speech-recognition-in-assessing-l2-speaking
Finnish Lukio spring 2021 (freeform + readaloud): 1055 recordings, mean duration 15.95 sec, total duration 4.67 h, 69 unique speakers

Publication year

2023

Type of data

Creators

Aalto University - Creator

University of Helsinki - Creator

Anna von Zansen - Curator

University of Jyväskylä - Creator

Project

Other information

Fields of science

Linguistics

Language

Finnish, Swedish

Open access

Restricted access

License

CLARIN RES (Restricted) End User License 1.0

Keywords

Subject headings

Temporal coverage


Related to this research data