Yle media evaluation dataset

Description

This audiovisual dataset contains:

* audio files, subtitles, ground truth transcripts, speaker diarizations and NER annotations of 16 factual programs in Finnish and Swedish
* video files, subtitles, metadata and annotations for 8 factual programs that have been used for demonstration and test purposes in the MeMAD project

The dataset contains 12.7 hours of media in total.

---

Yle has released three datasets under an experimental license for a limited time to support the development of language and media related technologies. These datasets were originally created by the MeMAD research and innovation project, a collaboration between media industry members and research groups. The MeMAD project received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 780069.

LICENSE INFORMATION: The data is available for research purposes upon specific request from Yle. The party requesting the data has to be located in Finland to gain access to the data (other project partners do not need to be). Please see the website at https://developer.yle.fi/en/data/avdata/index.html for more detailed terms and conditions. Requests can be made until the end of 2022 by submitting the form available via the website.

Publication year

2022

Type of data

Creators

Finnish Broadcasting Company (Yle)

Lauri Saarikoski - Curator

Tuomas Nolvi - Curator

Project

Other information

Fields of science

Linguistics

Language

Finnish, Swedish

Open access

Restricted access

License

Other

Keywords

Subject headings

Temporal coverage


Related to this research data