The Suomi 24 Corpus (2016H2)

Description

The corpus is available from Kielipankki, the Language Bank of Finland. Download: http://urn.fi/urn:nbn:fi:lb-2017021502. License details: http://urn.fi/urn:nbn:fi:lb-20150304151. The corpus contains all the texts available through the Suomi24 API from the discussion forums of the Suomi24 online social networking website, from 1 January 2001 to 24 September 2016. The corpus has been tokenized and annotated with morpho-syntactic analysis by FIN-CLARIN at the Department of Modern Languages, University of Helsinki. The tokenized version was created by Aleksi Sahala. The annotation was then carried out by Jussi Piitulainen on CSC's Taito cluster. The morpho-syntactic analysis was produced with the Turku Dependency Parser. Researchers who have a user name and a password can download the entire corpus in the VRT format. University students must first apply for access rights at https://lbr.csc.fi/ (sign in with your university credentials) before they can download the corpus.

Year of publication

2019

Creators

Aller Media Oy - Creator

University of Helsinki - Curator

Other information

Fields of science

Linguistics

Language

Finnish

Open access

Restricted access

License

CLARIN ACA+NC (Academic, Non Commercial) End User License 1.0
