The Suomi24 Sentences Corpus 2001-2017, Korp version 1.3

Description

This resource is available via Korp in Kielipankki, the Language Bank of Finland. The corpus contains all the discussion forums of the Suomi24 online social networking website from 1 January 2001 to 31 December 2017 that were available in the Suomi24 API. Researchers can download the entire corpus (see http://urn.fi/urn:nbn:fi:lb-2020021801).
Updates:
2025-04-11: For version 1.3, the data was updated with annotations of names recognized with FiNER 1.6 and of sentence languages identified with HeLI-OTS 2.0.
2021-04-21: Version 1.2 added some new annotations. Each sentence in the corpus now includes a sentiment annotation (polarity: positive, negative, or neutral). The polarity information was obtained with an automatic classifier trained on the FinnSentiment data (http://urn.fi/urn:nbn:fi:lb-2020111001; see also Lindén, Jauhiainen & Hardwick, 2020).
2020-02-20: Version 1.1 includes some minor corrections. In the previous version, the nicknames of writers were missing in some posts from the years 2009–2012 and 2014, and the characters ', " and & that occurred in some nicknames were incorrectly displayed as &apos;, &quot; or &amp;. Moreover, the part "2017H2" in the previous title of the corpus was replaced by the years "2001–2017".

Year of publication

2020

Type of data

Creators

City Digital Group - Rights holder, Creator

User support FIN-CLARIN - Curator

Project

Other information

Fields of science

Linguistics

Language

Finnish

Open access

Restricted access

License

Creative Commons Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0)

Keywords

Subject headings

Temporal coverage

Related to this research data