Lists of Words Corpus (UHLCS)

Description

The corpus is available in Kielipankki - the Language Bank of Finland (puhti.csc.fi; access rights instructions: http://www.kielipankki.fi/access).

Location: /appl/data/kielipankki/mrc-uhlcs/general-linguistics/multilingual-data/words/

The lists of words were generated from corpora in the following languages:

* Dutch: 178,430 words, 1,998,881 characters
* Finnish:
  * proper names: 714 names, 4,488 characters
  * general list of words: 264,654 words, 3,171,148 characters
* French: 138,257 words, 1,524,757 characters
* German: 160,086 words, 2,060,734 characters
* Italian: 60,453 words, 561,982 characters
* Norwegian: 61,843 words, 589,234 characters
* Swedish: 13,328 words, 117,685 characters

Document type: words in alphabetical order.
Character encoding: ASCII.

The lists of words were compiled at the University of Helsinki, Department of General Linguistics. The Lists of Words Corpus is part of the UHLCS corpus collection. UHLCS has many different IPR holders. Should you have any questions regarding the collection, please contact Pirkko Suihkonen (suihkonen.pirkko@gmail.com).

License details: http://urn.fi/urn:nbn:fi:lb-2015041002

The purpose of the resource use must be outlined in a research plan.
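For users with access on puhti, the sketch below shows one way to summarise a single word-list file and compare it with the figures above. It is a minimal sketch under stated assumptions, not a tool shipped with the corpus: the file names inside the words/ directory are not given in this record, the layout of one word per line is assumed, the function name `word_list_stats` is hypothetical, and the record does not say whether its character counts include line breaks, so small differences from the stated totals are expected. The default encoding follows the record (ASCII); if decoding fails, the files may use a different encoding.

```python
import sys
from pathlib import Path


def word_list_stats(path, encoding="ascii"):
    """Summarise one word-list file (hypothetical helper).

    Assumes one word per line; blank lines are skipped.
    Returns (number of words, characters in those words excluding
    line breaks, whether the words are in non-decreasing order).
    """
    text = Path(path).read_text(encoding=encoding)
    words = [line.strip() for line in text.splitlines() if line.strip()]
    n_chars = sum(len(w) for w in words)
    # Plain code-point comparison; the record's "alphabetical order" may
    # follow another collation, so False is not necessarily an error.
    is_sorted = all(a <= b for a, b in zip(words, words[1:]))
    return len(words), n_chars, is_sorted


if __name__ == "__main__":
    # Pass one or more file paths from the corpus directory.
    for p in sys.argv[1:]:
        words, chars, ordered = word_list_stats(p)
        print(f"{p}: {words} words, {chars} characters, sorted={ordered}")
```

Counting characters without line breaks is a deliberate choice here; if the totals consistently differ from the record by roughly one per word, the record's counts probably include the newline.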

Publication year

2018


Creators

User support at CSC - IT Center for Science Ltd. The Language Bank of Finland - Curator

Pirkko Suihkonen - Rights holder, Creator

Multiple publishers; check the distribution rights holders in the original metadata by following its persistent identifier - Publisher

Other information

Fields of science

Linguistics

Languages

German, Finnish, French, Italian, Dutch, Norwegian, Swedish

Access

Restricted access

License

CLARIN RES (Restricted) End User License 1.0
