Finnish Corpus (Literature) (UHLCS)
Description
The corpus is available in Kielipankki - the Language Bank of Finland (puhti.csc.fi, access rights instructions: http://www.kielipankki.fi/access).
Location: /appl/data/kielipankki/mrc-uhlcs/general-linguistics/uralic-lgs/finno-ugric-lgs/baltic-finnic-lgs/finnish
Contents:
1. HKV corpus
The corpus consists of samples of Finnish literature representing various text types. It is documented in the following publication: Auli Hakulinen, Fred Karlsson & Maria Vilkuna, Suomen tekstilauseiden piirteitä: kvantitatiivinen tutkimus. Publications, No. 6. Department of General Linguistics, University of Helsinki, 1980. The morpho-syntactic encoding is documented in the following publication: Computational morphosyntax: Report on research 1981-84. Publications, No. 13, pp. 115-136. University of Helsinki, Department of General Linguistics, 1985. The file is an encoded version in which the part-of-speech classes are marked. The corpus is in ASCII form. The size of the tagged corpus is 68 425 words and 837 373 characters. The creator of the HKV corpus is Kristiina Jokinen.
2. Le Parole
This electronic language resource was compiled from several languages spoken in Europe during the international Le Parole project. The corpus includes structural analysis and TEI information in SGML form. It contains subcorpora that have been analysed in various ways. The corpus is in Latin-1 form (ISO 8859-1).
Computational morphosyntax: Report on research 1981-84. Publications, No. 13. pp. 115-136. University of Helsinki, Department of General Linguistics, 1985.
The corpus is in ASCII form. The size of the syntactically coded corpus is 68 425 words and 837 373 characters. The corpus contains 10 149 sentences.
3. Helsinki Region Spoken Language Corpora (1972-1974)
The corpus consists of material collected during the project “Nykysuomen murros” (‘Change of modern Finnish’). The director of the project was Heikki Paunonen, and the project, led by the Committee of Humanistic Research in Finland, took place mainly between 1977 and 1980. The description of the project, which is available to researchers, was drawn up by Pirkko Kukkonen. The corpus was transcribed from recorded spoken language material. The size of the corpus is 127 × 30 minutes (63.5 hours in total), and it is in ASCII form.
4. Issues of Suomen Kuvalehti published in 1975 and 1976
The corpus includes some issues of the Finnish weekly news magazine Suomen Kuvalehti that were published in 1975 and 1976. The publisher of the magazine, Yhtyneet Kuvalehdet Oy, gave the material to the Department of General Linguistics of the University of Helsinki to be used as research and teaching material. The size of the corpus is 840 762 words and 9 693 042 characters, and it is in ASCII form.
5. All issues of Suomen Kuvalehti published in 1987
The corpus includes all the issues of the Finnish weekly news magazine Suomen Kuvalehti that were published in 1987. The publisher of the magazine, Yhtyneet Kuvalehdet Oy, gave the material to the Department of General Linguistics of the University of Helsinki to be used as research and teaching material. The size of the corpus is 1 730 597 words and 12 520 546 characters, and it is in ASCII form.
6. Tiede 2000
The corpus includes material from the Finnish science magazine Tiede 2000 that was published in 1990: Tiede 2000, 1990: 1, 39-43. The size of the corpus is 68 067 words and 464 792 characters and it is in ASCII form.
7. WSOY
The corpus includes portions of books published by the Finnish publishing company Werner Söderström Osakeyhtiö (WSOY, Helsinki and Porvoo). The size of the corpus is 979 516 words and 7 086 335 characters and it is in ASCII form.
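The sizes above are given as word and character counts, and the files are stated to be in ASCII or Latin-1 (ISO 8859-1) form. The following is a minimal sketch of how such counts could be checked for a single file. It assumes that a "word" is a whitespace-delimited token, which may differ from how the published figures were computed, and the file name in the usage note is hypothetical. Decoding as Latin-1 also covers plain ASCII files, since ASCII is a subset of ISO 8859-1.

```python
from pathlib import Path


def count_words_and_chars(text):
    """Return (word_count, character_count) for a string.

    Assumption: words are whitespace-delimited tokens.
    """
    return len(text.split()), len(text)


def count_file(path, encoding="latin-1"):
    """Read one corpus file in the given encoding and count its words and characters."""
    text = Path(path).read_text(encoding=encoding)
    return count_words_and_chars(text)
```

For example, count_file("hkv.txt") would return the counts for a hypothetical file named hkv.txt in the current directory.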
The Finnish Corpus is a part of the UHLCS corpus collection.
UHLCS has many different IPR holders. Should you have any questions regarding the collection, please contact Pirkko Suihkonen (suihkonen.pirkko@gmail.com).
License details: http://urn.fi/urn:nbn:fi:lb-20150304124
The purpose of the resource use must be outlined in a research plan.
Log
26.11.2018: Removed link http://islrn.org/resources/640-204-024-555-6
9.10.2019: Removed superfluous URN (2014060210)
Publication year
2018
Type of data
Creators
CSC – IT Center for Science Ltd - Curator
University of Helsinki - Curator
Projects
Other information
Fields of science
Linguistics
Language
Finnish
Open access
Restricted access