ERME – Erzya and Moksha Extended Corpora, Korp Version

Description

This resource is available in the Korp service of the Language Bank of Finland, see Access location. It contains the sentences of the original full texts of the ERME corpus in scrambled order. ERME contains predominantly Erzya and Moksha literature. It consists of several media publications dating from the 19th and 20th centuries. ERME was mapped in Saransk from 1997 to 2004, and has been mapped in Helsinki since 2004. The most basic format used is XML, with a granularity extending to chapter level. The goal is to create corpora with a granularity extending to word level. For the next version, contextual translations (English or Finnish) will be provided at sentence level, and morphological encoding corresponding to each context will be provided at word level. Preliminary morphological analysis will be carried out using HFST-based transducers developed in the Giellatekno infrastructure of the University of Tromsø. The grammatical analysis and labeling follow the practices developed in the same infrastructure, which are applied in the documentation of several Uralic languages. More than a million words have been processed so far, and the amount of processed material is to be increased subsequently. ERME is available at http://korp.csc.fi.

Year of publication

2018

Type of data

Authors

University of Helsinki - Curator

Project

Other information

Fields of science

Linguistics

Language

English, Finnish, Moksha, Erzya

Open access

Open

License

Creative Commons Attribution 4.0 International (CC BY 4.0)

Keywords

Subject headings

Temporal coverage

Related to this research data