Models for MeMAD language identification pipeline

Description

A collection of models for the MeMAD spoken language identification pipeline. The zip archive contains four models: an x-vector embedding model trained on 67 languages using the lidbox toolkit; a scikit-learn StandardScaler for standardizing the embedding model output before Naïve Bayes classification; a probabilistic linear discriminant analysis (PLDA) model for reducing the dimensions of the embedding vectors; and a scikit-learn Naïve Bayes model for classifying embedding vectors into six categories: de, en, fi, fr, sv, and x-nolang.
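The way the back-end models described above could fit together can be sketched as follows. This is a minimal illustration, not the MeMAD or lidbox implementation. It makes several assumptions: the embeddings are synthetic random vectors standing in for x-vector output, the embedding size of 64 is hypothetical, scikit-learn's LinearDiscriminantAnalysis is swapped in for the PLDA model (scikit-learn has no PLDA), the Naïve Bayes variant is assumed to be GaussianNB, and the scaler, reduction, classifier ordering is inferred from the description.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The six output categories named in the dataset description.
LABELS = ["de", "en", "fi", "fr", "sv", "x-nolang"]

EMBED_DIM = 64     # hypothetical embedding size, not taken from the dataset
N_PER_CLASS = 50   # hypothetical number of synthetic samples per category

# Synthetic stand-in for x-vector embeddings: one Gaussian cluster per label.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(len(LABELS), EMBED_DIM))
X = np.concatenate(
    [center + rng.normal(size=(N_PER_CLASS, EMBED_DIM)) for center in centers]
)
y = np.repeat(np.arange(len(LABELS)), N_PER_CLASS)

# Assumed ordering: standardize, reduce dimensions, then classify.
# LinearDiscriminantAnalysis is a stand-in for PLDA, not the same model.
backend = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(n_components=len(LABELS) - 1),
    GaussianNB(),
)
backend.fit(X, y)

# Classify a few embeddings and map class indices back to category names.
probs = backend.predict_proba(X[:3])
predicted = [LABELS[i] for i in backend.predict(X[:3])]
print(predicted)
```

In the published archive the scaler, PLDA model, and classifier are already trained, so a real pipeline would load them and only apply them to new embeddings rather than call `fit`.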

Year of publication

2021

Type of data

Authors

Department of Signal Processing and Acoustics

Anja Virkkunen - Creator

Matias Lindgren - Creator

Zenodo - Publisher

Project

Other information

Fields of science

Computer and information sciences

Language

Open access

Open

License

Creative Commons Attribution 4.0 International (CC BY 4.0)

Keywords

Subject headings

Temporal coverage

Related to this research data