Dataset used in COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations

Description

This dataset consists of two HDF5 files containing pre-computed log-mel spectrograms that were used to train audio embedding models. It is split into a training set and a validation set, containing 170793 and 19103 spectrogram patches respectively, each accompanied by multi-hot encoded tags drawn from a vocabulary of 1000 tags provided by Freesound users. More details can be found in "COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations" by X. Favory, K. Drossos, T. Virtanen, and X. Serra. The code is available at this GitHub repository.

License: This dataset is derived from content from the Freesound collection. All sounds are released under one of the following Creative Commons (CC) licenses: CC0, CC-BY, CC-S+, or CC-BY-NC. We attribute the authors of all sounds used in the dataset and provide their corresponding licenses in the attributions.txt file.
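
To illustrate what "multi-hot encoded tags" means in the description above, here is a minimal sketch in plain Python. The five-tag vocabulary, the helper names `multi_hot` and `decode`, and the choice to ignore tags outside the vocabulary are all assumptions made for illustration; the dataset itself uses a 1000-tag vocabulary, and its exact storage layout is not described here.

```python
from typing import Iterable, List, Sequence


def multi_hot(tags: Iterable[str], vocabulary: Sequence[str]) -> List[int]:
    """Return a 0/1 vector with a 1 at the index of every tag found in the vocabulary."""
    index = {tag: i for i, tag in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for tag in tags:
        position = index.get(tag)
        if position is not None:  # assumption: out-of-vocabulary tags are ignored
            vector[position] = 1
    return vector


def decode(vector: Sequence[int], vocabulary: Sequence[str]) -> List[str]:
    """Recover the tag names whose positions are set to 1."""
    return [tag for tag, flag in zip(vocabulary, vector) if flag]


# Hypothetical 5-tag vocabulary (the dataset's vocabulary has 1000 tags).
vocabulary = ["ambient", "drum", "loop", "synth", "voice"]
example = multi_hot(["loop", "drum", "field-recording"], vocabulary)
print(example)                      # [0, 1, 1, 0, 0]
print(decode(example, vocabulary))  # ['drum', 'loop']
```

Under this scheme, each spectrogram patch would be paired with one such vector whose length equals the vocabulary size, so a sound can carry any number of tags at once.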

Publication year

2020

Type of data

Creators

Konstantinos Drossos - Creator

Tuomas Virtanen - Creator

Unknown organization

Xavier Favory - Creator

Xavier Serra - Creator

Zenodo - Publisher

Project

Other information

Fields of science

Computer and information sciences

Language

English

Open access

Open

License

Other

Keywords

deep neural network, audio classification, audio representation learning, co-aligned autoencoders, contrastive loss, spectrograms

Subject headings

Temporal coverage

Related to this research data