Dataset used in COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations

Description

This dataset consists of two HDF5 files containing pre-computed log-mel spectrograms that were used to train audio embedding models. It is split into a training set of 170793 spectrogram patches and a validation set of 19103 patches, each accompanied by multi-hot encoded tags drawn from a vocabulary of 1000 tags provided by Freesound users. More details can be found in "COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations" by X. Favory, K. Drossos, T. Virtanen, and X. Serra. The code is available at this GitHub repository.

License: This dataset is derived from content from the Freesound collection. All sounds are released under one of the following Creative Commons (CC) licenses: CC0, CC-BY, CC-S+, or CC-BY-NC. We attribute the authors of all the sounds used in the dataset and provide their corresponding licenses in the attributions.txt file.
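To make the multi-hot tag encoding and the split sizes mentioned in the description concrete, here is a minimal pure-Python sketch. It is illustrative only: the five-tag vocabulary and the helper names `encode_multi_hot` and `decode_multi_hot` are assumptions made for this example, not part of the dataset or the COALA code. The real vocabulary has 1000 tags, and the layout of the HDF5 files is not described here.

```python
# Toy stand-in for the dataset's 1000-tag vocabulary (hypothetical tags).
VOCAB = ["bass", "drum", "loop", "rain", "voice"]
TAG_INDEX = {tag: i for i, tag in enumerate(VOCAB)}


def encode_multi_hot(tags, tag_index=TAG_INDEX):
    """Return a 0/1 list with a 1 at the position of every known tag."""
    vector = [0] * len(tag_index)
    for tag in tags:
        if tag in tag_index:  # tags outside the vocabulary are ignored
            vector[tag_index[tag]] = 1
    return vector


def decode_multi_hot(vector, vocab=VOCAB):
    """Recover the tag names whose positions are set to 1."""
    return [tag for tag, bit in zip(vocab, vector) if bit]


# Split sizes as stated in the description.
TRAIN_PATCHES = 170793
VAL_PATCHES = 19103
total_patches = TRAIN_PATCHES + VAL_PATCHES
val_fraction = VAL_PATCHES / total_patches

example = encode_multi_hot(["drum", "loop", "not-in-vocab"])
print(example, decode_multi_hot(example))
print(total_patches, f"{val_fraction:.2%}")
```

With the real data, each spectrogram patch would carry one such vector of length 1000, and several positions can be set at once because a sound may have multiple tags.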

Year of publication

2020

Type of data

Creators

Tampere University

Konstantinos Drossos - Creator

Tuomas Virtanen - Creator

Unknown organization

Xavier Favory - Creator

Xavier Serra - Creator

Zenodo - Publisher

Project

Other information

Fields of science

Computer and information sciences

Language

English

Open access

Open

License

Other

Keywords

Computer and information sciences

Subject headings
