DAPlankton: a benchmark dataset for fine-grained domain adaptation
Description
The DAPlankton dataset consists of over 110k expert-labeled plankton images. The data is divided into two subsets: DAPlankton_LAB and DAPlankton_SEA. DAPlankton_LAB consists of images captured from multiple mono-specific phytoplankton cultures, which were analysed using three different imaging instruments: the Imaging FlowCytobot (IFCB), the CytoSense (CS) flow cytometer, and the FlowCam (FC) imaging microscope, each producing cropped images that contain a single plankton particle. An expert then verified the class of each image, ensuring that there was no cross-contamination between cultures. This process resulted in a balanced dataset with negligible label uncertainty. DAPlankton_SEA consists of images captured from water samples collected from the Baltic Sea using two different imaging instruments: IFCB and CS. Each image was manually labeled by an expert. DAPlankton_SEA is a more realistic and more challenging dataset, with a large class imbalance and natural intra-class variance.
If you use this dataset in your research, we kindly ask that you cite the following paper:
D. Batrakhanov, T. Eerola, K. Kraft, L. Haraguchi, L. Lensu, S. Suikkanen, M.T. Camarena-Gomez, J. Seppälä, H. Kälviäinen, DAPlankton: Benchmark Dataset for Multi-instrument Plankton Recognition via Fine-grained Domain Adaptation, arXiv, 2024.
**Data composition**
DAPlankton_LAB contains, in total, 47 471 images from 15 phytoplankton species and 3 different domains (imaging instruments). The number of images per class-domain combination varies between 286 and 2618. The list of classes (species) is as follows:
- Aphanizomenon flosaquae
- Apocalathium malmogiense
- Chrysotila roscoffensis
- Diatoma tenuis
- Gymnodinium corollarium
- Kryptoperidium foliaceum
- Levanderina fissa
- Melosira arctica
- Nephroselmis pyriformis
- Peridiniella catenata
- Pseudopedinella sp.
- Rhinomonas nottbecki
- Rhodomonas salina
- Teleaulax acuta
- Tetraselmis sp.
DAPlankton_SEA contains, in total, 64 453 images from 31 plankton classes and 2 different domains. The number of images per class-domain combination varies between 5 and 12 280. The list of classes is as follows:
- Aphanizomenon flosaquae
- Centrales sp
- Chaetoceros sp
- Chaetoceros sp (single)
- Chlorococcales
- Chroococcales
- Ciliata
- Cryptomonadales
- Cryptophyceae Teleaulax
- Cyclotella choctawhatcheeana
- Dinophyceae
- Dinophysis acuminata
- Dolichospermum Anabaenopsis
- Dolichospermum Anabaenopsis (coiled)
- Euglenophyceae
- Eutreptiella sp
- Gymnodiniales
- Gymnodinium like
- Heterocapsa rotundata
- Heterocapsa triquetra
- Heterocyte
- Katablepharis remigera
- Mesodinium rubrum
- Monoraphidium contortum
- Nitzschia paleacea
- Nodularia spumigena
- Oocystis sp
- Pseudopedinella sp.
- Pyramimonas sp.
- Skeletonema marinoi
- Snowella Woronichinia
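The per-class, per-domain counts quoted above can be re-derived from a local copy of the data. The following is a minimal sketch, not part of the dataset's own tooling. It assumes a directory layout of `<root>/<domain>/<class name>/<image file>`, which this description does not specify, and the names `count_images`, `imbalance_ratio`, and `IMAGE_SUFFIXES` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter
from pathlib import Path

# ASSUMPTION: images are stored as <root>/<domain>/<class name>/<image file>.
# The dataset description does not state the actual layout; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def count_images(root):
    """Count image files per (domain, class) pair under the assumed layout."""
    counts = Counter()
    for path in Path(root).glob("*/*/*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            domain, class_name = path.parts[-3], path.parts[-2]
            counts[(domain, class_name)] += 1
    return counts


def imbalance_ratio(counts):
    """Ratio of the largest to the smallest class-domain count."""
    return max(counts.values()) / min(counts.values())
```

Applied to the figures stated in this description, the ratio between the largest and smallest class-domain combination is 2618 / 286 (about 9.2) for DAPlankton_LAB and 12 280 / 5 = 2456 for DAPlankton_SEA, which illustrates how much more imbalanced the SEA subset is.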
Publication year
2024
Type of data
Authors
Jukka Seppälä - Contributor
Kaisa Kraft - Contributor, Creator
Lumi Haraguchi - Contributor, Creator
Sanna Suikkanen - Contributor
Instituto Español de Oceanografia
Maria Teresa Camarena-Gomez - Contributor, Creator
Daniel Batrakhanov - Contributor, Creator
Heikki Kälviäinen - Contributor
Lasse Lensu - Contributor
Project
Other information
Fields of science
Computer and information sciences; Environmental sciences
Language
Open access
Open