Dataset: "Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data"

Description

Dataset used in the experiments of the publication "Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data" by Bach et al.

File description:

- cfmid4.tar: MS² spectra simulated using CFM-ID (v4.0.7) for all molecular candidate structures.
- db_layout.png: Visualization of the SQLite database (DB) layout.
- massbank.sqlite.gz: DB containing all data needed to (re-)run the experiments shown in the paper. Please read "DB_README.md" for further details. The database file can be unpacked using gzip.
- metfrag.tar: MetFrag input files and MS² scores for all candidate sets, computed using the MetFrag software.
- sirius_scores.tar: MS² scores for all candidates and measured spectra, computed using the SIRIUS software.
- sirius_inputs.tar: Input (ms-files) for the SIRIUS software.
- DB_README.md: Description of each table in the "massbank.sqlite" SQLite DB.
- db_processing_scripts.tar: Scripts to reproduce the "massbank.sqlite" DB, and a README.md providing further information on the process.
- massbank__2020.11__v0.6.1.sqlite: Base SQLite DB from which "massbank.sqlite" was built. It was created with the "massbank2db" (v0.6.1) Python package from the MassBank release 2020.11.
- substructure_fingerprints.tar: Pre-computed substructure counting fingerprints for all candidates related to our experiments.

Instructions:

The "massbank.sqlite" DB can be used directly with the Structured Support Vector Machine (SSVM) model described in the manuscript and implemented in the "ssvm" Python package. If desired, the database can be reproduced using the scripts provided in "db_processing_scripts.tar":

- Create a directory for all data.
- Download and extract the ...
  - Processing scripts
  - MS² scorer outputs (e.g. metfrag.tar)
  - Pre-computed substructure fingerprints
- Follow the instructions given in the "README.md" of "db_processing_scripts.tar".
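As a minimal sketch of the unpacking steps described above, the following Python snippet decompresses the gzipped database, lists the tables it contains, and extracts tar archives into a data directory. The helper names (unpack_gzip, list_tables, extract_archives) and the directory layout are illustrative assumptions, not part of the dataset or of the "ssvm" package; the actual table names are documented in "DB_README.md".

```python
import gzip
import shutil
import sqlite3
import tarfile
from pathlib import Path


def unpack_gzip(src: Path, dst: Path) -> Path:
    """Decompress a .gz file (e.g. massbank.sqlite.gz) into dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def list_tables(db_path: Path) -> list[str]:
    """Return the names of all tables in an SQLite database, sorted by name."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def extract_archives(archives: list[Path], data_dir: Path) -> None:
    """Extract each tar archive into data_dir, creating the directory if needed.

    The target directory is a hypothetical choice; the README.md inside
    db_processing_scripts.tar describes the layout the scripts expect.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    for archive in archives:
        with tarfile.open(archive) as tar:
            tar.extractall(data_dir)
```

With the files downloaded into the working directory, calling unpack_gzip(Path("massbank.sqlite.gz"), Path("massbank.sqlite")) followed by list_tables(Path("massbank.sqlite")) should give a quick overview of the tables to compare against "DB_README.md".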
Publication year

2022

Type of data

Creators

Department of Computer Science

Eric Bach - Creator

Zenodo - Publisher

Project

Other information

Fields of science

Computer and information sciences

Language

Open access

Open

License

Creative Commons Attribution 4.0 International (CC BY 4.0)

Keywords

Subject headings

Temporal coverage

Related to this research data