Dataset supporting the Machine Learning-Assisted Clustering of Amino Acids project

Description

In the Machine Learning-Assisted Clustering of Amino Acids study, a model of the interactions observed at the peptide-AuNC interface was developed. The data generated and produced in the course of this work are described in this metadata record. To cluster the molecular structures according to interaction or non-interaction, SMILES strings of the amino acids and peptides were generated with GenPep. The amino acid and peptide datasets, comprising SMILES strings and sequence codes for peptides of two and three amino acids, together with the structures in the protonation states corresponding to pH 7, are deposited in the folder "1.data_generation". To validate the model, molecular dynamics simulations were performed with GROMACS. The geometrical clustering results for each simulation, containing the coordinates of the most populated structures ordered by population, are deposited in the folder "2.geometrical_clusters". The model was further validated with structures optimized by density functional theory (DFT) calculations with the PBE functional in GPAW; these coordinates are deposited in the folder "3.DFT_coordinates". The methods used to generate and produce these data are described in detail in a forthcoming publication. The dataset is available at: https://nextcloud.jyu.fi/index.php/s/XFdkXR9njeboCSW
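As an illustration of the size of the two- and three-amino-acid sequence sets, the minimal Python sketch below enumerates the sequence codes in one-letter notation. It assumes the 20 standard proteinogenic amino acids and is hypothetical: it is not GenPep, it does not produce SMILES strings or protonation states, and the helper name `peptide_codes` is invented for this example.

```python
from itertools import product

# Assumption: the 20 standard proteinogenic amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_codes(length: int) -> list[str]:
    """Hypothetical helper: return every sequence code of the given length,
    e.g. 'AC' for Ala-Cys, in lexicographic order of AMINO_ACIDS."""
    return ["".join(combo) for combo in product(AMINO_ACIDS, repeat=length)]


dipeptides = peptide_codes(2)
tripeptides = peptide_codes(3)
print(len(dipeptides), len(tripeptides))
```

Under this assumption there are 20² = 400 dipeptide and 20³ = 8000 tripeptide codes; the sets actually deposited in "1.data_generation" may be defined differently.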

Publication year

2025

Type of data

Creators

Department of Chemistry

de Souza Ferrari, Brenda - Rights holder, Creator

Department of Physics

Fallah, Zohreh - Creator

Häkkinen, Hannu - Creator

Khatun, Maya - Creator

Project

Other information

Fields of science

Computer and information sciences; Physics; Chemistry; Biochemistry, cell and molecular biology

Language

English

Open access

Embargo

License

Creative Commons Attribution 4.0 International (CC BY 4.0)

Keywords

molecular dynamics, density functional theory, machine learning, nanoclusters

Subject headings

Temporal coverage


Related to this research data