SQuaD: The Software Quality Dataset (CSV)

Description

Software quality research increasingly relies on large-scale datasets that measure both the product and process aspects of software systems. However, existing resources often focus on limited dimensions, such as code smells, technical debt, or refactoring activity, which restricts comprehensive analyses across time and quality dimensions. To address this gap, we present the Software Quality Dataset (SQuaD), a multi-dimensional, time-aware collection of software quality metrics extracted from 450 mature open-source projects across diverse ecosystems, including Apache, Mozilla, FFmpeg, and the Linux kernel. By integrating nine state-of-the-art static analysis tools (SonarQube, CodeScene, PMD, Understand, CK, JaSoMe, RefactoringMiner, RefactoringMiner++, and PyRef), our dataset unifies over 700 unique metrics at the method, class, file, and project levels. Covering a total of 63,586 analyzed project releases, SQuaD also provides version control and issue-tracking histories, software vulnerability data (CVE/CWE), and process metrics proven to enhance Just-In-Time (JIT) defect prediction. SQuaD enables empirical research on maintainability, technical debt, software evolution, and quality assessment at unprecedented scale. We also outline emerging research directions, including automated dataset updates and cross-project quality modeling, to support the continuous evolution of software analytics.

Note: This upload covers the raw CSV data. For more details, please refer to the main Zenodo upload (https://doi.org/10.5281/zenodo.17566690).

Dataset size: ~1.77 TB

Publication year

2025

Type of data

Creators

Southern Denmark University

Davide Taibi - Creator

Valentina Lenarduzzi - Creator

Mikel Robredo - Curator, Creator, Publisher

Matteo Esposito - Creator

University of Milano-Bicocca

Rafael Peñaloza - Creator

Projects

Other information

Fields of science

Computer and information sciences

Language

English

Open access

Open

License

Creative Commons Attribution 4.0 International (CC BY 4.0)

Keywords

computer science, software engineering, technical debt, mining software repositories, software maintainability, software quality

Subject headings

Temporal coverage

Related to this research data