A Representative User-centric Dataset of 10 Million GitHub Developers

Description

Using GitHub APIs, we construct an unbiased dataset of more than 10 million GitHub users. The data was collected between July 20 and August 27, 2018, and covers 10,649,574 users, 118,602,740 commits, and 20,999,258 repositories. Each data entry is stored in JSON format and represents one GitHub user. It contains the descriptive information from the user's profile page, together with information about the user's commit activity and the public repositories the user created or forked.
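Because each entry is a JSON object describing one user, a first look at the data can be as simple as counting entries and listing which top-level fields they carry. The following is a minimal sketch, not the authors' tooling. It assumes the entries are stored one JSON object per line, which the description does not confirm, and the field names in the sample (`login`, `commits`, `repos`) are hypothetical placeholders rather than the dataset's actual schema.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def summarize_entries(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Count user entries and tally the top-level keys they contain.

    Assumption: one JSON object per line, each object describing one user.
    """
    user_count = 0
    key_counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)  # one JSON object per GitHub user
        user_count += 1
        key_counts.update(entry.keys())
    return user_count, key_counts


# Hypothetical sample entries; these field names are illustrative only
# and are not taken from the dataset.
sample = [
    '{"login": "alice", "commits": [], "repos": []}',
    '',
    '{"login": "bob", "commits": []}',
]
count, keys = summarize_entries(sample)
print(count, dict(keys))
```

Since the function accepts any iterable of strings, an open file handle can be passed directly, so a large file is read line by line instead of being loaded into memory at once.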

Publication year

2018

Type of data

Authors

Department of Communications and Networking

Jiayun Zhang - Creator

Pan Hui - Creator

Qingyuan Gong - Creator

Xiang Li - Creator

Xiaoming Fu - Creator

Xin Wang - Creator

Yang Chen - Creator

Yu Xiao - Contributor

Fudan University - Contributor

Harvard Dataverse - Publisher

University of Göttingen - Contributor

University of Helsinki - Contributor

Project

Other information

Fields of science

Computer and information sciences

Language

Open access

Open

License

Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication

Keywords

Subject headings

Temporal coverage


Related to this research data