Threatened Species News Dataset

Description

This dataset is part of the research article "Automated retrieval of information on threatened species from online sources using machine learning" by Ritwik Kulkarni and Enrico Di Minin, published in Methods in Ecology and Evolution (2021). Please cite this article when using the dataset.

1. Given limited conservation resources, gathering and analyzing information from digital data sources can help investigate the global biodiversity crisis cost-efficiently. Developing and applying methods for automated content analysis of digital data sources is especially important for investigating human-nature interactions.

2. In this study, we introduce methods to automatically collect information from online news on species threatened by wildlife trade. We construct an end-to-end pipeline that begins by searching for and downloading news articles about species listed in Appendix I of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES), and then applies natural language processing and machine learning methods to filter the articles and retain only relevant ones. Additional relevant information is then extracted from each article using a Named Entity Recognition (NER) model.

3. The data collected over a one-month period included 15,088 articles and focused on 585 species listed in Appendix I of CITES. The neural network detected relevant articles with an accuracy of 95.91%, while the NER model helped extract information on the prices, locations, and quantities of traded animals. The system generates a regularly updated database that can be queried and analyzed for various research purposes and to inform conservation decision-making.

4. The results demonstrate that natural language processing can be used to extract information from digital text content efficiently. The proposed methods can be applied to multiple digital data platforms simultaneously and used to investigate human-nature interactions in conservation science and practice.
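The filter-then-extract steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the neural-network relevance classifier is replaced by a hypothetical keyword check (`TRADE_TERMS`), the NER model is replaced by simple regular expressions for prices and quantities, and the species list (`SPECIES`) is a hypothetical three-name subset of the 585 Appendix I species. All names in the code are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical subset of CITES Appendix I species names (the study used 585 species).
SPECIES = ["pangolin", "rhinoceros", "African elephant"]

# Hypothetical trade-related word stems; a crude stand-in for the trained
# neural-network relevance classifier described in the abstract.
TRADE_TERMS = ("smuggl", "seiz", "poach", "traffick", "illegal trade")

# Regular expressions standing in for the NER model: currency amounts and
# counts/weights of traded items. Both patterns are illustrative assumptions.
PRICE_RE = re.compile(
    r"(?:US\$|\$|€|£)\s?\d[\d,]*(?:\.\d+)?(?:\s?(?:million|billion))?",
    re.IGNORECASE,
)
QUANTITY_RE = re.compile(
    r"\b(\d[\d,]*)\s+(?:kg|kilograms|tonnes|tusks|horns|scales|animals)\b",
    re.IGNORECASE,
)


@dataclass
class ArticleRecord:
    """Information retained for one relevant article."""
    species: List[str]
    prices: List[str]
    quantities: List[str]


def match_species(text: str) -> List[str]:
    """Return the listed species mentioned in the text (case-insensitive)."""
    lower = text.lower()
    return [name for name in SPECIES if name.lower() in lower]


def is_relevant(text: str) -> bool:
    """Keyword stand-in for the relevance classifier."""
    lower = text.lower()
    return any(term in lower for term in TRADE_TERMS)


def extract(text: str) -> Optional[ArticleRecord]:
    """Filter an article, then pull out species, prices and quantities."""
    species = match_species(text)
    if not species or not is_relevant(text):
        return None  # article discarded, mirroring the filtering step
    return ArticleRecord(
        species=species,
        prices=PRICE_RE.findall(text),
        # Drop thousands separators so quantities can be parsed as numbers.
        quantities=[q.replace(",", "") for q in QUANTITY_RE.findall(text)],
    )
```

For example, `extract("Customs officers seized 1,200 kg of pangolin scales worth $2.5 million.")` returns a record with species `["pangolin"]`, prices `["$2.5 million"]`, and quantities `["1200"]`, while an article that mentions a listed species but no trade activity returns `None`. A trained classifier and NER model, as used in the study, would also capture paraphrases and locations that fixed rules like these miss.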

Publication year

2021

Creators

Enrico Di Minin - Contributor, Rights holder, Curator, Creator

Ritwik Kulkarni - Contributor, Rights holder, Curator, Creator, Publisher

Other information

Fields of science

Environmental sciences

Language

English

Open access

Open

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

Keywords

conservation, machine learning, natural language processing, CITES, online news, threatened species

Subject headings

nature conservation
