5G Core GTP-U Attack Dataset
Description
Introduction
The 5G Core GTP-U Attack Dataset (5G-CGAD) was developed to address the critical scarcity of publicly available data for security research in 5G networks. While 5G enables unprecedented capabilities such as ultra-reliable low-latency communications (URLLC), enhanced mobile broadband (eMBB), and massive machine-type communications (mMTC), it also introduces new vulnerabilities due to its cloud-native, virtualized, and software-driven architecture.
This dataset focuses on the GPRS Tunneling Protocol for the User Plane (GTP-U), which operates on the N3 interface between the gNodeB and the User Plane Function (UPF). As a protocol designed under the assumption of a trusted backhaul, GTP-U lacks authentication and integrity protections, making it a high-value target for adversaries. Attacks exploiting GTP-U can result in denial-of-service, session hijacking, traffic redirection, and privacy violations.
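To make the lack of authentication concrete, the following minimal sketch parses the 8-byte mandatory GTPv1-U header. The function name and the sample bytes are illustrative assumptions, not part of the dataset tooling. None of the fields is a signature or integrity check: a receiver matches a packet to a session by the TEID alone.

```python
import struct

GTPU_PORT = 2152      # UDP port registered for GTP-U
GTPU_MSG_GPDU = 0xFF  # G-PDU: the message type that carries user data


def parse_gtpu_header(data: bytes) -> dict:
    """Parse the 8-byte mandatory GTPv1-U header from a UDP payload."""
    if len(data) < 8:
        raise ValueError("GTP-U header needs at least 8 bytes")
    # Flags (1 byte), message type (1 byte), length (2 bytes), TEID (4 bytes),
    # all in network byte order.
    flags, msg_type, length, teid = struct.unpack("!BBHI", data[:8])
    return {
        "version": flags >> 5,              # 1 for GTPv1
        "protocol_type": (flags >> 4) & 1,  # 1 = GTP, 0 = GTP'
        "has_ext_header": bool(flags & 0x04),
        "has_seq": bool(flags & 0x02),
        "has_npdu": bool(flags & 0x01),
        "msg_type": msg_type,
        "length": length,                   # bytes after the mandatory header
        "teid": teid,
    }


# Hypothetical G-PDU: version 1, PT = 1, no optional fields, TEID 0x1234,
# followed by four placeholder payload bytes.
sample = bytes([0x30, GTPU_MSG_GPDU, 0x00, 0x04]) + (0x1234).to_bytes(4, "big") + b"\xde\xad\xbe\xef"
header = parse_gtpu_header(sample)
print(header)
```

Any host that can reach the UPF on the N3 interface and supply a valid TEID can therefore produce a header the UPF will accept, which is the property the attack classes in this dataset exercise.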
The dataset provides both benign traffic (representing real-world applications such as streaming, browsing, and file downloads) and malicious traffic (five classes of simulated attacks). Data is available in multiple formats (PCAP and CSV with over 80 extracted features), ensuring adaptability for diverse research purposes.
Testbed Design
The dataset was generated using a realistic cloud-native 5G testbed:
Infrastructure: Kubernetes cluster with two Lenovo ThinkStation P3 servers (Ubuntu 22.04).
Core Network: Open-source Open5GS for AMF, SMF, UPF, and other core functions.
Radio Access Network (RAN): UERANSIM to emulate 30 UEs and 3 gNodeBs.
Traffic Capture: Sidecar container inside the UPF pod for real-time packet sniffing.
Monitoring: Prometheus and Grafana for system observability.
Analysis: Independent ML-based anomaly detection to validate realism.
This setup ensured scalability, reproducibility, and real-time traffic monitoring.
Benign Traffic Generation
To approximate real-world 5G usage, three benign traffic categories were simulated:
Video Streaming – 20 UEs using iperf3, with randomized throughput (5–50 Mbps) and session durations.
Web Browsing – 10 UEs using curl to fetch content from 12 randomized websites.
File Downloading – Additional browsing users download files of varying sizes.
This diversity captures the variability of session lengths, packet sizes, protocols, and throughput, which is essential for distinguishing attacks from legitimate usage.
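The randomized benign sessions described above could be driven by a small command builder such as the sketch below. It is an assumption about how such a generator might look, not the authors' script: the duration range, the server address, and the placeholder URLs are hypothetical, and only the 5–50 Mbps bitrate range comes from the description.

```python
import random

# Placeholder URLs: the source does not list the 12 websites that were used.
SITES = ["https://example.com/", "https://example.org/", "https://example.net/"]


def iperf3_command(server: str, rng: random.Random) -> list[str]:
    """Build an iperf3 client command with a random target bitrate of 5-50 Mbps."""
    rate_mbps = rng.randint(5, 50)
    duration_s = rng.randint(30, 300)  # assumed range; the source gives none
    return ["iperf3", "-c", server, "-b", f"{rate_mbps}M", "-t", str(duration_s)]


def curl_command(rng: random.Random) -> list[str]:
    """Build a curl command that fetches one random site and discards the body."""
    return ["curl", "-s", "-o", "/dev/null", rng.choice(SITES)]


rng = random.Random(42)  # fixed seed so a run can be repeated
streaming_cmd = iperf3_command("10.45.0.1", rng)  # hypothetical server address
browsing_cmd = curl_command(rng)
print(" ".join(streaming_cmd))
print(" ".join(browsing_cmd))
```

Each command list can be handed to `subprocess.run` from within the network context of an emulated UE. How the UEs were bound to their tunnel interfaces is not stated in the description, so that step is left out.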
Attack Scenarios
The dataset includes five attack categories, each carefully simulated and labeled:
GTP Encapsulation Attack – Nested GTP-U packets injected to exploit UPF vulnerabilities.
Malformed GTP Attack – Variants with invalid headers, corrupted checksums, oversized fields, and unsupported message types.
DDoS Attack (ICMP/UDP Floods) – Attacks from compromised UEs targeting the UPF.
Intra-UPF UE DoS Attack – Malicious UE floods another UE within the same UPF, using SYN floods, UDP floods, ICMP floods, HTTP floods, and fragmentation-based amplification.
GTP-U TEID Brute-Force Attack – Adversary guesses Tunnel Endpoint Identifiers (TEIDs) to discover active sessions or disrupt connectivity.
These attacks were repeated multiple times over several days to ensure diversity and statistical richness.
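As an illustration of why the TEID brute-force class is separable from legitimate traffic, the sketch below flags sources that send many packets to TEIDs with no active tunnel. It is a simple heuristic under assumed inputs, not the detection method used to validate the dataset; the function name, the threshold, and the addresses are hypothetical.

```python
from collections import Counter


def flag_teid_guessers(observations, active_teids, max_misses=10):
    """Return sources that sent more than max_misses packets to unknown TEIDs.

    observations: iterable of (source_ip, teid) pairs from one time window.
    active_teids: set of TEIDs that currently map to a session.
    """
    misses = Counter(src for src, teid in observations if teid not in active_teids)
    return sorted(src for src, count in misses.items() if count > max_misses)


# Hypothetical window: one legitimate sender using ten active tunnels and
# one sender probing a block of 100 TEIDs, none of which is in use.
active = set(range(1, 11))
window = [("10.0.0.2", teid) for teid in active]
window += [("10.0.0.99", teid) for teid in range(1000, 1100)]
suspects = flag_teid_guessers(window, active)
print(suspects)  # ['10.0.0.99']
```

A guessing adversary produces mostly misses, while a legitimate gNodeB only uses TEIDs it was assigned, so the miss count per source is a natural first feature to inspect.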
Data Processing Pipeline
Captured data underwent a structured pipeline:
Packet Capture (PCAP) – Full traffic including GTP-U headers.
GTP-U Header Removal (via STRIPE tool) – Preserving only relevant fields when appropriate.
Flow Generation (via CICFlowMeter) – Conversion into flow-based CSV with 84 statistical features.
Labeling – Mapping to one of six classes: BENIGN, GTP-ENCAPSULATION, GTP-MALFORMED, DDOS, INTRA-UPF-DOS, GTP-BRUTEFORCE.
Feature Selection – Redundancy reduction using Pearson correlation and ANOVA.
Normalization – Features standardized (zero mean, unit variance).
Class Balancing (SMOTE) – To mitigate skew (e.g., large number of Intra-UPF DoS flows vs. fewer TEID brute-force flows).
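The last steps of the pipeline can be sketched with the standard library alone. The code below shows a Pearson-based redundancy filter, zero-mean and unit-variance standardization, and SMOTE-style interpolation between minority-class neighbours. It is a simplified illustration under assumed thresholds and feature names, not the authors' implementation; the ANOVA step is omitted, and libraries such as scikit-learn and imbalanced-learn provide full versions of these operations.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if |r| with every already kept feature is below threshold."""
    kept = {}
    for name, column in features.items():
        if all(abs(pearson(column, other)) < threshold for other in kept.values()):
            kept[name] = column
    return kept


def standardize(column):
    """Rescale a feature column to zero mean and unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [0.0 if std == 0 else (x - mean) / std for x in column]


def smote_like(minority, n_new, k=2, rng=None):
    """Create rows on the segment between a minority row and one of its k nearest neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical feature columns; real CICFlowMeter output has 84 of them.
features = {
    "duration_s": [1.0, 2.0, 3.0, 4.0],
    "duration_ms": [1000.0, 2000.0, 3000.0, 4000.0],  # redundant with duration_s
    "pkt_count": [10.0, 3.0, 8.0, 5.0],
}
kept = drop_correlated(features)
scaled = {name: standardize(column) for name, column in kept.items()}
minority_rows = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
new_rows = smote_like(minority_rows, n_new=4)
print(list(kept))  # ['duration_s', 'pkt_count']
```

Because each synthetic row is a convex combination of two real minority rows, it stays inside the range the minority class already covers, which is what lets a small class such as the TEID brute-force flows be enlarged without copying rows verbatim.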
Publication year
2025
Type of data
Creators
Suranga Prasad Wengappuli Arachchige - Publisher, Creator
Project
Other information
Fields of science
Electrical, automation and communications engineering, electronics
Language
English
Open access
Open