AI4C4 – LLM, XAI & Digital Twins
Thursday, 5 June 2025, 16:00-17:30, room 1.G
Session Chair: Akshay Jain (Nokia Bell Labs, FI)
Multi-Cell Outdoor Channel State Information Dataset (MOCSID)
Mohamed El Mehdi Makhlouf (Inria, France); Maxime Guillaud (Inria, France); Yamil E Vindas Yassine (University of Geneva & Data Science for Digital Health, Switzerland)
We introduce MOCSID, a multi-cell outdoor dataset of synthetic channel state information (CSI) samples mimicking an outdoor campus scenario, including multiple base stations with partially overlapping coverage areas and pedestrian user mobility. The scenario is characterized by a high density of base stations (10 base stations within a 625 m × 535 m area) and includes a mixture of non-line-of-sight and line-of-sight propagation. MOCSID includes user locations, timestamps, velocities, and multipath component information (delays and path coefficients), following realistic pedestrian mobility patterns generated using the probabilistic roadmap algorithm, and captures key signal propagation characteristics including path loss, shadowing, and multipath effects. Since MOCSID is intended as a reference for the development and validation of channel charting algorithms, we put particular emphasis on the spatial consistency of the synthetic data. With this dataset, we aim to foster progress in channel charting research by facilitating entry into the field and encouraging reproducibility, collaboration, and benchmarking within the community. MOCSID was generated using the NVIDIA Sionna ray tracing tool; the codebase used to generate the dataset, as well as the scene description data and user movement patterns, is also publicly available, allowing for easy replication, reproduction, or extension.
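As an illustration of how the released multipath components (delays and path coefficients) can be turned into frequency-domain CSI, the sketch below reconstructs a channel frequency response as H(f) = Σ_k a_k exp(-j2πf τ_k). The array names, path values, and OFDM grid are illustrative assumptions, not the published MOCSID format.

```python
import numpy as np

def csi_from_paths(delays_s, coeffs, subcarrier_freqs_hz):
    """Frequency response H(f) = sum_k a_k * exp(-2j*pi*f*tau_k)
    from multipath delays (seconds) and complex path coefficients."""
    # Outer product of frequencies and delays -> per-path phase rotation
    phase = np.exp(-2j * np.pi * np.outer(subcarrier_freqs_hz, delays_s))
    return phase @ coeffs  # shape: (num_subcarriers,)

# Hypothetical example: 3 paths observed on a 100-subcarrier OFDM grid
delays = np.array([0.2e-6, 0.5e-6, 1.1e-6])            # path delays in seconds
coeffs = np.array([1.0, 0.4 * np.exp(1j * 0.7), 0.1])  # complex path gains
freqs = 3.5e9 + 15e3 * np.arange(100)                  # 3.5 GHz band, 15 kHz spacing
h = csi_from_paths(delays, coeffs, freqs)
print(h.shape, np.abs(h[:3]))
```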
Understanding the Influence of Data Characteristics on Explainable AI
Julio Corona and Rafael Gonçalves Teixeira (University of Aveiro, Portugal & Instituto de Telecomunicações, Portugal); Mário Antunes (University of Aveiro, Portugal); Rui L Aguiar (University of Aveiro, Portugal & Instituto de Telecomunicações, Portugal)
The deployment of 5G networks has drastically increased the amount and complexity of data, making Machine Learning (ML) models essential for effective network management and optimization. However, understanding these models’ decisions remains challenging. The EU AI Act requires transparency and explanations for high-risk intelligent systems, with similar regulatory frameworks anticipated in other nations, furthering the interest in Explainable AI (XAI) techniques. Despite the availability of several XAI techniques, the root causes of disagreements between them remain poorly understood. This paper addresses this gap by investigating the influence of various dataset characteristics on the agreement between XAI techniques within the context of 5G networks. The results reveal that several dataset characteristics, including the number of attributes, the degree of correlation between attributes, and the presence of outliers, significantly influence the consistency of XAI techniques. In contrast, datasets with more classes and higher class entropy exhibit improved agreement across XAI techniques. These findings highlight the pressing need for more advanced and reliable XAI techniques, so that explanations in such environments are trustworthy and actionable.
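One common way to quantify the kind of (dis)agreement studied here is top-k feature agreement: the overlap between the k most important features reported by two XAI techniques. Below is a minimal, library-agnostic sketch; in practice the attribution vectors would come from methods such as SHAP and LIME, and the values shown are made up for illustration.

```python
import numpy as np

def top_k_agreement(attr_a, attr_b, k=5):
    """Fraction of overlap between the top-k features (by absolute
    attribution) reported by two explanation methods."""
    top_a = set(np.argsort(-np.abs(attr_a))[:k])
    top_b = set(np.argsort(-np.abs(attr_b))[:k])
    return len(top_a & top_b) / k

# Hypothetical per-feature attributions for one prediction
shap_attr = np.array([0.42, -0.10, 0.05, 0.30, -0.02, 0.01])
lime_attr = np.array([0.38, 0.12, -0.09, 0.25, 0.00, 0.04])
print(top_k_agreement(shap_attr, lime_attr, k=3))  # 1.0 -> full agreement
```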
Verified TelcoRAG: A Verified Local RAG for Telecommunication Uses
Shaghayegh Najari, Yasin Salajeghe and Ali Majlesi (Sharif University of Technology, Iran); Mohammad Amin Dehmolaee (Asr Gooyesh Pardaz Company, Iran); Iman Ahmadi and Babak Hossein Khalaj (Sharif University of Technology, Iran)
Large Language Models (LLMs) typically require vast computational resources for training and inference, making their deployment in resource-constrained environments difficult. This limitation has driven researchers to explore alternative strategies, such as using smaller, more efficient models that can still deliver high-quality results. By incorporating Retrieval-Augmented Generation (RAG), these smaller models can compensate for their limited training data and parameters by dynamically retrieving relevant information from external knowledge sources during inference.
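The retrieve-then-generate loop described above can be sketched in a few lines. The embedding and generation callables and the cosine-similarity retriever below are generic placeholders, not the paper's actual stack.

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=2):
    """Return the k documents whose embeddings are closest (cosine) to the query."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [docs[i] for i in np.argsort(-sims)[:k]]

def rag_answer(question, embed, generate, docs, doc_vecs):
    """Augment a small LM's prompt with retrieved context before generating."""
    context = "\n".join(retrieve(embed(question), doc_vecs, docs))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)  # e.g., a call into a small model such as Phi-2
```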
We present Verified TelcoRAG, a specialized framework that adapts RAG for smaller language models, specifically Phi-2, to address queries related to 3rd Generation Partnership Project (3GPP) telecommunications documents. Verified TelcoRAG builds on the existing SpeculativeRAG method by not only generating responses, but also verifying multiple candidate responses from various retrieval sources to ensure greater accuracy. This verification process allows for more reliable and precise answers, particularly in specialized fields such as telecommunications. In this work, we achieved a 3% performance improvement over the SpeculativeRAG model, making Verified TelcoRAG a cutting-edge approach in this research domain.
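The verification step, as described, scores candidate answers drafted from different retrieval sources and keeps the most reliable one. A minimal sketch of that selection logic, with hypothetical `draft` and `verify` callables standing in for the drafter and verifier models:

```python
def verified_answer(question, retrieval_sources, draft, verify):
    """Draft one candidate per retrieval source, score each with a
    verifier model, and return the highest-scoring answer."""
    candidates = []
    for source in retrieval_sources:
        context = source.retrieve(question)          # documents from this source
        answer = draft(question, context)            # small drafter LM (e.g., Phi-2)
        score = verify(question, context, answer)    # verifier confidence in [0, 1]
        candidates.append((score, answer))
    return max(candidates, key=lambda c: c[0])[1]
```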
NanoXAI: Explainable AI for Anomaly Detection in Electromagnetic Nano-Networks of CAVs
Stuti Gulati, Priyal Shah, Lakshin Pathak, Chinmay Trivedi and Rajesh Gupta (Nirma University, India); Sudeep Tanwar (Nirma University & Institute of Technology, India)
The authors propose a novel lightweight and interpretable framework for anomaly detection in electromagnetic nano-networks of connected autonomous vehicles. The proposed approach adopts TinyML for its small memory footprint, achieving a high accuracy of 98.12%, while the models, ranging from 56.26 KB to 71.72 KB, outperform LSTM and GRU counterparts in terms of resource usage. Using state-of-the-art explainability techniques such as SHAP and LIME to analyze and interpret the data, the authors identify latency and error rate as the key features for defining anomalies. The proposed framework is trained and tested on a synthetic dataset covering traffic patterns for nano-scale communication, preprocessed to ensure acceptable quality. The validation confirms that TinyML achieves performance and efficiency on resource-limited devices comparable to other approaches. Future work includes integrating real-world datasets, scaling the solution to large nano-networks, and experimenting with privacy preservation in distributed systems.
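To make the kilobyte-scale footprint concrete, the sketch below builds a small Keras classifier over a handful of link features (e.g., latency, error rate), converts it to TensorFlow Lite, and reports the model size in KB. The architecture and feature count are illustrative assumptions, not the paper's exact model.

```python
import tensorflow as tf

# Tiny anomaly classifier over a few link features; sizes are illustrative only
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # anomaly probability
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Convert to TFLite, the usual deployment format for TinyML targets
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
print(f"TFLite model size: {len(tflite_model) / 1024:.2f} KB")
```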
AI-Driven Digital Twin for Millimeter-Wave Photonic Fronthaul Links
Francisco Pastor Naranjo (Universitat Politècnica de València & Instituto Universitario de Investigación en Tecnología Centrada en el Ser Humano, Spain); Jaime Romero Huedo (Polytechnic University of Valencia, Spain); Jose Mora (Universidad Politécnica de Valencia, Spain); Valery Naranjo (Polytechnic University of Valencia, Spain); Beatriz Ortega (ITEAM Research Institute, Spain)
This paper introduces a machine learning-based Digital Twin for Optical Networks (DTON) to model and optimize the performance of millimeter-wave (mmW) photonic fronthaul links. Using an experimentally validated simulation framework, we generate a dataset of 1,100 configurations linking key system parameters to multiple Quality of Transmission (QoT) metrics. Three machine learning models, namely Gaussian Process Regression (GPR), Artificial Neural Networks (ANNs), and Extreme Gradient Boosting (XGBoost), are trained to predict these metrics, with the ANN and XGBoost models demonstrating superior accuracy. Furthermore, Particle Swarm Optimization (PSO) and Genetic Algorithms (GA) are applied to identify optimal system configurations that enhance performance while minimizing power consumption. The results highlight the potential of AI-driven digital twins for real-time network optimization, paving the way for more adaptive and efficient next-generation optical-wireless systems.
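A hand-rolled sketch of the PSO step: particles explore the configuration space, and a fitness function trades a surrogate QoT prediction against a power penalty. The `qot_model` surrogate, bounds, and penalty weight in the commented example are hypothetical placeholders, not the paper's setup.

```python
import numpy as np

def pso(fitness, bounds, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm optimizer; minimizes `fitness` over box bounds."""
    rng = np.random.default_rng(0)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, lo.size))   # particle positions
    v = np.zeros_like(x)                                   # particle velocities
    pbest, pbest_f = x.copy(), np.apply_along_axis(fitness, 1, x)
    gbest = pbest[np.argmin(pbest_f)]
    for _ in range(iters):
        r1, r2 = rng.random((2, *x.shape))
        # Blend inertia, pull toward personal best, and pull toward global best
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        f = np.apply_along_axis(fitness, 1, x)
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)]
    return gbest, pbest_f.min()

# Hypothetical fitness: maximize predicted QoT, penalize transmit power (last dim)
# fitness = lambda cfg: -qot_model.predict(cfg[None])[0] + 0.1 * cfg[-1]
```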
NetSimGym: The Gymnasium for Reinforcement Learning in Networking Research
Inho Cha (Korea University, Korea (South)); Seungjae Shin (Electronics and Telecommunications Research Institute, Korea (South)); Kihoon Kim (Korea University, Korea (South)); Yerin Ahn (Korea University, Korea (South)); Tae-Yeon Kim (ETRI, Korea (South)); Sangheon Pack (Korea University, Korea (South))
Gymnasium (Gym) is the de facto framework for building target environments for reinforcement learning (RL)-based control and optimization. Accordingly, there have been several efforts to interwork existing network simulators (e.g., ns-3) with Gym for RL-based networking research. To provide a new alternative RL playground for networking research, we developed NetSimGym, a wrapper framework that interconnects Gym with NetSim, a simulator offering rich and convenient features for network simulation. NetSimGym provides a set of core functions to exchange observations, rewards, actions, and related information between Gym and NetSim through a Protocol Buffers (protobuf)-based messaging interface. This enables NetSim users to easily integrate various RL algorithms with NetSim without implementing their own data exchange and synchronization mechanisms. We release the source code of NetSimGym under the Massachusetts Institute of Technology (MIT) license and expect it to be useful to NetSim users who want to adopt RL in research and development.
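The wrapper pattern described here can be sketched as a Gymnasium environment whose step() forwards actions to the simulator and waits for the next observation and reward. The `transport` object and the commented `netsim_pb2` protobuf module below are hypothetical stand-ins illustrating the interface, not NetSimGym's actual message schema.

```python
import gymnasium as gym
import numpy as np
# import netsim_pb2  # hypothetical protobuf-generated message classes

class NetSimEnv(gym.Env):
    """Gym-style facade over an external network simulator process."""

    def __init__(self, transport, n_obs=8, n_actions=4):
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(n_obs,))
        self.action_space = gym.spaces.Discrete(n_actions)
        self.transport = transport  # exchanges protobuf-encoded messages

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # Ask the simulator to restart and return decoded reply fields
        msg = self.transport.request(kind="reset")
        return np.asarray(msg["obs"], dtype=np.float32), {}

    def step(self, action):
        # Serialize the action, advance the simulator, read back the transition
        msg = self.transport.request(kind="step", action=int(action))
        obs = np.asarray(msg["obs"], dtype=np.float32)
        return obs, msg["reward"], msg["done"], False, {}
```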