Xuan-Son Vu

Research interests: trustworthy ML/DL and graph learning in NLP/multimodal data. Co-author of several neural models, e.g., ppRNN (a faster version of RNN), MGTN (modular), and SGTN (privacy-preserving).

Contact

xuan-son.vu@umu.se

+46 90 786 74 52

Works as

Visiting researcher at Department of Computing Science

MIT-huset, Umeå universitet, 901 87 Umeå

I am a senior researcher at the WASP Media & Language arena and the founder of DeepTensor AB, a spin-off from the CS department that works on securing ML/AI solutions. I received my Ph.D. degree from Umeå University with a focus on privacy-guaranteed machine learning with big data. Before that, I received an M.Sc. degree in Computer Science from Kyungpook National University in Korea, with a focus on NLP and machine learning. My work has primarily focused on knowledge: both acquiring knowledge from text and multimodal data, and using structured knowledge to power downstream applications. I am a reviewer for journals and conferences including TheWebConf, ECAI, ICDM, PAKDD, SSR, SC2, COSE (Computers & Security), and TPAMI.

My research revolves around ML/DL solutions for ubiquitous data processing and analysis. Ubiquitous data refers to the various types of data available via IoT applications and user-generated content in daily activities (e.g., multimodal reviews, user activities on mobile devices, etc.). These data are not only scarce but also spread across different devices and locations, and they belong to different users. For user-generated content research, I have been working on privacy-guaranteed models that protect user privacy while learning from their data. For data analytics, I have proposed KaPPA and INFRA, two analytical frameworks that protect personal data during analytics using new privacy-aware learning methods. Last but not least, we have proposed several neural models across different research efforts, including (but not limited to) ppRNN [6], dpUGC [5], SGTN [4], MG-PRIFAIR [3], MGTN [1], and Cformer [7]. These models play important roles in processing ubiquitous data, which in most cases are not only scarce but also complex.

Collaboration is an important part of my work. I am a board member and publication chair of the Vietnam Language and Speech Processing (VLSP) association, which organizes an international workshop annually. VLSP encourages research in related areas by providing high-quality datasets and letting the research community work on them via data challenges.

Locally, I work on a project in robust machine learning that studies new algorithms and neural methods to enhance the robustness of ML-based applications in multimodal data tasks (e.g., combining textual, visual, and graph data). Most recently, our work involves applications in healthcare, food of interest, and security on the edge cloud.

Publications

[1] Modular Graph Transformer Networks for Multi-Label Image Classification. Hoang D. Nguyen, Xuan-Son Vu, Duc-Trong Le, In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021.

[2] WINFRA: A Web-Based Platform for Semantic Data Retrieval and Data Analytics. Addi Ait-Mlouk, Xuan-Son Vu, Lili Jiang, In: Mathematics (Special Issue "Applied Data Analytics"), 2020, 8(11), 2090; doi:10.3390/math8112090.

[3] Multimodal Review Generation with Privacy and Fairness Awareness. Xuan-Son Vu, Thanh-Son Nguyen, Duc-Trong Le, Lili Jiang, In: Proceedings of the 28th International Conference on Computational Linguistics (COLING), 2020.

[4] Privacy-Preserving Visual Content Tagging using Graph Transformer Networks. Xuan-Son Vu, Duc-Trong Le, Christoffer Edlund, Lili Jiang, Hoang D. Nguyen, In: Proceedings of the 28th ACM International Conference on Multimedia (ACM MM), 2020.

[5] dpUGC: Learn Differentially Private Representation for User Generated Contents. Xuan-Son Vu, Son N. Tran, Lili Jiang, In: Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing, April, 2019.

[6] Improving Recurrent Neural Networks with Predictive Propagation for Sequence Labelling. Son N. Tran, Qing Zhang, Anthony Nguyen, Xuan-Son Vu, Son Ngo, In: Proceedings of the 25th International Conference on Neural Information Processing (ICONIP), 2018.

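[7] Cformer: Semi-Supervised Text Clustering Based on Pseudo Labeling. Arezoo Hatefi, Xuan-Son Vu, Monowar H. Bhuyan, F. Drewes, In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management (CIKM), 2021.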

The second visual object tracking segmentation VOTS2024 challenge results

Computer Vision – ECCV 2024 Workshops: ECCV 2024, Cham: Springer 2025 : 357-383

Kristan, Matej; Matas, Jiří; Tokmakov, Pavel; et al.

AURORA-M: open source continual pre-training for multilingual language and code

Proceedings - International Conference on Computational Linguistics, COLING, Association for Computational Linguistics 2025 : 656-678

Nakamura, Taishi; Mishra, Mayank; Tedeschi, Simone; et al.

Reliable-data-split (RDS): maximizing model potential with reinforced selection strategy

Proceedings of Reliable AI workshop at ACML, ML Research Press 2025 : 73-89

Nguyen, Hoang D.; Vu, Xuan-Son; Truong, Quoc-Tuan; et al.

Reliable cultural knowledge preservation in multilingual LLMs through model merging

Proceedings of machine learning research: reliable and trustworthy artificial intelligence, 12 December 2025, multiple, ML Research Press 2025 : 59-66

Nguyen, Hoang Quan; Pham, Nhut Huy; Pahani, Maziyar; et al.

Wave2Graph: integrating spectral features and correlations for graph-based learning in sound waves

AI OPEN, Elsevier 2024, Vol. 5 : 115-125

Hoang, Van-Truong; Tran, Khanh-Tung; Vu, Xuan-Son; et al.

Pseudonymization categories across domain boundaries

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA Language Resource Association 2024 : 13303-13314

Szawerna, Maria Irena; Dobnik, Simon; Lindström Tiedemann, Therese; et al.

MGLEP: multimodal graph learning for modeling emerging pandemics with big data

Scientific Reports, Springer Nature 2024, Vol. 14, (1)

Tran, Khanh-Tung; Hy, Truong Son; Jiang, Lili; et al.

NeuProNet: neural profiling networks for sound classification

Neural Computing & Applications, Springer Nature 2024, Vol. 36, (11) : 5873-5887

Tran, Khanh-Tung; Vu, Xuan-Son; Nguyen, Khuong; et al.

Introduction

Proceedings of the workshop on computational approaches to language data pseudonymization (CALD-pseudo 2024), Association for Computational Linguistics 2024 : ii-iii

Volodina, Elena; Alfter, David; Dobnik, Simon; et al.

ADCluster: Adaptive Deep Clustering for unsupervised learning from unlabeled documents

Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023), Association for Computational Linguistics 2023 : 68-77

Hatefi, Arezoo; Vu, Xuan-Son; Bhuyan, Monowar H.; et al.

The first visual object tracking segmentation VOTS2023 challenge results

2023 IEEE/CVF International conference on computer vision workshops (ICCVW), Institute of Electrical and Electronics Engineers Inc. 2023 : 1788-1810

Kristan, Matej; Matas, Jiří; Danelljan, Martin; et al.

ViGPTQA - state-of-the-art LLMs for Vietnamese question answering: system overview, core models training, and evaluations

Proceedings of the 2023 conference on empirical methods in natural language processing: industry track, Association for Computational Linguistics (ACL) 2023 : 754-764

Nguyen, Minh-Thuan; Tran, Khanh-Tung; Nguyen, Vincent; et al.

Multimodal machine learning for mental disorder detection: a scoping review

27th international conference on knowledge based and intelligent information and engineering systems (KES 2023), Elsevier 2023 : 1458-1467

Nguyen, Thuy Trinh; Pham, Viet Hoang-Quoc; Le, Duc-Trong; et al.

Privacy and trust in IoT ecosystems with big data: a survey of perspectives and challenges

Proceedings - IEEE 9th International Conference on Big Data Computing Service and Applications, BigDataService 2023, IEEE 2023 : 215-222

Nguyen, Tuan Minh; Vu, Xuan-Son

Personalization for robust voice pathology detection in sound waves

Proceedings of the annual conference of the international speech communication association, INTERSPEECH, International Speech Communication Association 2023 : 1708-1712

Tran, Khanh-Tung; Hoang, Truong; Nguyen, Duy Khuong; et al.

Grandma Karl is 27 years old - research agenda for pseudonymization of research data

Proceedings - IEEE 9th International Conference on Big Data Computing Service and Applications, BigDataService 2023, IEEE 2023 : 229-233

Volodina, Elena; Dobnik, Simon; Lindström Tiedemann, Therese; et al.

Self-adaptive privacy concern detection for user-generated content

Computational linguistics and intelligent text processing: 19th International Conference, CICLing 2018, Hanoi, Vietnam, March 18-24, 2018, revised selected papers, part 1, Springer Science+Business Media B.V. 2023 : 153-167

Vu, Xuan-Son; Jiang, Lili

MetaVSID: a robust meta-reinforced learning approach for VSI-DDoS detection on the edge

IEEE Transactions on Network and Service Management, IEEE 2023, Vol. 20, (2) : 1625-1643

Vu, Xuan-Son; Ma, Maode; Bhuyan, Monowar H.

dpUGC: learn differentially private representation for user generated contents

Computational linguistics and intelligent text processing: 20th international conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, revised selected papers, part I, Springer 2023 : 316-331

Vu, Xuan-Son; Tran, Son N.; Jiang, Lili

Optimized and adaptive federated learning for straggler-resilient device selection

2022 International Joint Conference on Neural Networks (IJCNN), IEEE 2022 : 1-9

Banerjee, Sourasekhar; Vu, Xuan-Son; Bhuyan, Monowar H.

Reinforced Transformer Learning for VSI-DDoS Detection in Edge Clouds

IEEE Access, IEEE 2022, Vol. 10 : 94677-94690

Bhutto, Adil B.; Vu, Xuan-Son; Elmroth, Erik; et al.

SoBigDemicSys: a social media based monitoring system for emerging pandemics with big data

Proceedings - IEEE 8th International Conference on Big Data Computing Service and Applications, BigDataService 2022, IEEE Computer Society 2022 : 103-107

Tran, Tung Khanh; Vu, Xuan-Son; Jiang, Lili

Cformer: Semi-Supervised Text Clustering Based on Pseudo Labeling

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, ACM Digital Library 2021 : 3078-3082

Hatefi, Arezoo; Vu, Xuan-Son; Bhuyan, Monowar H.; et al.

Modular Graph Transformer Networks for Multi-Label Image Classification

35th AAAI Conference on Artificial Intelligence, AAAI 2021, 33rd Conference on Innovative Applications of Artificial Intelligence and the 11th Symposium on Educational Advances in Artificial Intelligence: Vol. 35 No. 10: AAAI-21 Technical Tracks 10, Association for the Advancement of Artificial Intelligence 2021 : 9092-9100

Nguyen, Hoang D.; Vu, Xuan-Son; Le, Duc-Trong

ICDAR 2021 Competition on Multimodal Emotion Recognition on Comics Scenes

Document Analysis and Recognition – ICDAR 2021: 16th International Conference, Lausanne, Switzerland, September 5–10, 2021, Proceedings, Part I, Springer 2021 : 767-782

Nguyen, Nhu-Van; Vu, Xuan-Son; Rigaud, Christophe; et al.

MC-OCR Challenge: Mobile-Captured Image Document Recognition for Vietnamese Receipts

Proceedings - 2021 RIVF International Conference on Computing and Communication Technologies, RIVF 2021, IEEE 2021 : 88-93

Vu, Xuan-Son; Bui, Quang-Anh; Nguyen, Nhu-Van; et al.

WINFRA: A Web-Based Platform for Semantic Data Retrieval and Data Analytics

Mathematics, MDPI 2020, Vol. 8, (11)

Ait-Mlouk, Addi; Vu, Xuan-Son; Jiang, Lili

On multi-resident activity recognition in ambient smart-homes

Artificial Intelligence Review, Springer 2020, Vol. 53, (6) : 3929-3945

Tran, Son N.; Nguyen, Dung; Ngo, Tung-Son; et al.

Privacy-guardian: the vital need in machine learning with big data

Report / UMINF, 20.11

Vu, Xuan-Son

Privacy-Preserving Visual Content Tagging using Graph Transformer Networks

Proceedings of the 28th ACM International Conference on Multimedia (MM ’20), ACM Digital Library 2020 : 2299-2307

Vu, Xuan-Son; Le, Duc-Trong; Edlund, Christoffer; et al.

Multimodal Review Generation with Privacy and Fairness Awareness

Proceedings of the 28th International Conference on Computational Linguistics (COLING), 2020, International Committee on Computational Linguistics 2020 : 414-425

Vu, Xuan-Son; Nguyen, Thanh-Son; Le, Duc-Trong; et al.

Privacy-awareness in the era of Big Data and machine learning

Report / UMINF, 19.06

Vu, Xuan-Son

Graph-based Interactive Data Federation System for Heterogeneous Data Retrieval and Analytics

Proceedings of The World Wide Web Conference WWW 2019, New York, NY, USA: ACM Digital Library 2019 : 3595-3599

Vu, Xuan-Son; Ait-Mlouk, Addi; Elmroth, Erik; et al.

Generic Multilayer Network Data Analysis with the Fusion of Content and Structure

Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), 2019

Vu, Xuan-Son; Santra, Abhishek; Chakravarthy, Sharma; et al.

ETNLP: A Visual-Aided Systematic Approach to Select Pre-Trained Embeddings for a Down Stream Task

Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), Incoma Ltd. 2019 : 1285-1294

Vu, Xuan-Son; Vu, Thanh; Tran, Son N.; et al.

Improving Recurrent Neural Networks with Predictive Propagation for Sequence Labelling

Neural Information Processing: 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, December 13-16, 2018, Proceedings, Part I, Springer 2018 : 452-462

Tran, Son N.; Zhang, Qing; Nguyen, Anthony; et al.

NIHRIO at SemEval-2018 task 3: a simple and accurate neural network model for irony detection in Twitter

Proceedings of the 12th international workshop on semantic evaluation, New Orleans: Association for Computational Linguistics 2018 : 525-530

Vu, Thanh; Nguyen, Dat Quoc; Vu, Xuan-Son; et al.

Lexical-semantic resources: yet powerful resources for automatic personality classification

Proceedings of the 9th Global WordNet Conference (GWC 2018), Singapore: Nanyang Technological University (NTU) 2018 : 173-182

Vu, Xuan-Son; Flekova, Lucie; Jiang, Lili; et al.

Personality-based Knowledge Extraction for Privacy-preserving Data Analysis

K-CAP 2017: Proceedings of the Knowledge Capture Conference

Vu, Xuan-Son; Jiang, Lili; Brändström, Anders; et al.

METOD: a dataset and baseline for multimodal discovery of event-based news topics

Hatefi, Arezoo; Björklund, Johanna; Vu, Xuan-Son; et al.