Deep Data Mining

Research group

We develop advanced algorithms and models to find patterns, anomalies, and causal relationships. This allows us to predict outcomes and support decision-making even in very large and complex datasets.

The research group of Causal Intelligence and Deep Data Mining was established to develop algorithms and implement prototypes for multi-source heterogeneous information federation and privacy preservation on multimodal data. Our main research interests are data federation and privacy protection, applying techniques from text mining, natural language processing, machine learning, the semantic web, and causal discovery.

Data from several sources

We integrate data from structured sources (e.g., database records), semi-structured sources (e.g., XML, JSON), and unstructured sources (e.g., news, social media). Our core techniques draw on databases, data mining, natural language processing, machine learning, and ontology-based semantic web technology.

General framework

As application-driven research, we aim to realize a general data integration framework that adapts to multiple applications (e.g., information retrieval, recommendation systems, online advertising) while capturing the unique characteristics of domain data to improve integration accuracy and AI trustworthiness in specialized domains (e.g., healthcare, social networks, robotic heavy machinery, demographic data, review data).

Research leader

Lili Jiang, Associate professor

+46 90 786 58 27

Overview

Participating departments and units at Umeå University

Department of Computing Science

Research area

Computing science

Members

Xuan-Son Vu, Visiting researcher

+46 90 786 74 52

Cornelia Käsbohrer, Doctoral student

Mattias Brännström, Research engineer (on leave), doctoral student

Han Wang, Staff scientist

Zhou Zhou, Doctoral student

Research projects

Privacy-aware Federated Database Infrastructure Construction for Heterogeneous Data Analysis on Micro-Data

Research area: Computing science

1 November 2018

Research project

Latest publications

How to quickly select good in-context examples in large language models for data-to-text tasks?

Natural Language Processing, Cambridge University Press 2026, Vol. 32, (1) : 1-35

Li, Yulong; Yang, Jiaoyun; Jiang, Lili; et al.

Assessing the fragility of SHAP-based model explanations using counterfactuals

Northern Lights Deep Learning Conference, 6-8 January 2026, UiT The Arctic University, Tromsø, Norway, ML Research Press 2026 : 211-234

Käsbohrer, Cornelia C.; Mair, Sebastian; Jiang, Lili

Mitigating prototype shift: few-shot nested named entity recognition with prototype-attention contrastive learning

Expert systems with applications, Elsevier 2025, Vol. 268

Ming, Hong; Yang, Jiaoyun; Liu, Shuo; et al.

Harnessing high-quality pseudo-labels for robust few-shot nested named entity recognition

Engineering applications of artificial intelligence, Elsevier 2025, Vol. 156

Ming, Hong; Yang, Jiaoyun; Liu, Shuo; et al.

Incomplete multi-view drug recommendation via multi-level representation learning and curriculum learning

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Digital Library 2025 : 4647-4658

Liu, Ning; Tang, Yunsen; Yuan, Haitao; et al.

COMputational Models FOR patienT stratification in urologic cancers: creating robust and trustworthy multimodal AI for health care

2025 IEEE 38th International Symposium on Computer-Based Medical Systems (CBMS), Institute of Electrical and Electronics Engineers (IEEE) 2025 : 121-122

Billis, Antonios; Hering, Alessa; Jiang, Lili; et al.

Synner: synergizing large and small language models for few-shot nested NER

Proceedings of the International Joint Conference on Neural Networks

Ming, Hong; Yang, Jiaoyun; Liu, Shuo; et al.

Modeling fairness in recruitment AI via information flow

AEQUITAS 2025. Fairness and Bias in AI: Proceedings of the 3rd Workshop on Fairness and Bias in AI, co-located with 28th European Conference on Artificial Intelligence (ECAI 2025)

Brännström, Mattias; Xanthopoulou, Themis Dimitra; Jiang, Lili

LPNER: label prompt for few-shot nested named entity recognition

Asian Conference on Machine Learning: 5-8 December 2024, Hanoi, Vietnam, ML Research Press 2024 : 781-796

Yang, Jiaoyun; Zhu, Zhihan; Ming, Hong; et al.

MGLEP: multimodal graph learning for modeling emerging pandemics with big data

Scientific Reports, Springer Nature 2024, Vol. 14, (1)

Tran, Khanh-Tung; Hy, Truong Son; Jiang, Lili; et al.

Few-shot nested named entity recognition

Knowledge-Based Systems, Elsevier 2024, Vol. 293

Ming, Hong; Yang, Jiaoyun; Gui, Fang; et al.

A novel data mining framework to investigate causes of boiler failures in waste-to-energy plants

Processes, MDPI 2024, Vol. 12, (7)

Wang, Dong; Jiang, Lili; Kjellander, Måns; et al.

Data protection and multi-database data-driven models

Future Internet, MDPI 2023, Vol. 15, (3)

Jiang, Lili; Torra, Vicenç

Few-shot named entity recognition via Label-Attention Mechanism

ICCAI '23: proceedings of the 2023 9th international conference on computing and artificial intelligence, Association for Computing Machinery (ACM) 2023 : 466-471

Pan, Yan; Yang, Jiaoyun; Ming, Hong; et al.

Impact based fairness framework for socio-technical decision making

Proceedings of the 1st workshop on fairness and bias in AI, co-located with 26th European conference on artificial intelligence (ECAI 2023)

Brännström, Mattias; Jiang, Lili; Aler Tubella, Andrea; et al.

Timing performance benchmarking of out-of-distribution detection algorithms

Journal of Signal Processing Systems, Springer-Verlag New York 2023, Vol. 95, (12) : 1355-1370

Luan, Siyu; Gu, Zonghua; Saremi, Amin; et al.

dpUGC: learn differentially private representation for user generated contents

Computational linguistics and intelligent text processing: 20th international conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, revised selected papers, part I, Springer 2023 : 316-331

Vu, Xuan-Son; Tran, Son N.; Jiang, Lili

Self-adaptive privacy concern detection for user-generated content

Computational linguistics and intelligent text processing: 19th International Conference on CICLing 2018, Hanoi, Vietnam, March 18-24, 2018, revised selected papers, part 1, Springer Science+Business Media B.V. 2023 : 153-167

Vu, Xuan-Son; Jiang, Lili

Towards better process management in wastewater treatment plants: Process analytics based on SHAP values for tree-based machine learning methods

Journal of Environmental Management, Elsevier 2022, Vol. 301

Wang, Dong; Thunéll, Sven; Lindberg, Ulrika; et al.

SoBigDemicSys: a social media based monitoring system for emerging pandemics with big data

Proceedings - IEEE 8th International Conference on Big Data Computing Service and Applications, BigDataService 2022, IEEE Computer Society 2022 : 103-107

Tran, Tung Khanh; Vu, Xuan-Son; Jiang, Lili

Proceedings of Umeå's 25th Student Conference in Computing Science (USCCS 2022)

Report / UMINF, 22.01

Jiang, Lili; Jonsson, Anna; Vanhée, Loïs

Toward Delicate Anomaly Detection of Energy Consumption for Buildings: Enhance the Performance From Two Levels

IEEE Access, IEEE 2022, Vol. 10 : 31649-31659

Wang, Dong; Enlund, Therese; Trygg, Johan; et al.

On the Effects of Data Protection on Multi-database Data-Driven Models

Integrated Uncertainty in Knowledge Modelling and Decision Making: 9th International Symposium, IUKM 2022, Ishikawa, Japan, March 18–19, 2022, Proceedings, Springer 2022 : 226-238

Jiang, Lili; Torra, Vicenç

A machine learning framework to improve effluent quality control in wastewater treatment plants

Science of the Total Environment, Elsevier 2021, Vol. 784

Wang, Dong; Thunéll, Sven; Lindberg, Ulrika; et al.

Visual Explanations for DNNs with Contextual Importance

Explainable and Transparent AI and Multi-Agent Systems: Third International Workshop, EXTRAAMAS 2021, Virtual Event, May 3–7, 2021, Revised Selected Papers, Springer 2021 : 83-96

Anjomshoae, Sule; Jiang, Lili; Främling, Kary

Context-based image explanations for deep neural networks

Image and Vision Computing, Elsevier 2021, Vol. 116

Anjomshoae, Sule; Omeiza, Daniel; Jiang, Lili

ICDAR 2021 Competition on Multimodal Emotion Recognition on Comics Scenes

Document Analysis and Recognition – ICDAR 2021: 16th International Conference, Lausanne, Switzerland, September 5–10, 2021, Proceedings, Part I, Springer 2021 : 767-782

Nguyen, Nhu-Van; Vu, Xuan-Son; Rigaud, Christophe; et al.

Out-of-Distribution Detection for Deep Neural Networks with Isolation Forest and Local Outlier Factor

IEEE Access, Institute of Electrical and Electronics Engineers (IEEE) 2021, Vol. 9 : 132980-132989

Luan, Siyu; Gu, Zonghua; Freidovich, Leonid B.; et al.

WINFRA: A Web-Based Platform for Semantic Data Retrieval and Data Analytics

Mathematics, MDPI 2020, Vol. 8, (11)

Ait-Mlouk, Addi; Vu, Xuan-Son; Jiang, Lili

KBot: a Knowledge graph based chatBot for natural language understanding over linked data

IEEE Access, IEEE 2020, Vol. 8 : 149220-149230

Ait-Mlouk, Addi; Jiang, Lili

Privacy-Preserving Visual Content Tagging using Graph Transformer Networks

Proceedings of the 28th ACM International Conference on Multimedia (MM ’20), ACM Digital Library 2020 : 2299-2307

Vu, Xuan-Son; Le, Duc-Trong; Edlund, Christoffer; et al.

Multimodal Review Generation with Privacy and Fairness Awareness

Proceedings of the 28th International Conference on Computational Linguistics (COLING), 2020, International Committee on Computational Linguistics 2020 : 414-425

Vu, Xuan-Son; Nguyen, Thanh-Son; Le, Duc-Trong; et al.

A Web-Based Platform for Mining and Ranking Association Rules

ECIR 2020: Advances in Information Retrieval, Springer 2020 : 443-448

Ait-Mlouk, Addi; Jiang, Lili

ETNLP: A Visual-Aided Systematic Approach to Select Pre-Trained Embeddings for a Down Stream Task

Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), Incoma Ltd. 2019 : 1285-1294

Vu, Xuan-Son; Vu, Thanh; Tran, Son N.; et al.

Generic Multilayer Network Data Analysis with the Fusion of Content and Structure

Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), 2019

Vu, Xuan-Son; Santra, Abhishek; Chakravarthy, Sharma; et al.

Graph-based Interactive Data Federation System for Heterogeneous Data Retrieval and Analytics

Proceedings of The World Wide Web Conference WWW 2019, New York, NY, USA: ACM Digital Library 2019 : 3595-3599

Vu, Xuan-Son; Ait-Mlouk, Addi; Elmroth, Erik; et al.

Microarray Missing Value Imputation: A Regularized Local Learning Method

IEEE/ACM Transactions on Computational Biology & Bioinformatics, IEEE 2019, Vol. 16, (3) : 980-993

Wang, Aiguo; Chen, Ye; An, Ning; et al.

Lexical-semantic resources: yet powerful resources for automatic personality classification

Proceedings of the 9th Global WordNet Conference (GWC 2018), Singapore: Nanyang Technological University (NTU) 2018 : 173-182

Vu, Xuan-Son; Flekova, Lucie; Jiang, Lili; et al.

The cookie recipe: Untangling the use of cookies in the wild

TMA Conference 2017: Proceedings of the 1st Network Traffic Measurement and Analysis Conference

Gonzalez, Roberto; Jiang, Lili; Ahmed, Mohamed; et al.

Entity markup for knowledge base population

Big data analytics: 5th international conference, BDA 2017, Hyderabad, India, December 12-15, 2017, proceedings, Springer 2017 : 71-89

Jiang, Lili

Personality-based Knowledge Extraction for Privacy-preserving Data Analysis

K-CAP 2017: Proceedings of the Knowledge Capture Conference

Vu, Xuan-Son; Jiang, Lili; Brändström, Anders; et al.

A global learning with local preservation method for microarray data imputation

Computers in Biology and Medicine, Elsevier 2016, Vol. 77 : 76-89

Chen, Ye; Wang, Aiguo; Ding, Huitong; et al.

Investigation into causes of boiler failures in waste-to-energy plants with a coupled engineering and data mining solution

Wang, Dong; Jiang, Lili; Kjellander, Måns; et al.

Towards delicate anomaly detection of energy consumption for buildings: enhance the performance from two levels

Wang, Dong; Enlund, Therese; Fors, Amanda; et al.

See more publications at Computing Science

News - Computing Science

AI in focus as ADS Lab at Umeå University brought together 100 experts in cloud research

Published: 2026-07-02

Artificial intelligence, security and energy — key issues at ADS Lab’s annual event.

Alexandre Bartel receives Nordea's Scientific Prize 2026

Published: 2026-06-18

Reveals vulnerabilities in software and develops tools that prevent attacks on digital systems.

Erik Elmroth receives Nordea's Innovation Prize 2026

Published: 2026-06-18

Nordea awards research on cloud technology delivering reliable solutions for critical societal functions.

See more news from Umeå University

Event - Computing Science

September

Disi Lin, Computing Science

Fri 4 September, 09:00 - 13:00

AUR.B.330 – Castor.

Defence of doctoral thesis. Structure-Aware Machine Learning for Medical Image Analysis.

Divya Baura, Computing Science

Mon 14 September, 09:15 - 13:00

MIT.A.316.

Defence of doctoral thesis. Ensuring Privacy in Virtual Knowledge Graphs.

Anindya Sundar Das, Computing Science

Fri 18 September, 13:00 - 17:00

Hörsal HUM.D.210 - Hummelhonung.

Defence of doctoral thesis. Unmasking the Unknown: Addressing Anomalies for Trustworthy AI.

View more events

Latest update: 2026-03-23
