Tutorial Mining and multimodal learning from complex medical data

06.03.2023

TUTORIAL - Mining and multimodal learning from complex medical data

at AIME 2023, June 12-15, 2023.

Tutorialists: Myra Spiliopoulou, Panagiotis Papapetrou, Ioanna Miliou, and Maria Bampa

Abstract

Recent advances in artificial intelligence and machine learning, and their application to knowledge discovery from medical data sources, have received increasing attention over the past several years. At the same time, the adoption of Electronic Health Records (EHRs), together with the availability of patient self-management and empowerment technologies, gives rise to new forms of health-related data sources of multi-modality and high complexity, requiring novel data representation and analytics workflows. Particular challenges include representing the complex feature spaces involved, dealing with data sparsity and missingness, exploiting the temporal nature of the data sources, and maintaining good trade-offs between model performance and explainability.
In this tutorial, we focus on data variables of sequential nature related to health. These variable types include spatial trajectories, panel data from longitudinal studies, time series signals (such as ECGs), event sequences (such as sequences containing EHR events), and mHealth data. We emphasize the importance of mining and knowledge discovery from these data sources in the context of research questions posed by medical researchers and clinicians. We further elaborate on how the integration and fusion of such heterogeneous data sources can result in improved model performance in terms of predictive power, reliability, and scalability. Furthermore, we emphasize the utility and usability of time series forecasting methods when dealing with sequential data variables collected in the hospital setting (e.g., in the Intensive Care Unit). Finally, we discuss the need for explainable machine learning models and their applicability in medical decision-making problems that require trustworthy recommendations.

Motivation

The proliferation of medical data and applications has increased the need for extracting useful knowledge that can be used effectively by healthcare domain experts. The motivation of this tutorial is to address the complexity of medical data, with a specific focus on their temporal nature. While earlier tutorials at AIME and at related venues such as KDD and ECML/PKDD have explored the application and utility of machine learning on medical data, there has so far been limited focus on the challenges emerging from the sequential and temporal nature of such data, as well as on medical practitioners' need for trust.

Format and tentative schedule (3 hours)

Part I: Learning from complex medical sources (PANOS)

Representation learning for medical data sources

Time series summarization
Latent space representation
Pattern-based representation

Multimodal learning

Learning latent spaces across different data spaces
Clustering in multimodal spaces
Patient profiling

Explainability and counterfactuals for deep neural networks in the context of medical data series

LIME and SHAP for temporal data variables
Attention-based explainability
Counterfactual generation for medical sequences
Metrics for assessing explainability and counterfactuals

Part II: Medical time series understanding and prediction (MYRA)

Moving from static to temporal data: prediction after an intervention and prediction without one
Modeling and monitoring phenotypes over time
Definition of time series over high-dimensional medical assessments
Time series from population-based cohorts, from clinical studies, from experimental data, from observational data, and from mHealth apps
Missingness
Understanding the rules governing a patient’s time series and the matching challenge: identifying appropriate learning technologies for learning tasks given the data

Slides of Part 2, figures removed: here

Part III: Medical time series forecasting (IOANNA)

Statistical methods for forecasting: AR, MA, ARIMA, ARIMAX, SARIMAX
Deep learning models for forecasting: LSTM, GRU
Evaluation metrics and procedures for forecasting
Nowcasting

Slides of Part 3: here

Target audience

This tutorial is targeted at all AIME participants interested in learning and model understanding on medical data. Our particular focus group consists of junior researchers interested in knowledge discovery from health-related data and in how to convey the extracted models to medical experts. The main prerequisite for participants is basic knowledge of data mining, machine learning, and databases. The audience is expected to be familiar with standard concepts and methods of machine learning. Such knowledge can be expected from AIME participants, including students.

Our Affiliations

Prof. Panagiotis Papapetrou
Data Science group (DS@SU)
Dept. of Computer and Systems Sciences
Stockholm University, Stockholm, Sweden
Email: panagiotis@dsv.su.se
https://papapetrou.blogs.dsv.su.se

Prof. Myra Spiliopoulou
Knowledge Management and Discovery lab (KMD)
Faculty of Computer Science
Otto-von-Guericke-University Magdeburg
PO Box 4120, D-39016 Magdeburg, Germany
Email: myra@ovgu.de
https://www.kmd.ovgu.de/myra.html

Assistant Prof. Ioanna Miliou
Data Science group (DS@SU)
Dept. of Computer and Systems Sciences
Stockholm University, Stockholm, Sweden
Email: ioanna.miliou@dsv.su.se
https://ioannamiliou.blogs.dsv.su.se/

Maria Bampa
Data Science group (DS@SU)
Dept. of Computer and Systems Sciences
Stockholm University, Stockholm, Sweden
Email: maria.bampa@dsv.su.se
https://mariabampa.blogs.dsv.su.se/

Myra Spiliopoulou is a Professor of Business Information Systems at the Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany. Her main research is on mining dynamic complex data. Her publications are on mining complex streams, mining evolving objects, adapting models to drift, and building models that capture drift. Her research has been published in renowned international conferences and journals. She regularly presents tutorials on different aspects of complex data mining, and recently on medical mining. She is involved as a (senior) reviewer in major conferences on data mining and knowledge discovery. In 2016 and 2019, she served as a PC Chair of the IEEE Int. Symposium on Computer-Based Medical Systems (CBMS). In 2021, she served as a Special Sessions Chair of the IEEE DSAA (Data Science and Advanced Analytics) conference and as ECML PKDD 2021 Awards Chair. In 2023, she is a PC Chair of IEEE CBMS 2023.

Panagiotis Papapetrou is a Professor at the Department of Computer and Systems Sciences of Stockholm University, Sweden. He is also an Adjunct Professor at the Computer Science Department at Aalto University, Finland. His area of expertise is algorithmic data mining, with a particular focus on mining and indexing temporal data and healthcare data. Panagiotis received his PhD in Computer Science from Boston University in 2009 and his Master’s degree from the same university in 2007. He was a postdoctoral researcher at Aalto University during 2009-2012, and a lecturer at Birkbeck, University of London, UK, during 2012-2013. He has participated in several national and international research projects, including a 4-year starting grant funded by the Swedish Research Council. He serves as Action Editor of the Data Mining and Knowledge Discovery journal and is a Board Member of the Swedish Artificial Intelligence Society. Panagiotis has been involved in the organization of several workshops and tutorials at KDD, ICDM, and ECML/PKDD. Moreover, he has served as the general chair of IDA 2016, PhD consortium co-chair at ICDM 2018, and Workshops co-chair at ICDM 2019.

Ioanna Miliou is an Assistant Professor at the Department of Computer and Systems Sciences of Stockholm University, Sweden. Her research interests lie in the fields of Data Science for Social Good, Nowcasting, and Forecasting. She works with predictive machine learning models that use temporal data, with a particular focus on healthcare, epidemics, peace, and sentiment. She was a postdoctoral researcher at Stockholm University, Sweden, during 2021-2022, and at the University of Pisa, Italy, during 2019-2021. She received her PhD in Computer Science from the University of Pisa, Italy, in 2018 and a diploma in Electrical and Computer Engineering from the National Technical University of Athens, Greece, in 2014. She has participated in several European research projects and has been a PC member of several international conferences in the areas of data mining and machine learning. She was involved in the organization of the 12th International Conference on Social Informatics (SocInfo 2020) and served as the PhD Poster Track Chair at the Symposium on Intelligent Data Analysis (IDA 2023).

Maria Bampa is a Ph.D. student at the Department of Computer and Systems Sciences of Stockholm University, Sweden. Her main research interest is in multi-source and multi-modal data, with a focus on healthcare data and on patient and public health. She works with unsupervised machine learning and deep learning models for multimodal representation learning and clustering. She received her joint Master’s degree in Health Informatics from Karolinska Institute, Sweden, and Stockholm University, Sweden, in 2019, and her Bachelor’s degree in Computer Science from Athens University of Economics and Business, Greece, in 2017. She has participated in a European research project and has been a PC member of international conferences in the areas of data mining and machine learning.