Tutorial IEEE BigData 2020



TUTORIAL - Learning from complex medical data

at IEEE BigData 2020, December 10-13, 2020, taking place virtually

Tutorialists: Panagiotis Papapetrou, Myra Spiliopoulou and Jaakko Hollmen (in order of appearance)



Big data analytics and machine learning methods are intensively employed in medicine and healthcare. Electronic Health Records (EHRs) are perceived as big patient data. Using these records, scientists strive to predict patients' progress, to understand and predict response to treatment, to detect adverse drug effects and factors for cardiovascular disease, and to solve many other learning tasks. Medical researchers are also interested in learning from cohorts of population-based studies and of experiments. Learning tasks include the identification of disease predictors that can lead to new diagnostic tests and the acquisition of insights on interventions.

In this tutorial, we elaborate on data sources, methods, and case studies in medical mining. In addition to the aforementioned conventional data sources, we address the potential of data from mobile devices. We discuss the learning problems that can be solved with those data, present case studies, and investigate the methods needed to prepare and mine those data and to present the results to a medical expert. Furthermore, we will emphasize the need for interpretable and explainable models that can inspire trust and facilitate informed decision making. Towards this goal, we will elaborate on actionable models for complex EHR data and their applicability to the interpretation of black-box models, such as deep learning architectures.



The proliferation of medical data and applications has increased the need for extracting useful knowledge that can be effectively used by healthcare domain experts. Our main focus in this tutorial will be on EHRs and on cohorts from clinical and mHealth data.

The adoption of EHRs has caused a massive increase in the amount of healthcare documentation. Numerous data sources are available in EHRs, including billing codes of diagnoses, laboratory results, drug prescriptions, and clinical notes. Such data sources can be exploited for developing robust predictive models for solving challenging tasks within the domain of healthcare, such as detecting adverse events. Moreover, the interpretability and trustworthiness of predictive models, especially of black-box models such as deep learning architectures, are of great relevance in the medical and healthcare domain.

Cohort data typically refer to medical data obtained from a carefully selected set of persons with and without the outcome under observation. The challenging characteristics of these data are that they contain few participants, have a large number of dimensions, and have many missing values, which are possibly not missing at random. In this tutorial, we will focus on cohorts from clinical data and on cohorts from mHealth data, and we will see methods to predict treatment response before treatment start, to monitor adherence, and to deal with missing data and missing features.
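As a small illustration of why missing values need care when they may not be missing at random, one common baseline is to impute each feature and also record a flag saying whether the value was missing, so that a downstream model can still use the missingness itself as a signal. The sketch below is only an illustrative assumption, not a method taken from the tutorial slides; the function name `impute_with_indicators`, the feature names, and the toy cohort are all hypothetical.

```python
from statistics import median


def impute_with_indicators(rows, features):
    """Median-impute missing values (None) and add a 0/1 missingness flag per feature.

    Hypothetical helper for illustration only.
    """
    # Compute the median of the observed values of each feature.
    medians = {}
    for f in features:
        observed = [r[f] for r in rows if r.get(f) is not None]
        medians[f] = median(observed) if observed else 0.0

    completed = []
    for r in rows:
        out = {}
        for f in features:
            missing = r.get(f) is None
            out[f] = medians[f] if missing else r[f]
            # Keep the fact that the value was missing as a separate feature.
            out[f + "_missing"] = int(missing)
        completed.append(out)
    return completed


# Hypothetical toy cohort: four participants, two features, some values missing.
cohort = [
    {"age": 54, "score": 12.0},
    {"age": 61, "score": None},
    {"age": None, "score": 18.0},
    {"age": 47, "score": 15.0},
]
completed = impute_with_indicators(cohort, ["age", "score"])
```

In a real cohort with few participants and many features, such a baseline would only be a starting point; whether the indicator flags help depends on why the values are missing.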


Targeted audience

The tutorial is intended for all IEEE BigData participants, including researchers, academics, and practitioners, and especially for young researchers who are interested in how big data analytics and machine learning can benefit medicine and healthcare.

Content level:  25% beginner, 50% intermediate, 25% advanced

Audience prerequisites:  Participants are expected to have basic knowledge within the areas of data mining, machine learning, probabilistic modeling, and databases. The audience is expected to be familiar with standard concepts and methods, such as classification models, deep learning, density-based clustering, Hidden Markov Models, frequent pattern and rule mining. Such knowledge can be expected from all conference participants, including students.


Tutorial outline
This tutorial will be offered digitally. To ensure a smooth and fast transition among the talks of the three speakers (Panos, Myra and Jaakko, in that order), we incorporated the slides of Part I and Part V within each talk.


PART I: Introduction

  1. Definition and examples of patient data, including Electronic Health Records (EHRs), social data, clinical text, data collected in cohort studies

  2. Definition and examples of cohorts, including cohorts for clinical studies and population-based studies

Please find the cover slide and tutorial overview here. The introductory slides on the above topics are incorporated into Parts II, III and IV, in order to ensure a smooth presentation flow in the digital room.


PART II: Learning from EHR data (Panos)

  1. Temporal abstractions for EHR data

  2. Attention-based deep learning for healthcare event prediction

  3. Actionable models and counterfactual explanations for EHR data

  4. Interpretable ranking and classification of radiography exams

Slides here


PART III: Learning from cohort data (Myra)

  1. Structure of clinical data at screening, during treatment and end-of-treatment visits

  2. Prediction methods and feature space minimization

  3. Structure of mHealth data for monitoring of condition and adherence

  4. Prediction methods in temporal sequences with gaps

Slides here.


PART IV: Case study on neonatal mortality and morbidity (Jaakko)

  1. Problem specification – neonatal mortality and prediction of dangerous conditions among the earlyborn

  2. Machine learning methods for the analysis of time series and clinical variables

  3. Findings

Slides here


PART V: Conclusions and open challenges

  1. The challenge of finding the data

  2. The challenge of seeing with the expert's eyes

  3. The challenge of preparing the data

  4. Challenges of learning

  5. The challenge of explaining the results

Each of Parts II, III and IV has its own closing slides; these open issues are discussed there.


Curriculum Vitae of each presenter

  • Myra Spiliopoulou is Professor of Business Information Systems at the Otto-von-Guericke-University Magdeburg. Her research is on mining dynamic complex data, with focus on healthcare and social data. She is action editor for DAMI and PC Chair of the Applied Data Science Track of KDD 2018. In the recent past, she was one of the four Journal Track Chairs for ECML PKDD 2017, Panel Chair of IEEE ICDM 2017 and PC Chair of the IEEE Symposium of Computer Based Medical Systems 2016. She has held tutorials on topics of data mining at KDD 2009 and 2015, PAKDD 2013 and 2016 and in many ECML PKDD conferences.


  • Panagiotis Papapetrou is Professor at the Department of Computer and Systems Sciences at Stockholm University and Adjunct Professor at the Computer Science Department at Aalto University. His area of expertise is algorithmic data mining with particular focus on mining and indexing temporal data and healthcare data. He is action editor for DAMI. He received his PhD in Computer Science at Boston University in 2009, was a post-doctoral researcher at Aalto University during 2009-2013, and lecturer at the University of London during 2012-2013. He has participated in several national and international research projects. He is board member of the Swedish AI Society.



  • Jaakko Hollmen is Senior Lecturer at the Department of Computer and Systems Sciences at Stockholm University. His research interests are in machine learning and data mining, with applications in health and medicine as well as environmental applications in the analysis of data in built and natural environments. He received his D.Sc.(Tech.) degree at Helsinki University of Technology in 2000 and has held various positions at Helsinki University of Technology and Aalto University in Finland since then. In 2019, he joined the Data Science research group at Stockholm University in Sweden. He has organized and led many international conferences in his field. He is a steering committee member of ECML PKDD and of the IEEE Computer-Based Medical Systems conference series. He is editor of the Intelligent Data Analysis journal and a member of the Editorial Board of Data Mining and Knowledge Discovery. He is a Senior Member of IEEE.



Last modified: 12.12.2020