IEEE ICDM 2017 - Medical Mining Tutorial

Mining Cohorts & Patient Data: Challenges and Solutions for the Pre-Mining, the Mining and the Post-Mining Phases


IEEE ICDM 2017, New Orleans - Tutorial on Mining Cohorts and Patient Data, Monday November 20, 10:30 - 13:00 (NEW DATE)


Tutorialists: Panagiotis Papapetrou (Stockholm) and Myra Spiliopoulou (Magdeburg)


Data mining is used intensively in medicine and healthcare. Electronic Health Records (EHRs) are perceived as big patient data. Scientists use them to predict patients' progress, to understand and predict response to therapy, to detect adverse drug effects, and to address many other learning tasks. Medical researchers are also interested in learning from cohorts of population-based studies and of experiments. Learning tasks include the identification of disease predictors that can lead to new diagnostic tests and the acquisition of insights on interventions.

In this tutorial, we elaborate on data sources, methods, and case studies in medical mining. In addition to the aforementioned conventional data sources, we address the potential of data from mobile devices. We discuss the learning problems that can be solved with those data, present case studies, and investigate the methods needed to prepare and mine those data and to present the results to a medical expert.


PART 1. Introduction

We introduce basic concepts of data analysis for clinical decision support and for clinical research. We distinguish among analyses on electronic health records, on cohorts and on social data.

PART 2. Building cohorts from clinical data and conducting experiments - MYRA

We first elaborate on the workflow of cohort specification, construction and exploration for retrospective studies on hospital data. Then, we shift to prospective studies that require the recruitment of patient and control participants. We discuss experiments on clinical cohorts and present a small number of case studies on diagnostics and treatment.

PARTS 3 + 4. Learning from EHR / supervised and unsupervised - PANAGIOTIS

We discuss the structure of Electronic Health Records in more detail and elaborate on temporal abstractions on such data. Then, we discuss subgroup discovery, disproportionality analysis and event prediction on EHRs. We present case studies from two application areas: adverse drug event prevention and heart failure prediction.
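
As a small illustration of one topic named above: disproportionality analysis typically starts from a 2x2 contingency table that counts records with and without a given drug and with and without a given event. The sketch below computes the proportional reporting ratio (PRR), one commonly used disproportionality measure. It is not taken from the tutorial material; the function name, the argument names and the counts in the usage line are hypothetical and chosen only for illustration.

```python
def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio (PRR) from a 2x2 contingency table.

    a: records with the drug and with the event
    b: records with the drug, without the event
    c: records without the drug, with the event
    d: records without the drug and without the event
    (Argument names are illustrative assumptions.)
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    if a + b == 0 or c + d == 0:
        raise ValueError("each drug group must contain at least one record")
    if c == 0:
        raise ValueError("PRR is undefined when no event occurs without the drug")

    rate_with_drug = a / (a + b)       # event rate among records with the drug
    rate_without_drug = c / (c + d)    # event rate among all other records
    return rate_with_drug / rate_without_drug


# Hypothetical counts, for illustration only.
prr = proportional_reporting_ratio(a=20, b=80, c=10, d=890)
print(f"PRR = {prr:.2f}")
```

A PRR well above 1 indicates that the event is recorded disproportionately often together with the drug. In practice it is usually read together with the number of cases and a significance measure, not on its own.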

PART 5. Learning from population-based epidemiological studies - MYRA

We turn from hospital data to population-based epidemiological studies. We discuss methods of subgroup discovery and subspace clustering for high-dimensional data of population-based cohorts. Then, we turn to the problem of protocol change during a longitudinal epidemiological study and elaborate on methods that deal with missing variables when learning from high-dimensional timestamped data.

PART 6. Learning from crowdsensing data - MYRA

This short block covers the new area of crowdsensing for patient monitoring and patient empowerment. We focus on the concept of “Ecological Momentary Assessments” (EMA) for patient monitoring, elaborate on the potential of mobile technologies for EMA, and discuss basic technologies and challenges for learning on EMA recordings.

PART 7. Closing

We close this tutorial with a brief discussion of the challenges faced by the mining expert when dealing with medical data: (i) finding the data, (ii) seeing the data with the eyes of the medical expert, (iii) preparing the data for learning, (iv) building models and (v) explaining the results, and we provide a brief outline of instruments for each challenge.


  • Slides on parts 2, 5 and 6 here 
  • Slides for Part 3 here 
  • Slides for Part 4 here


Target audience and prerequisites

The tutorial is intended for all ICDM conference participants, especially young researchers, who are interested in the domain of healthcare and medicine and in how data mining and machine learning techniques can be applied in these domains. No special background in medical data analysis is required.

Tutors' short bios and expertise related to the tutorial

Myra Spiliopoulou is Professor of Business Information Systems at the Otto-von-Guericke-University Magdeburg. Her research is on mining dynamic complex data, with a focus on healthcare and social data. She is an action editor for IEEE TKDE and DAMI, one of the four Journal Track Chairs for ECML PKDD 2017, a jury member of the SIGKDD best PhD award, and Panel Chair of ICDM 2017. In 2016, she was PC Chair of the IEEE Symposium of Computer Based Medical Systems. She has held tutorials on topics of data mining at KDD 2009 and 2015, PAKDD 2013 and 2016 and in many ECML PKDD conferences. In 2018, she serves as PC Chair of the Applied Data Science Track of the ACM SIGKDD conference KDD 2018, to be held in London.

Panagiotis Papapetrou is Professor at the Department of Computer and Systems Sciences at Stockholm University and Adjunct Professor at the Computer Science Department at Aalto University. His area of expertise is algorithmic data mining, with a particular focus on mining and indexing temporal data and healthcare data. Panagiotis received his PhD in Computer Science from Boston University in 2009, was a post-doctoral researcher at Aalto University during 2009-2013, and was a lecturer at the University of London during 2012-2013. He has participated in several national and international research projects. He is a board member of the Swedish AI Society.

Contact info

Prof. Panagiotis Papapetrou
Data Science group
Department of Computer and Systems Sciences
PO Box 7003, 164 07, Stockholm, Sweden

Prof. Myra Spiliopoulou
Research Group on Knowledge Management and Discovery (KMD),
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg,
PO Box 4120, 39016 Magdeburg, Germany


Last modified: 10.11.2017