Data Harmonization workshop


Date: 18th – 19th May 2022
Time: 14:00-18:00 CEST
Location: Online
Workshop organisers: CINECA, H3ABioNet
Contact: Nicola Mulder, Mamana Mbiyavanga


  • CINECA's vision is a federated, cloud-enabled infrastructure that makes population-scale genomic and biomolecular data accessible across international borders. Its ultimate goal is to enable large-scale federated data analysis that is responsible and secure. This will require integrating and harmonizing large, diverse human cohort data using community standards. Data harmonization within and across cohorts adds value to the data for downstream analysis and interpretation and facilitates cross-cohort meta-analysis.

    This workshop aims to discuss common challenges in cohort data harmonization, work towards practical steps to address them, and share best practices. We welcome any cohort that has plans for prospective or retrospective data harmonization and is enthusiastic about sharing its experience and learning from others' perspectives on cohort data discovery and analysis.

    • Data cleaning and curation

    • ELSI considerations in merging data

    • Data collection standards, ontology terminology and interoperability standards, metadata models

    • Data storage standards

    • Data harmonization

    • Sharing cohort summary data

    Applicants are encouraged to check out the CINECA webinar on 31st March 2022 (https://www.cineca-project.eu/webinar/bringing-it-all-together), which highlights some of the relevant standards and applications.

    • After this workshop, participants should be able to:

    • Do basic data cleaning

    • Understand what data standards & ontologies exist for clinical data

    • Map their cohort metadata to a data model

    • Understand existing approaches to and algorithms for data harmonization

    • Prepare summary data from their cohorts

  • Members of cohort projects who work on data curation and management, such as data managers, curators, bioinformaticians and data scientists.

  • None, but participants should be involved in cohort data management or analysis.

  • This workshop will only provide a foundation for continued learning in data harmonization, with some example applications using synthetic datasets. Future bring-your-own-data workshops can be arranged for more hands-on work with your cohort data.

Workshop programme

Time (CEST/CAT) | Topic | Speaker

18th May 2022
14:00 | Welcome and introduction, workshop aims - Video, Slides | Nicky Mulder (CINECA, H3Africa/H3ABioNet, University of Cape Town, South Africa)
14:15 | Data cleaning - Video | Katherine Johnston, Ayton Meintjes (H3Africa/H3ABioNet, University of Cape Town, South Africa)
14:45 | Machine learning/text mining tools for cleaning data - Video | Isuru Liyanage (CINECA, EMBL-EBI, UK)
15:15 | ELSI considerations in merging data - Video | Melanie Goisauf (CINECA, BBMRI-ERIC, Austria)
15:40-16:00 | Break |
16:00 | Overview of data collection standards, ontology terminology and interoperability - Video | Peter Robinson (Jackson Laboratory, US)
16:45 | Metadata: GECKO, IHCC - Video | Carles Garcia (CINECA, EMBL-EBI, UK)
End 18:00 | Hands-on work to prepare for loading into Atlas - Video | Carles Garcia (CINECA, EMBL-EBI, UK)

19th May 2022
14:00 | Browse newly uploaded data in Atlas - Video | Carles Garcia (CINECA, EMBL-EBI, UK)
14:20 | Other considerations: data storage standards - Video | Alexa Heekes (Western Cape Department of Health, University of Cape Town, South Africa)
14:45 | Summary from the literature review on data harmonization - Video | Lyndon Zass (H3Africa/H3ABioNet, University of Cape Town, South Africa)
15:00 | Example 1: DPUK - Video | Sarah Bauermeister (DPUK, University of Oxford, UK)
15:20 | Example 2: H3Africa CVD - Video | Katherine Johnston (H3Africa/H3ABioNet, University of Cape Town, South Africa)
15:45-16:00 | Break |
16:00 | Example 3: PRIMED - Video | Leslie Lange (PRIMED consortium, University of Colorado, USA)
16:20 | Data harmonization algorithms - Video | Mamana Mbiyavanga (CINECA, H3Africa/H3ABioNet, University of Cape Town, South Africa)
16:45 | Cohort representation (MIACC), how to generate and share summary data - Video | Melanie Courtot (OICR, Canada)
17:15-18:00 | BYOD and discussion - Video | All

  • The CINECA (Common Infrastructure for National Cohorts in Europe, Canada, and Africa) project aims to develop a federated cloud-enabled infrastructure to make population-scale genomic and biomolecular data accessible across international borders, to accelerate research, and to improve the health of individuals across continents. CINECA will leverage international investment in human cohort studies from Europe, Canada, and Africa to deliver a paradigm shift of federated research and clinical applications. The CINECA consortium will create one of the largest cross-continental implementations of human genetic and phenotypic data federation and interoperability, with a focus on common (complex) disease, one of the world’s most significant health burdens. CINECA has assembled a virtual cohort of 1.4M individuals from population, longitudinal and disease studies. Federated analyses will deliver new scientific knowledge, harmonisation strategies and the necessary ELSI framework supporting data exchange across legal jurisdictions, enabling federated analyses in the cloud. CINECA will provide a template to achieve virtual longitudinal and disease-specific cohorts of millions of samples, to advance benefits to patients. It will leverage partner membership of standards and infrastructures like the Global Alliance for Genomics and Health, BBMRI, ELIXIR, and EOSC, driving the state of the art in standards development, technical implementation and FAIR data.