Data Harmonization workshop

Date: 18th – 19th May 2022
Time: 14:00-18:00 CEST
Location: Online
Workshop organisers: CINECA, H3ABioNet
Contact: Nicola Mulder, Mamana Mbiyavanga

Overview

CINECA envisions a federated, cloud-enabled infrastructure that makes population-scale genomic and biomolecular data accessible across international borders. Its ultimate goal is to enable large-scale federated data analysis that is both responsible and secure. Achieving this requires integrating and harmonizing large, diverse human cohort datasets using community standards. Harmonizing data within and across cohorts adds value for downstream analysis and interpretation, and it facilitates cross-cohort meta-analysis.
This workshop aims to discuss common challenges in cohort data harmonization, work towards practical steps to address them, and share best practices. We welcome participants from any cohort planning prospective or retrospective data harmonization who are keen to share their experience and learn from others' perspectives on cohort data discovery and analysis.

Topics covered

- Data cleaning and curation
- ELSI considerations in merging data
- Data collection standards, ontologies and terminologies, interoperability standards, and metadata models
- Data storage standards
- Data harmonization
- Sharing cohort summary data
Applicants are encouraged to watch the CINECA webinar of 31 March 2022 (https://www.cineca-project.eu/webinar/bringing-it-all-together), which covers some of the relevant standards and their applications.

Objectives

After this workshop, participants should be able to:
- Perform basic data cleaning
- Understand which data standards and ontologies exist for clinical data
- Map their cohort metadata to a data model
- Understand existing approaches to, and algorithms for, data harmonization
- Prepare summary data from their cohorts

Intended Audience

Members of cohort projects who work on data curation and management, including data managers, curators, bioinformaticians and data scientists.

Prerequisites

None, but participants should be involved in cohort data management or analysis.

Workshop limitations

This workshop will only provide a foundation for continued learning in data harmonization, with some example applications using synthetic datasets. Future bring-your-own-data (BYOD) workshops can be arranged for more hands-on work with participants' own cohort data.

Workshop programme

Time (CEST/CAT)	Topic	Speaker
18th May 2022
14:00	Welcome and introduction, workshop aims - Video, Slides	Nicky Mulder (CINECA, H3Africa/H3ABioNet, University of Cape Town, South Africa)
14:15	Data cleaning - Video	Katherine Johnston, Ayton Meintjes (H3Africa/H3ABioNet, University of Cape Town, South Africa)
14:45	Machine learning/text mining tools for cleaning data - Video	Isuru Liyanage (CINECA, EMBL-EBI, UK)
15:15	ELSI considerations in merging data - Video	Melanie Goisauf (CINECA, BBMRI-ERIC, Austria)
15:40-16:00	Break
16:00	Overview of data collection standards, ontology terminology and interoperability - Video	Peter Robinson (Jackson Laboratory, US)
16:45	Metadata: GECKO, IHCC - Video	Carles Garcia (CINECA, EMBL-EBI, UK)
End 18:00	Hands-on work to prepare for loading into Atlas - Video	Carles Garcia (CINECA, EMBL-EBI, UK)
19th May 2022
14:00	Browse newly uploaded data in Atlas - Video	Carles Garcia (CINECA, EMBL-EBI, UK)
14:20	Other considerations: data storage standards - Video	Alexa Heekes (Western Cape Department of Health, University of Cape Town, South Africa)
14:45	Summary from the literature review on data harmonization - Video	Lyndon Zass (H3Africa/H3ABioNet, University of Cape Town, South Africa)
15:00	Example 1: DPUK - Video	Sarah Bauermeister (DPUK, University of Oxford, UK)
15:20	Example 2: H3Africa CVD - Video	Katherine Johnston (H3Africa/H3ABioNet, University of Cape Town, South Africa)
15:45-16:00	Break
16:00	Example 3: PRIMED - Video	Leslie Lange (PRIMED consortium, University of Colorado, USA)
16:20	Data harmonization algorithms - Video	Mamana Mbiyavanga (CINECA, H3Africa/H3ABioNet, University of Cape Town, South Africa)
16:45	Cohort representation (MIACC), how to generate and share summary data - Video	Melanie Courtot (OICR, Canada)
17:15-18:00	BYOD and discussion - Video	All

About CINECA

The CINECA (Common Infrastructure for National Cohorts in Europe, Canada, and Africa) project aims to develop a federated, cloud-enabled infrastructure that makes population-scale genomic and biomolecular data accessible across international borders, accelerates research, and improves the health of individuals across continents. CINECA will leverage international investment in human cohort studies from Europe, Canada, and Africa to deliver a paradigm shift towards federated research and clinical applications. The CINECA consortium will create one of the largest cross-continental implementations of human genetic and phenotypic data federation and interoperability, with a focus on common (complex) diseases, one of the world’s most significant health burdens. CINECA has assembled a virtual cohort of 1.4M individuals from population, longitudinal and disease studies. Federated analyses will deliver new scientific knowledge, harmonisation strategies, and the ELSI framework needed to support data exchange across legal jurisdictions, enabling federated analyses in the cloud. CINECA will provide a template for building virtual longitudinal and disease-specific cohorts of millions of samples, for the benefit of patients. It will leverage its partners' membership of standards bodies and infrastructures such as the Global Alliance for Genomics and Health (GA4GH), BBMRI, ELIXIR, and EOSC to drive the state of the art in standards development, technical implementation and FAIR data.

Posted in WP3, WP6, Workshop
Tagged Standards, GA4GH, Wp3, workshop, FAIR, wp6