CINECA Virtual Platform

Federated Data Analysis

Large-scale genomic datasets involving thousands of human participants are now being generated through both healthcare systems and research consortia. There is therefore an increasing need to facilitate cross-cohort analysis of these data to discover new disease associations. However, the increasing volume of these data, jurisdictional restrictions on data export, and associated security protocols mean that the traditional model of moving the data to the analysis is no longer feasible.

To overcome these issues, CINECA has built a federated analysis framework for a set of cross-cohort research applications, using standardised tools and practices to maximise compatibility with existing systems and to minimise the amount of custom development required.

CINECA has completed technical demonstrators for two types of use cases: federated eQTL (expression Quantitative Trait Loci) and PRS (Polygenic Risk Score) analyses. These demonstrators illustrate how datasets with harmonised metadata (Data harmonization) can be discovered via GA4GH Beacon (Federated data discovery), how they can be accessed using LifeScience Login AAI (Data access), and how they can be analysed using the portable and modular workflows developed by CINECA.

  • CINECA used eQTL studies, which provide information on genomic loci that explain variation in mRNA expression levels, as a use case to develop workflows that quantify and normalise molecular traits. Six modular eQTL workflows were created. They are written in Nextflow, packaged into publicly available Docker/Singularity containers, and accompanied by small test datasets that simplify deployment.

    These workflows allow users to perform the same analyses in infrastructures located in multiple jurisdictions.


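The federated model described on this page, in which the analysis travels to the data and only aggregate results leave each site, can be illustrated with a minimal Python sketch. It is not CINECA's implementation: the function names, the variant identifiers, the weights, and the choice to treat a missing variant as a dosage of 0 are all assumptions made for illustration. The score itself is the usual definition of a polygenic score as a weighted sum of allele dosages.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SiteSummary:
    """Aggregate statistics a site shares; no individual-level data."""
    n: int         # number of participants scored at the site
    total: float   # sum of their polygenic scores


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of allele dosages over the variants listed in `weights`.

    Assumption for this sketch: a variant missing from `dosages` counts as 0.
    """
    return sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())


def run_at_site(cohort: List[Dict[str, float]], weights: Dict[str, float]) -> SiteSummary:
    """Runs inside one jurisdiction; only the summary is returned."""
    scores = [polygenic_score(person, weights) for person in cohort]
    return SiteSummary(n=len(scores), total=sum(scores))


def combine(summaries: List[SiteSummary]) -> float:
    """Pooled mean score across sites, computed from summaries alone."""
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no participants in any site summary")
    return sum(s.total for s in summaries) / n


# Hypothetical effect weights and toy cohorts (dosages are 0, 1 or 2).
weights = {"rs1": 0.5, "rs2": -0.25}
site_a = [{"rs1": 2, "rs2": 0}, {"rs1": 1, "rs2": 2}]
site_b = [{"rs1": 0, "rs2": 2}]

summaries = [run_at_site(site_a, weights), run_at_site(site_b, weights)]
pooled_mean = combine(summaries)
```

The same `run_at_site` function is executed unchanged at every site, which mirrors the role of the containerised workflows: identical code in each jurisdiction, with only `SiteSummary` objects crossing the boundary.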