CINECA Challenge 4: Federated Analysis Interoperability for Research and Healthcare Application

Human data for research is shifting to a decentralised model in which the vast majority of genetic data will be generated by national-scale biobanks and healthcare initiatives. The traditional method of carrying out genetic analysis is to apply for access to a dataset, download the data locally, and run a custom analysis. However, the sheer size of the data, increased security requirements, and jurisdictional restrictions on data export mean that this model is neither feasible nor scalable. In a federated model, the analysis moves to the data, and both the data and the tools are accessed via standardised interfaces. Federated analysis is currently possible but requires a series of manual analysis steps performed by each cohort or partner. Consequently, these analyses are only performed by large consortia with very specific scientific questions in mind. Notable examples are large-scale gene expression QTL meta-analyses (e.g. eQTLGen Consortium) and GWAS meta-analyses (e.g. GIANT Consortium), which are almost certain to lead to the discovery of novel disease associations, thus justifying the time and resources invested by each cohort.