CINECA
Common Infrastructure for National Cohorts in Europe, Canada, and Africa

Powering up data discovery and access using the Data Use Ontology

This month’s blog was written by Melanie Courtot, metadata standards coordinator at EMBL-EBI and co-Work Package Lead of CINECA WP3 - Cohort Level Metadata Representation. This blog is the fourth in our Global Alliance for Genomics and Health (GA4GH) standards series, presenting an overview of how GA4GH standards are being developed and implemented by CINECA. In our April post about Passport, Mikael from CINECA WP2 explained the importance of controlled access for protecting sensitive data, how federated data access works in the cloud, and how Passport enables researchers to authenticate, that is, to prove they are who they say they are.


Authentication has an important complement: authorisation, that is, establishing that a researcher should be allowed to access the data given who they are, their institution, their project and so on.


CHALLENGES IN DATA ACCESS

Datasets available to researchers have varying ethical or legal conditions for secondary data use, derived from informed consent processes or other authorisations (e.g., laws, policies or agreements). For example, some datasets are available only to non-commercial organisations, preventing access by pharmaceutical companies, while others are consented only for research on specific diseases. Ethical requirements may prohibit ancestry research, and legal frameworks may forbid transferring data to another country.

Every institution uses its own language in its informed consent forms to describe the secondary use restrictions and conditions on its datasets. This means that each data access request must be manually evaluated against the data use limitations that specify how the dataset can be used. Consequently, Data Access Committees typically take two to six weeks to respond to such requests, considerably slowing the pace of research.

To address this challenge, we have developed the Data Use Ontology (DUO), a hierarchical vocabulary of terms representing permissions associated with secondary data use. DUO allows datasets to be annotated consistently and unambiguously; each DUO term is developed by the community and includes human-readable metadata such as a definition and examples of usage. The DUO hierarchy was improved in the February 2021 release based on user feedback, and now reflects the functional split between permissions and additional modifiers that further specify those permissions:

Pictures credit Stephanie Li, GA4GH.

This allows DUO to provide an unambiguous, shared understanding of data use conditions. DUO terms are encoded in the machine-readable W3C standard Web Ontology Language (OWL), and follow the Open Biological and Biomedical Ontologies (OBO) development principles. A researcher can query the European Genome-phenome Archive (EGA), hosted at EMBL’s European Bioinformatics Institute and the Centre for Genomic Regulation, or any other database that has implemented DUO, to discover datasets annotated with DUO terms and retrieve only the data that matches their intended use.

DUO can also be used for automated matching, allowing authenticated users to gain access to datasets whose conditions are compatible with their research. For example, an industry researcher working on cancer would be matched by a DUO-powered algorithm to any dataset that allows both commercial use and cancer research, and offered the opportunity to fetch those datasets automatically.
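To make the matching idea concrete, here is a minimal sketch in Python. It is not the DUO matching algorithm itself: the `PARENT` table, the term labels, and the `Dataset`, `Request` and `is_match` names are all hypothetical, and the hierarchy is a simplified assumption about how broader permissions cover narrower purposes.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical slice of a permission hierarchy: each label maps to its broader
# parent. The labels are illustrative shorthand, not official DUO identifiers.
PARENT = {
    "general research use": None,
    "health/medical/biomedical research": "general research use",
    "disease-specific research": "health/medical/biomedical research",
    "cancer research": "disease-specific research",
    "diabetes research": "disease-specific research",
}
NON_COMMERCIAL_ONLY = "non-commercial use only"  # illustrative modifier label


def broader_or_equal(term: str) -> set[str]:
    """Return the term itself plus every broader term above it."""
    found: set[str] = set()
    current: str | None = term
    while current is not None:
        found.add(current)
        current = PARENT[current]
    return found


@dataclass(frozen=True)
class Dataset:
    name: str
    permission: str                        # broadest use the consent allows
    modifiers: frozenset[str] = frozenset()


@dataclass(frozen=True)
class Request:
    purpose: str
    commercial: bool


def is_match(dataset: Dataset, request: Request) -> bool:
    """Assumed rule: the purpose must equal or be narrower than the permission,
    and commercial requests are excluded by the non-commercial modifier."""
    if dataset.permission not in broader_or_equal(request.purpose):
        return False
    if request.commercial and NON_COMMERCIAL_ONLY in dataset.modifiers:
        return False
    return True


datasets = [
    Dataset("A", "general research use"),
    Dataset("B", "cancer research", frozenset({NON_COMMERCIAL_ONLY})),
    Dataset("C", "cancer research"),
    Dataset("D", "diabetes research"),
]
# The industry researcher working on cancer from the example above.
request = Request(purpose="cancer research", commercial=True)
matches = [d.name for d in datasets if is_match(d, request)]
print(matches)
```

In this sketch the researcher is matched to datasets A and C: B is excluded by its non-commercial modifier and D by its unrelated disease permission.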

DUO STEP-BY-STEP

At data deposition time, the data depositor provides their datasets annotated with DUO terms. These terms can come directly from the consent forms when the forms follow the Machine readable consent guidance, or can be derived from the forms by the depositor.

Pictures credit Stephanie Li, GA4GH.

At data request time, a scientist encodes their research purpose using DUO terms. The Data Access Committee can rely on the DUO matching algorithm or make a manual determination of access permissions.

Pictures credit Stephanie Li, GA4GH.

The Data Access Committee's approval, if granted, is shared with the data repository, and the dataset is made available to the requestor.
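The three steps above can be sketched as a small workflow. This is only an illustration under stated assumptions: the `Repository` class, the `dac_review` function and the term labels are hypothetical, and the automated check is a deliberately naive stand-in (set containment) rather than the real DUO matching algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Repository:
    """Hypothetical data repository holding annotations and access grants."""
    annotations: dict[str, set[str]] = field(default_factory=dict)
    grants: set[tuple[str, str]] = field(default_factory=set)

    def deposit(self, dataset_id: str, duo_terms: set[str]) -> None:
        # Step 1: the depositor supplies the dataset with its DUO annotations.
        self.annotations[dataset_id] = set(duo_terms)

    def record_approval(self, dataset_id: str, requester: str) -> None:
        # Step 3: the committee's approval is shared with the repository.
        self.grants.add((dataset_id, requester))

    def can_download(self, dataset_id: str, requester: str) -> bool:
        return (dataset_id, requester) in self.grants


def dac_review(
    dataset_terms: set[str],
    request_terms: set[str],
    manual_decision: Optional[bool] = None,
) -> bool:
    """Step 2: a manual determination wins if the committee made one;
    otherwise fall back to a naive automated check (an assumption here):
    every condition on the dataset must be declared in the request."""
    if manual_decision is not None:
        return manual_decision
    return dataset_terms <= request_terms


repo = Repository()
repo.deposit("cohort-1", {"disease-specific research", "non-commercial use only"})

# The scientist encodes their research purpose with the same vocabulary.
request_terms = {"disease-specific research", "non-commercial use only"}
if dac_review(repo.annotations["cohort-1"], request_terms):
    repo.record_approval("cohort-1", "alice")

print(repo.can_download("cohort-1", "alice"))
print(repo.can_download("cohort-1", "bob"))
```

Keeping the manual override as an explicit argument mirrors the text: the committee can rely on automated matching, but remains free to decide by hand.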

DUO HAS BEEN IMPLEMENTED WORLDWIDE

CINECA WP3 has implemented GA4GH DUO to annotate cohort data from H3Africa. Based on their feedback, several improvements have been made to the ontology, for example creating new terms to differentiate between non-commercial entities accessing the data, and entities accessing the data for non-commercial purposes. 

Pictures credit Stephanie Li, GA4GH.
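The distinction between non-commercial entities and non-commercial purposes, mentioned above, is easy to blur, so a short sketch may help. The `Requester` class, the `allowed` function and the two modifier labels are hypothetical names chosen for illustration; they are not DUO identifiers.

```python
from dataclasses import dataclass

# Illustrative labels for the two kinds of restriction (not DUO identifiers).
NON_COMMERCIAL_ORGS_ONLY = "non-commercial organisations only"
NON_COMMERCIAL_PURPOSES_ONLY = "non-commercial purposes only"


@dataclass(frozen=True)
class Requester:
    organisation_is_commercial: bool
    purpose_is_commercial: bool


def allowed(requester: Requester, modifiers: set) -> bool:
    """One restriction looks at who is asking, the other at why they ask."""
    if NON_COMMERCIAL_ORGS_ONLY in modifiers and requester.organisation_is_commercial:
        return False
    if NON_COMMERCIAL_PURPOSES_ONLY in modifiers and requester.purpose_is_commercial:
        return False
    return True


# A company running a not-for-profit study.
company_nonprofit_study = Requester(organisation_is_commercial=True, purpose_is_commercial=False)
# A university developing a product to sell.
university_product = Requester(organisation_is_commercial=False, purpose_is_commercial=True)

print(allowed(company_nonprofit_study, {NON_COMMERCIAL_PURPOSES_ONLY}))
print(allowed(company_nonprofit_study, {NON_COMMERCIAL_ORGS_ONLY}))
print(allowed(university_product, {NON_COMMERCIAL_ORGS_ONLY}))
print(allowed(university_product, {NON_COMMERCIAL_PURPOSES_ONLY}))
```

With a single "non-commercial" term, both requesters would have been treated the same way; separate terms let a dataset state which of the two conditions its consent actually imposes.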

As of April 2021, DUO has been used in over 200,000 annotations worldwide, and its community of users keeps growing. In CINECA, the CHILD cohort study is reviewing its consent forms in order to annotate its datasets with DUO terms. Further contributions to the DUO standard are encouraged on the DUO issue tracker.

DUO is distributed under CC-BY, and the latest released version of DUO is always available at http://purl.obolibrary.org/obo/duo.owl. DUO can be browsed online using the EMBL-EBI Ontology Lookup Service. Documentation is available from the DUO GitHub repository.
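For readers who prefer to inspect the ontology programmatically, the sketch below lists class identifiers and labels from an OWL document using only the Python standard library. It assumes the file is serialised as RDF/XML with `owl:Class` elements carrying `rdfs:label` children; the `list_terms` function and the inline `SAMPLE` document (with made-up example.org identifiers) are hypothetical and stand in for the real release.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

NS = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"


def list_terms(owl_xml: str) -> dict[str, str]:
    """Map each labelled top-level owl:Class identifier to its rdfs:label."""
    root = ET.fromstring(owl_xml)
    terms: dict[str, str] = {}
    for cls in root.findall("owl:Class", NS):
        iri = cls.get(RDF_ABOUT)
        label = cls.find("rdfs:label", NS)
        if iri and label is not None and label.text:
            terms[iri] = label.text.strip()
    return terms


# Tiny made-up document in the assumed RDF/XML shape (not real DUO content).
SAMPLE = """<rdf:RDF
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Class rdf:about="http://example.org/TERM_1">
    <rdfs:label>example permission</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/TERM_2"/>
</rdf:RDF>"""

terms = list_terms(SAMPLE)
print(terms)
```

To try this against the released ontology, the text of the file at the URL above could be downloaded (for example with `urllib.request.urlopen`) and passed to `list_terms`, provided the assumption about the serialisation holds. Dedicated OWL libraries will handle the ontology more faithfully than this label listing does.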

Categories: GA4GH standards series, CINECA Short Reports, Blog, WP3
Author: Melanie Courtot
Date: May 4, 2021
Tags: WP3, DUO, GA4GH, Data Access, Federated Data Sharing, metadata, Passport, authorisation, ontology, consent, data use ontology


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825775.


This project has received funding from the Canadian Institute of Health Research under CIHR grant number # 404896


© Copyright 2022 CINECA project.