Biomedical named entity recognition - Pros and cons of rule-based and deep learning methods

The final blog in our series on text-mining is a guest blog written by Shyama Saha, who specialises in machine learning and text mining at EMBL-EBI. The CINECA project aims to create a text-mining tool suite to support the extraction of metadata concepts from unstructured textual cohort data and description files. To create a standardised metadata representation, CINECA is using natural language processing (NLP) techniques such as entity recognition, with rule-based tools such as MetaMap, LexMapr, and Zooma. In this blog, Shyama discusses the challenges of dictionary and rule-based text-mining tools, especially for entity recognition tasks, and how deep learning methods address these issues.

Text-mining series, CINECA Short Reports, WP3, Blog | Shyama Saha (EMBL-EBI) | December 11, 2020 | WP3

Assigning standard descriptors to free text

This post is part of a series on a text-mining pipeline being developed by CINECA in Work Package 3. The first instalment, "Uncovering metadata from semi-structured cohort data", explained the Zooma and Curami pipelines. The second, "LexMapr - A rule-based text-mining tool for ontology term mapping and classification", introduced LexMapr. This third instalment explains the normalisation pipeline developed at SIB/HES-SO.

Text-mining series, CINECA Short Reports, WP3, Blog | Jenny Copara, SIB/HES-SO | November 3, 2020 | text mining, zooma, metadata, cohort, ontology, WP3

LexMapr - A rule-based text-mining tool for ontology term mapping and classification

The initial focus of LexMapr development has been on providing a text-mining tool to clean up short free-text biosample metadata containing inconsistent punctuation, abbreviations and typos, and to map the identified entities to standard terms from ontologies. This blog is the second in a series on text-mining in CINECA; the previous instalment is "Uncovering metadata from semi-structured cohort data".
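As a rough illustration of the kind of rule-based clean-up and mapping described here, the following is a minimal Python sketch. It is not LexMapr's code or API: the function names, the abbreviation table, and the `EX:` term identifiers are all hypothetical, and a real tool would load its vocabulary from actual ontology files.

```python
import difflib
import re

# Hypothetical lookup tables for illustration only; not real ontology content.
ABBREVIATIONS = {"chkn": "chicken", "brst": "breast"}
ONTOLOGY_TERMS = {"chicken breast": "EX:0001", "whole blood": "EX:0002"}


def normalise(text):
    """Lowercase, replace punctuation with spaces, expand known abbreviations."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


def map_to_term(text, cutoff=0.85):
    """Return (cleaned text, term ID); the ID is None when nothing matches."""
    cleaned = normalise(text)
    if cleaned in ONTOLOGY_TERMS:
        return cleaned, ONTOLOGY_TERMS[cleaned]
    # Tolerate small typos by falling back to the closest known label.
    close = difflib.get_close_matches(cleaned, list(ONTOLOGY_TERMS), n=1, cutoff=cutoff)
    if close:
        return close[0], ONTOLOGY_TERMS[close[0]]
    return cleaned, None
```

Under these assumptions, `map_to_term("Chkn, brst.")` and `map_to_term("chiken breast")` both resolve to the made-up term `EX:0001`, while an unrecognised description returns `None` as the identifier. The fixed similarity cutoff is an arbitrary choice for the sketch; a production tool would need more careful handling of ambiguous or partial matches.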

Uncovering metadata from semi-structured cohort data

Harmonisation of attributes across different cohorts is very challenging and labour-intensive, but critical for leveraging the collective potential of the data. The CINECA text mining group aims to provide common tools and methods to extract additional metadata from structured and semi-structured fields in cohorts’ data.

Text-mining series, CINECA Short Reports, WP3, Blog | Isuru Liyanage | September 15, 2020 | text mining, metadata, ontology, cohort, WP3
