Nov 04 2019, 03:00 PM - 05:00 PM
Abstract:
Diseases play a central role in biomedical research, and many studies aim to
make disease information accessible by designing named entity recognition
models that exploit the available information. Disease recognition has been
tackled by various approaches, of which the best known are the lexical and
supervised approaches. However, these approaches have drawbacks: their
performance depends on the amount of human-annotated data available.
Moreover, lexical approaches cannot distinguish between
real mentions of diseases and mentions of other entities that share the same
name or acronym. The challenge of this project is to design a named entity
recognizer that combines the strengths of lexical and supervised approaches.
We demonstrate that our model can accurately identify disease name mentions
in text by using word embeddings to capture the context of each mention,
which enables the model to determine whether it is a real disease mention.
We evaluate our model on a gold standard data set, on which it achieves a
precision of 84%. Finally, we compare the performance of our model with that
of different statistical named entity recognition models, and the results
show that our model outperforms the unsupervised lexical approaches.
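
As a rough illustration of the general idea (a dictionary lookup whose hits are then filtered by the embedding of the surrounding words), a minimal sketch might look like the following. It is not the model presented in the talk: the three-dimensional vectors, the lexicon, the window size, the threshold, and all function names are invented for this example.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# A real system would load pretrained embeddings instead.
EMBEDDINGS = {
    "patient":   [0.9, 0.1, 0.0],
    "diagnosed": [0.8, 0.2, 0.1],
    "symptoms":  [0.9, 0.0, 0.1],
    "outside":   [0.0, 0.9, 0.2],
    "winter":    [0.1, 0.8, 0.1],
}

# Hypothetical disease dictionary used for the lexical matching step.
LEXICON = {"cold"}


def mean_vector(words):
    """Average the vectors of the words that have an embedding."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Assumed reference vector for "disease-like" contexts.
DISEASE_CONTEXT = mean_vector(["patient", "diagnosed", "symptoms"])


def find_disease_mentions(text, window=3, threshold=0.8):
    """Return (token_index, token) pairs for lexicon hits in a disease-like context."""
    tokens = text.lower().split()
    mentions = []
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue  # lexical step: only dictionary terms are candidates
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        context_vector = mean_vector(context)
        # Context step: keep the candidate only if its context resembles a disease context.
        if context_vector is not None and cosine(context_vector, DISEASE_CONTEXT) >= threshold:
            mentions.append((i, token))
    return mentions
```

With these toy vectors, "cold" is kept in "the patient was diagnosed with a cold" and rejected in "it was cold outside in winter", which is the kind of ambiguity a purely lexical matcher cannot resolve.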