A. Bin Raies, H. Mansour, R. Incitti, V.B. Bajic
PLoS One, 8(10):e77848, (2013)

In a number of diseases, certain genes are reported to be strongly
methylated and can therefore serve as diagnostic markers in many cases.
Scientific literature in digital form is an important source of
information about methylated genes implicated in particular diseases.
The large volume of electronic text, however, makes searching for this
information manually difficult and impractical.

We developed a novel text mining methodology based on a new concept of
position weight matrices (PWMs) for text representation and feature
generation. We applied PWMs in conjunction with the document-term matrix
to extract associations between methylated genes and diseases from free
text with high accuracy. The performance results are based on a large
manually classified dataset. Additionally, we developed a web-tool, DEMGD,
which automates the extraction of these associations from free text.
DEMGD presents the extracted associations in summary tables and full
reports, and it also tags the supporting evidence in the text with
respect to genes, diseases and methylation words. The methodology we
developed in this study can be applied to similar problems of extracting
associations from free text.

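The document-term matrix mentioned above is a standard text representation: one row per document, one column per vocabulary term, with each cell holding a term weight. The following is a minimal, generic Python sketch, assuming a simple lowercase regex tokenizer and raw term counts as weights. It is not the DEMGD pipeline: the tokenization, weighting and PWM-based features used in this study are not reproduced here, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase the text and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def document_term_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (vocabulary, matrix), where matrix[i][j] is the raw count
    of vocabulary term j in document i (hypothetical helper)."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    # Sorted union of all terms seen in any document gives stable columns.
    vocab = sorted(set().union(*counts))
    # Counter returns 0 for absent terms, so every row has len(vocab) cells.
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


if __name__ == "__main__":
    docs = [
        "MGMT is hypermethylated in glioma.",
        "Glioma shows MGMT promoter methylation.",
    ]
    vocab, matrix = document_term_matrix(docs)
    print(vocab)
    for row in matrix:
        print(row)
```

For the two example sentences, the columns are the sorted terms `glioma, hypermethylated, in, is, methylation, mgmt, promoter, shows`. Rows are `[1, 1, 1, 1, 0, 1, 0, 0]` and `[1, 0, 0, 0, 1, 1, 1, 1]`. In practice, such a matrix is usually weighted (for example with TF-IDF) and combined with additional features before classification.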
The new methodology developed in this study allows for efficient
identification of associations between concepts. Our method, applied to
methylated genes in different diseases, is implemented as a web-tool,