Albaradei, S., Magana-Mora, A., Thafar, M., Uludag, M., Bajic, VB., Gojobori, T., Essack, M., Jankovic, BR.
Gene: X (2020)
Background: The accurate identification of the exon/intron
boundaries is critical for the correct annotation of genes with multiple
exons. Donor and acceptor splice sites (SS) demarcate these boundaries.
Therefore, deriving accurate computational models to predict SS is
useful for functional annotation of genes and genomes, and for finding
alternative SS associated with different diseases. Although various
models have been proposed for the in silico prediction of SS, improving
their accuracy is required for reliable annotation. Moreover, models are
often derived and tested using the same genome, providing no evidence
of broad application, i.e. to other poorly studied genomes.
Results: With this in mind, we developed the Splice2Deep models for SS detection.
Each model is an ensemble of deep convolutional neural networks. We
evaluated the performance of the models based on the ability to detect
SS in Homo sapiens, Oryza sativa japonica, Arabidopsis thaliana,
Drosophila melanogaster, and Caenorhabditis elegans. The results demonstrate
that the models efficiently detect SS in organisms not considered
during training. Compared to the state-of-the-art
tools, Splice2Deep models achieved significantly reduced average error
rates of 41.97% and 28.51% for acceptor and donor SS, respectively.
Moreover, the Splice2Deep cross-organism validation demonstrates that
models correctly identify conserved genomic elements enabling annotation
of SS in new genomes by choosing the taxonomically closest model.
Conclusions: Our results demonstrate that Splice2Deep both achieves a
considerably lower error rate than other state-of-the-art models and
accurately recognizes SS in organisms on which the models were not
trained, enabling annotation of poorly studied or newly sequenced
genomes. Splice2Deep models are
implemented in Python using the Keras API; the models and the data are
available at https://github.com/SomayahAlbaradei/Splice_Deep.git.
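
The idea of scoring a candidate site with an ensemble can be illustrated with a minimal sketch. The abstract does not state how the member networks are combined, so the unweighted averaging below is an assumption, as are the function names, the 0.5 threshold, and the stand-in members (simple callables used here in place of trained Keras models). It is not the Splice2Deep implementation.

```python
from statistics import fmean
from typing import Callable, Sequence

# A member maps a DNA window to a splice-site probability in [0, 1].
# In practice this would wrap a trained network; here it is any callable.
Member = Callable[[str], float]


def ensemble_score(members: Sequence[Member], window: str) -> float:
    """Return the unweighted mean of the member probabilities.

    Averaging is an assumed combination rule, chosen for illustration only.
    """
    if not members:
        raise ValueError("the ensemble needs at least one member")
    return fmean(member(window) for member in members)


def is_splice_site(members: Sequence[Member], window: str,
                   threshold: float = 0.5) -> bool:
    """Call a site positive when the ensemble score reaches the threshold."""
    return ensemble_score(members, window) >= threshold


# Hypothetical stand-in members that return fixed probabilities.
example_members = [lambda w: 0.2, lambda w: 0.8, lambda w: 0.9]
example_window = "AAAAAGTAAA"  # arbitrary illustrative sequence
example_call = is_splice_site(example_members, example_window)
```

Under the same assumptions, cross-organism use would amount to passing the member list trained on the taxonomically closest organism to `is_splice_site`.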