AI & HPC Bioinformatics Solutions

Context 

The volume of data generated through high-throughput experiments, together with the volume of publicly available data, makes time-efficient bioinformatics applications imperative. This requires solutions that efficiently utilise the highly parallelised architecture of HPC systems. In this theme we focus on developing such methodologies and solutions for NGS and omics data analysis, function annotation of transcripts, transcription regulation networks, recognition of various DNA and RNA signals, and efficient indexing of information in bioinformatics resources for use on HPC systems. This project contributes to the CBRC CCF Flagship Program.

Applications of AI to Bioinformatics and Computational Biology

In this project we develop AI methods and algorithms suitable for learning and modeling from data. These include feature selection and various machine learning approaches, including deep learning, suitable for detecting signals in DNA, RNA, and protein sequences, predicting secondary structures of nucleotide and protein sequences, etc. Some parts of this project contribute to the CBRC CCF Program. Some of the developed resources are:

  • Dragon PolyA Spotter 

  • Dragon TIS Spotter 

  • DEMGD

Annotation of Transcripts' Functions

This project aims to develop platforms for annotating the function of any transcript based on the characteristics of the transcript's activation regulatory network. The focus is on miRNA and lncRNA. Resources developed in this project, one of which contains rich annotation of over 30,000 non-coding RNA transcripts, are listed below:

  • FARNA database

  • miRNAVISA

This project contributes to the CBRC CCF Program.

Transcription Regulation Networks

The project focuses on developing new methods, including deep learning models, for modeling interactions between transcription factors (TFs) and DNA, new models for predicting TF binding sites (TFBSs), and annotation of promoters and other gene regulatory regions. It aims to determine the most complete set of associations between transcription co-factors, TFs, promoters, TFBSs, and associated transcripts. The project also encompasses microRNA and long non-coding RNA, their regulation, and their regulatory effects. Some parts of this project contribute to the CBRC CCF Program. Some of the tools and resources developed are:

  • TcoF database

  • HOCOMOCO database

  • TF-TcoF predictor

  • DENdb

Related Publications