CBRC Applications List

Sequence Analysis Tools

Knowledgebases and Databases

Text Mining and Data Mining Tools


bTSSfinder is a novel tool that predicts putative promoters for five classes of sigma factors in E. coli and in Cyanobacteria. bTSSfinder also classifies cyanobacterial promoters. Compared to currently available tools, bTSSfinder achieves higher accuracy and has a broader scope.


This system makes a thorough analysis of ChIP-Seq peaks and identifies the dominant sequence motif families as potential binding sites of DNA-interacting proteins.

Dragon Desert Masker (DDM)

A tool that demarcates, with very high accuracy, genomic regions that are unlikely to promote the initiation of transcription. The machine learning algorithm uses constraining properties of features identified in the upstream and downstream regions around verified TSSs, together with statistical analyses of these surrounding regions.

If you are using this resource in your research please cite: Schaefer U, Kodzius R, Kai C, Kawai J, Carninci P, Hayashizaki Y, Bajic VB (November 2010) High sensitivity TSS prediction: estimates of locations where TSS cannot occur. PLoS One 5(11): e13934. Epub 2010 Nov 15. doi:10.1371/journal.pone.0013934.

Dragon Motif Finder is a simple ab initio tool for finding motifs in DNA sequences. It processes large sequence sets in a relatively short amount of time on the web and is heavily used in the FANTOM5 consortium project for the analysis of promoter sequences.

The Dragon PolyA Spotter is a tool for predicting polyadenylation signal variants in human genomic DNA sequences based on two machine learning algorithms. The tool displays the predicted poly(A) signal variants and their positions in each submitted DNA sequence.
Dragon PolyA Spotter: predictor of poly(A) motifs within human genomic DNA sequences. Kalkatawi M, Rangkuti F, Schramm M, Jankovic BR, Kamau A, Chowdhary R, Archer JA, Bajic VB. Bioinformatics. 2013 Jun 1;29(11):1484. doi: 10.1093/bioinformatics/btt161.

A tool that aims to predict the DNA binding sites of peroxisome proliferator-activated receptors (PPARs) with extremely high accuracy.

Dragon TIS Spotter searches for Translation Initiation Sites (TISs) in plant genomic sequences provided in FASTA format. The tool analyzes the content of sliding windows of 300 bp of DNA sequence, assuming the TIS is located at positions 150-152 of the window, counted from the 5′ end. The machine learning prediction algorithm is trained on the Arabidopsis genome and tested on genomic sequences of three plant genomes.
Dragon TIS Spotter: an Arabidopsis-derived predictor of translation initiation sites in plants. Magana-Mora A, Ashoor H, Jankovic BR, Kamau A, Awara K, Chowdhary R, Archer JA, Bajic VB. Bioinformatics. 2013 Jan 1;29(1):117-8. doi: 10.1093/bioinformatics/bts638. 
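The windowing scheme described for Dragon TIS Spotter can be illustrated with a minimal Python sketch. It is not the tool's own code: the function name and parameters are hypothetical, and restricting candidates to ATG codons is an assumption made here for illustration. The sketch only extracts 300 bp windows in which a candidate start codon occupies positions 150-152 (1-based, counted from the 5′ end); the machine learning classification step is not reproduced.

```python
def candidate_tis_windows(seq, window=300, codon_start=150):
    """Yield (index, window) for each ATG that can sit at positions
    codon_start..codon_start+2 (1-based) of a window of the given length.

    Hypothetical helper for illustration; not part of Dragon TIS Spotter.
    """
    seq = seq.upper()
    left = codon_start - 1        # bases upstream of the codon (149)
    right = window - left - 3     # bases downstream of the codon (148)
    # i is the 0-based index of the first base of the candidate codon.
    for i in range(left, len(seq) - 3 - right + 1):
        if seq[i:i + 3] == "ATG":  # assumption: only ATG is considered
            yield i, seq[i - left:i + 3 + right]


if __name__ == "__main__":
    toy = "C" * 149 + "ATG" + "C" * 148
    for index, win in candidate_tis_windows(toy):
        print(index, len(win), win[149:152])
```

Candidates closer than 149 bp to the 5′ end or 148 bp to the 3′ end of the input are skipped, because a full window cannot be formed around them.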

The program is a pipeline that uses genetic algorithms to optimize decision tree structures.

Histone Modification in Cancer (HMCan) is a Hidden Markov Model-based tool developed to detect histone modifications in cancer ChIP-seq data. It applies three correction steps to the data: copy number correction, GC bias correction and noise level correction. To run HMCan, one needs a ChIP-seq target alignment file and a control alignment file.
HMCan: a method for detecting chromatin modifications in cancer samples using ChIP-seq data. Ashoor H, Hérault A, Kamoun A, Radvanyi F, Bajic VB, Barillot E, Boeva V. Bioinformatics. 2013 Dec 1;29(23):2979-86. doi: 10.1093/bioinformatics/btt524

Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences.

LD motif finder

LigandRFs is a random forest-based approach to predict protein-ligand binding sites.

Dimitrios Kleftogiannis, Panos Kalnis and Vladimir B. Bajic


A fundamental problem in bioinformatics is genome assembly. Next-Generation Sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it is expected that the memory footprint required for the assembly process will increase dramatically and will emerge as a limiting factor in processing widely available NGS-generated reads. In this report, we compare current memory-efficient techniques for genome assembly with respect to quality, memory consumption and execution time. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. Our study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment on some findings regarding suitable computational resources for assembly.
Comparing memory-efficient genome assemblers on stand-alone and cloud infrastructures. Kleftogiannis D, Kalnis P, Bajic VB. PLoS One. 2013 Sep 27;8(9):e75505. doi: 10.1371/journal.pone.0075505.

miRNAVISA is a web-based tool that allows customized interrogation and comparisons of miRNA families for hypotheses generation, and comparison of the per-species chromosomal distribution of miRNA genes in different families.
Exploration of miRNA families for hypotheses generation. Kamanu TK, Radovanovic A, Archer JA, Bajic VB. Sci Rep. 2013 Oct 15;3:2940. doi: 10.1038/srep02940.

A framework for scalable parameter estimation of gene circuit models using structural information.

Genome-wide analysis of alternative TSSs - Improved recognition of industrially important enzymes

The program is able to predict the 12 main variants of human poly(A) motifs, i.e., AATAAA, ATTAAA, AAAAAG, AAGAAA, TATAAA, AATACA, AGTAAA, ACTAAA, GATAAA, CATAAA, AATATA, and AATAGA.
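As a rough illustration of what these 12 variants look like in a sequence, the following Python sketch lists every position where one of the hexamers occurs. The function and constant names are hypothetical. Plain string matching only locates candidate motifs; deciding which candidates are genuine poly(A) signals is the job of the program's prediction model, which is not reproduced here.

```python
# The 12 hexamer variants listed above.
POLYA_VARIANTS = (
    "AATAAA", "ATTAAA", "AAAAAG", "AAGAAA", "TATAAA", "AATACA",
    "AGTAAA", "ACTAAA", "GATAAA", "CATAAA", "AATATA", "AATAGA",
)


def find_polya_candidates(seq):
    """Return (0-based position, hexamer) for every occurrence of a variant.

    Hypothetical helper for illustration; it performs exact matching only
    and does not score or classify the candidates.
    """
    seq = seq.upper()
    variants = set(POLYA_VARIANTS)
    hits = []
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        if hexamer in variants:
            hits.append((i, hexamer))
    return hits


if __name__ == "__main__":
    print(find_polya_candidates("GGAATAAAGG"))
```

Overlapping occurrences are all reported, since each start position is tested independently.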

Our method trains a two-round support vector regression model for predicting protein-DNA binding affinity.

Fast and scalable pathogen discovery program with accurate genome relative abundance estimation.


If you are using this resource in your research please cite: Naeem R, Rashid M, Pain A. (Nov 2012) READSCAN: A fast and scalable pathogen discovery program with accurate genome relative abundance estimation. Bioinformatics. 2012 Nov 28. [Epub ahead of print]