Learning gene functions based on expression: Know what you ask for
11:35 - 12:10
Building 9 - Lecture hall 2
Recent advances in sequencing technology have made it possible to address pangenomic questions, including the comparison of gene presence/absence variation between individuals or populations. In doing so, however, one frequently finds that many genes lack a functional ontology annotation. Many of these “unknown genes” exist in multiple species but have not been characterized in any of them.
Typical approaches to predicting the function of such genes include phylogenetic profiling and/or analysis of their expression behavior across a large set of experiments. The latter analysis was refined for microarray data, taking sample selection into account to find genes associated with specific processes, but the performance of the same method on modern RNASeq data may have lagged behind. In any case, adaptive performance tuning on RNASeq data was hindered by the computationally intensive procedures required for RNASeq data analysis. This issue has been addressed by novel pseudomapping approaches such as those implemented in kallisto and salmon, which allow individual researchers to analyze large expression data matrices. We present data showing that careful selection of training data improves prediction performance, and we show the effect of novel, fast RNASeq analysis pipelines on gene function prediction using biomedical ontologies.
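
The idea of expression-based function prediction with sample selection can be illustrated with a minimal sketch. It is not the method presented in the talk: it assumes a simple guilt-by-association rule (Pearson correlation against annotated genes, with terms transferred from the best-correlated neighbours), and the function names, gene names, term labels, and expression values are all hypothetical toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def predict_terms(expression, annotations, query, samples=None, top_k=2):
    """Rank ontology terms for `query` by co-expression with annotated genes.

    `samples` restricts the correlation to a chosen subset of experiments
    (column indices); None uses all of them.
    """
    n_samples = len(expression[query])
    idx = list(samples) if samples is not None else list(range(n_samples))
    q = [expression[query][i] for i in idx]

    # Correlate the query gene with every annotated gene on the chosen samples.
    scored = []
    for gene in annotations:
        if gene == query:
            continue
        r = pearson(q, [expression[gene][i] for i in idx])
        scored.append((r, gene))
    scored.sort(reverse=True)

    # Transfer terms from the top_k positively correlated neighbours,
    # weighting each term by the correlation of the gene that carries it.
    term_scores = {}
    for r, gene in scored[:top_k]:
        if r <= 0:
            continue
        for term in annotations[gene]:
            term_scores[term] = term_scores.get(term, 0.0) + r
    return sorted(term_scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: rows are genes, columns are experiments.
expression = {
    "unknown1": [1, 2, 3, 4, 2, 2],
    "geneA": [2, 4, 6, 8, 1, 5],
    "geneB": [4, 3, 2, 1, 2, 2],
    "geneC": [1, 2, 3, 5, 9, 0],
}
annotations = {
    "geneA": ["TERM:light_response"],
    "geneB": ["TERM:cold_response"],
    "geneC": ["TERM:root_development"],
}

# Using only the first four experiments as the selected sample set.
selected = predict_terms(expression, annotations, "unknown1",
                         samples=[0, 1, 2, 3], top_k=1)
# Using every experiment, including ones unrelated to the process of interest.
unselected = predict_terms(expression, annotations, "unknown1", top_k=1)
print(selected)
print(unselected)
```

Comparing the two calls shows how the choice of experiments changes the correlation scores that drive the prediction; in practice the sample set and the scoring rule would both need to be evaluated against held-out annotations.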