PhD Defense | Novel Data Mining Methods for Virtual Screening of Biological Active Chemical Compounds by Othman Soufan

Nov 16 2016 02:00 PM - Nov 16 2016 03:00 PM

Drug discovery is a process that takes many years and hundreds of millions of dollars to reach a confident conclusion about a specific treatment. Part of this sophisticated process is based on preliminary investigations that suggest a set of chemical compounds as candidate drugs for the treatment. Computational resources have been playing a significant role in this part through a step known as virtual screening. From a data mining perspective, the availability of rich data resources is key to training prediction models. Yet the difficulties imposed by the large expansion in data and its dimensionality are inevitable. In this thesis, I address the main challenges that arise when data mining techniques are used for virtual screening. To achieve efficient virtual screening using data mining, I start by addressing the problem of feature selection and provide an analysis of the best ways to describe a chemical compound for enhanced screening performance. High-throughput screening (HTS) assay data used for virtual screening are characterized by a great class imbalance. To handle this problem, I suggest using a novel algorithm called DRAMOTE to narrow down promising candidate chemicals aimed at interaction with specific molecular targets before they are experimentally evaluated. Existing works are mostly proposed for small-scale virtual screening that makes use of a few thousand interactions. Thus, I propose enabling large-scale (or big) virtual screening through learning from millions of interactions while exploiting any relevant dependency for better accuracy. A novel solution called DRABAL, which incorporates structure learning of a Bayesian network as a step to model dependency between the HTS assays, is shown to achieve significant improvements over existing state-of-the-art approaches.

Biography: Othman Soufan is a PhD Candidate at King Abdullah University of Science and Technology (KAUST). He received his B.Sc. degree in Management Information Systems from King Fahd University of Petroleum and Minerals (KFUPM) in 2010 with first class distinction. He then joined KAUST and received his M.Sc. degree in Computer Science in 2012. His research interests include developing machine learning and data mining techniques to address challenging problems in computational biology and biomedical applications. His research work has resulted in several peer-reviewed publications in high-quality journals and one filed patent. Othman has received a postdoctoral fellowship award from McGill University in Canada, where he will pursue his interest in data mining and bioinformatics.

More Information:

For more info contact: PhD Candidate Othman Soufan; email: