Dr. Fengzhu Sun is a
professor of Computational Biology and Bioinformatics within the
Department of Biological Sciences at USC with a joint appointment in the
Department of Mathematics. He has over 20 years of research experience
in using mathematical, statistical and computational tools to solve
biological problems including protein interaction networks, single
nucleotide polymorphisms and linkage disequilibrium, and metagenomics.
He developed a widely used algorithm for haplotype block partition and
tagSNP selection related to the international HapMap project. He also
developed widely used tools for integrative studies of
genotype-to-phenotype mapping combining information from gene ontology,
protein interaction networks, pathways, gene expression and SNPs. In
metagenomics, he developed a widely used tool, local similarity analysis
(LSA), for studying associations of operational taxonomic units.
Recently, he developed new statistics for alignment-free genome and
metagenome comparison using counts of word patterns. He is an elected
fellow of the American Association for the Advancement of Science
(AAAS) and the American Statistical Association (ASA), an elected member of
the International Statistical Institute (ISI), an Astor visiting lecturer in
statistics at Oxford University, and a program chair of RECOMB 2013. He
received the USC Provost’s Mellon Mentoring Award in 2012. He has
published over 150 papers and has been cited over 6000 times according
to Google Scholar.

New development in alignment-free genome and metagenome comparison

Next-generation sequencing (NGS) technologies have generated an enormous
amount of shotgun read data, and assembly of the reads is challenging,
especially for organisms without reference sequences and metagenomes. We
develop novel alignment-free and assembly-free statistics for genome
and metagenome comparison. The key idea is to remove the background word
counts from the observed counts when comparing genomes and metagenomes.
Markov chains (MCs) are commonly used to model background molecular
sequences, and we develop a new statistical method to estimate the order
of an MC from short-read data. The
alignment-free sequence comparison statistics are used to study the
relationships among species, to assign viruses to their hosts, and to
classify metagenomes and metatranscriptomes. In all applications, our
novel methods yield results that are consistent with biological
knowledge. Thus, our statistics provide powerful alternative approaches
for genome and metagenome comparison based on NGS short reads.
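
The idea of removing background word counts before comparing sequences can be illustrated with a minimal Python sketch. It is not the statistics presented in the talk: it assumes the simplest possible background (an order-0 Markov chain, i.e. independent bases with frequencies estimated from the sequence itself), and the function names `kmer_counts`, `centered_counts` and `centered_similarity` are hypothetical. The similarity shown is a plain cosine of the centered count vectors, chosen only to make the centering step concrete.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping words of length k; words with non-ACGT symbols are skipped."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def centered_counts(seq, k):
    """Observed word count minus its expectation under an i.i.d. background.

    Assumption: the background is an order-0 Markov chain whose base
    frequencies are estimated from the sequence itself.
    """
    base_total = sum(seq.count(c) for c in ALPHABET)
    if base_total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    freq = {c: seq.count(c) / base_total for c in ALPHABET}

    observed = kmer_counts(seq, k)
    n_words = sum(observed.values())

    centered = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        prob = 1.0
        for c in word:
            prob *= freq[c]  # word probability under independence
        centered[word] = observed[word] - n_words * prob
    return centered


def centered_similarity(seq_a, seq_b, k=2):
    """Cosine similarity between background-centered word-count vectors."""
    a = centered_counts(seq_a, k)
    b = centered_counts(seq_b, k)
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

For example, `centered_similarity(reads_a, reads_b, k=4)` could be applied to two concatenated read sets; a higher-order Markov background would replace the independence product in `centered_counts` with transition probabilities.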