D. Kleftogiannis, P. Kalnis, V.B. Bajic
PLoS One, 8(9):e75505, (2013)
A fundamental problem in bioinformatics is genome assembly.
Next-generation sequencing (NGS) technologies produce large volumes of
fragmented genome reads, and assembling the complete genome from them
efficiently requires large amounts of memory. As DNA sequencing
technologies improve, the memory footprint of the assembly process is
expected to increase dramatically and to become a limiting factor in
processing the widely available NGS-generated reads. In this report, we
compare current memory-efficient techniques
for genome assembly with respect to quality, memory consumption and
execution time. Our experiments show that it is possible to generate
draft assemblies of reasonable quality on conventional general-purpose
computers with very limited available memory by choosing suitable
assembly methods. Our study reveals the minimum memory requirements for
different assembly programs, even when the data volume exceeds memory
capacity by orders of magnitude. By combining existing methodologies, we
propose two general assembly strategies that can improve short-read
assembly approaches and reduce their memory footprint.
Finally, we discuss the possibility of using cloud infrastructures
for genome assembly and comment on our findings regarding suitable
computational resources for assembly.