- A-212 (STCS Seminar Room)
High-throughput short-read DNA sequencers are enabling inexpensive sampling of genomes at high coverage. Assembling such short reads to sequence hitherto unsequenced organisms is an important challenge in computational biology. Accurate assembly requires memory-intensive, graph-based models, and short reads must be sampled in much larger numbers and at higher coverage than longer reads. Together, these demands prevent many assemblers from scaling beyond bacterial genomes. In this talk, I will present our parallel short-read assembly framework, which can assemble large genomes from high-coverage sampling of paired short reads with approximate distance constraints. We use bidirected graph models and have developed parallel algorithms that carry out the memory-intensive phases of assembly using the large distributed memory available on parallel systems. Our framework can handle reads of multiple sizes and multiple types of distance constraints. I will demonstrate the applicability of this work to genome sequencing projects and comment on future directions.