
Understand Effective Coverage by Mapped Reads using Genome Repeat Complexity

9 pages. Published: March 18, 2019


Sequencing depth, which refers to the expected coverage of nucleotides by reads, is computed based on the assumption that reads are synthesized uniformly across chromosomes. In reality, read coverage across genomes is not uniform. Although a coverage of 10x, for example, means a nucleotide is covered 10 times on average, nucleotides in certain parts of a genome are covered much more or much less often. One factor that influences coverage is the ability of a read aligner to align reads to genomes. If a part of a genome is complex, e.g. because it contains many repeats, aligners may have trouble aligning reads to that region, resulting in low coverage.
We introduce a systematic approach to predict the effective coverage of genomes by short-read aligners. The effective coverage of a chromosome is defined as the actual number of bases covered by reads. We show that this quantity is highly correlated with the repeat complexity of genomes. Specifically, we show that the more repeats a genome has, the less it is covered by short reads. We demonstrate this strong correlation with five popular short-read aligners in three species: Homo sapiens, Zea mays, and Glycine max. Additionally, we show that, compared to other measures of sequence complexity, repeat complexity is the most appropriate. This work makes it possible to predict the effective coverage of genomes at a given sequencing depth.
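
To make the two quantities in the abstract concrete, here is a minimal Python sketch. It is not the authors' method or code. It assumes that the nominal sequencing depth can be taken as (number of reads × read length) / genome length, and that alignments are available as half-open (start, end) intervals on a single chromosome. The function names `nominal_depth` and `covered_bases` are hypothetical, chosen only for this illustration.

```python
from typing import Iterable, Tuple

# Assumption: an alignment is a half-open interval [start, end) on one chromosome.
Interval = Tuple[int, int]


def nominal_depth(num_reads: int, read_length: int, genome_length: int) -> float:
    """Expected depth under the uniform-coverage assumption: N * L / G."""
    return num_reads * read_length / genome_length


def covered_bases(alignments: Iterable[Interval]) -> int:
    """Count bases covered by at least one aligned read.

    Sorts the intervals and merges overlapping or adjacent ones, so a base
    covered by several reads is counted once.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(alignments):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the current merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps or touches the current block: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Toy data (made up for illustration): four 10-base reads on a 100-base chromosome.
toy_alignments = [(0, 10), (5, 15), (40, 50), (40, 50)]
toy_depth = nominal_depth(num_reads=4, read_length=10, genome_length=100)
toy_covered = covered_bases(toy_alignments)
```

In this toy case the reads pile up in two places, so the number of covered bases (`toy_covered`) is smaller than the total number of sequenced bases. This is the kind of gap between nominal depth and effective coverage that the abstract describes.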

Keyphrases: effective coverage, genome complexity, genomic repeats, short read alignment

In: Oliver Eulenstein, Hisham Al-Mubaid and Qin Ding (editors). Proceedings of 11th International Conference on Bioinformatics and Computational Biology, vol 60, pages 65--73

BibTeX entry
  author    = {Shanshan Gao and Quang Tran and Vinhthuy Phan},
  title     = {Understand Effective Coverage by Mapped Reads using Genome Repeat Complexity},
  booktitle = {Proceedings of 11th International Conference on Bioinformatics and Computational Biology},
  editor    = {Oliver Eulenstein and Hisham Al-Mubaid and Qin Ding},
  series    = {EPiC Series in Computing},
  volume    = {60},
  pages     = {65--73},
  year      = {2019},
  publisher = {EasyChair},
  bibsource = {EasyChair,},
  issn      = {2398-7340},
  url       = {},
  doi       = {10.29007/krbn}}