Document taken from a search engine cache. Original document address: http://mccmb.belozersky.msu.ru/2015/proceedings/abstracts/181.pdf
Modification date: Mon Jun 15 15:35:00 2015
Indexing date: Sat Apr 9 23:39:13 2016
Alignment-free telomere length estimation from whole genome NGS data
Szymon M. Kielbasa¹, Jelle Goeman², Hein Putter¹, GoNL Consortium, Dorret Boomsma³, Eline Slagboom¹, Kai Ye⁴

¹ Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
² Department for Health Evidence, Radboud university medical center, Nijmegen, The Netherlands
³ Netherlands Twin Register, Department of Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands
⁴ The Genome Institute, Washington University in St. Louis, St. Louis, USA
Abstract

Telomeres are repetitive structures present at each end of a chromatid. They play a role in maintaining genome integrity. Due to the nature of chromosome replication, telomeres shorten with each replication cycle. Consequently, the average telomere length decreases over the organism's lifetime and may be used as a marker of its biological age. Here we present a method for accurate estimation of telomere length from unaligned whole genome sequencing reads. We developed the method on a dataset provided by the Genome of the Netherlands (GoNL) project, which generated whole genome sequencing data for 754 samples from 248 Dutch families. Telomere length measurements were available for 381 of these samples; they were obtained without the use of next-generation sequencing methods.

Our method consists of two components: a read classifier and a linear model. The read classifier is a fast function that detects repetitive sequences (in particular the telomeric motif TTAGGG) in reads. We apply this function to all reads of a sample and then build a table of counts of reads carrying the various repetitive motifs. Next, using the read count table and the available telomere length measurements, we train a linear predictor of telomere length. We show that the simplest possible predictor, based only on the frequency of reads containing the telomeric motif TTAGGG, displays a strong sequencing-batch bias. When the frequencies of a few other repetitive motifs are incorporated into the model, its performance improves significantly.
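The read classifier and count table described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the classification rule, so we assume (hypothetically) that a read carries a motif when it contains at least `min_copies` consecutive copies of that motif on either strand. TTAGGG comes from the abstract; TGAGGG and TCAGGG are placeholder variant repeats, since the abstract does not name the other motifs. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

# TTAGGG is the telomeric motif named in the abstract. TGAGGG and TCAGGG are
# placeholder variant repeats (assumption): the abstract does not list the
# other motifs the authors used.
MOTIFS: List[str] = ["TTAGGG", "TGAGGG", "TCAGGG"]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; bases other than A/C/G/T are kept."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_read(read: str, motifs: List[str], min_copies: int = 4) -> List[str]:
    """Return the motifs found as >= min_copies tandem copies on either strand.

    The tandem-copy rule and the default of 4 copies are hypothetical choices.
    """
    read = read.upper()
    hits = []
    for motif in motifs:
        tandem = motif * min_copies
        if tandem in read or reverse_complement(tandem) in read:
            hits.append(motif)
    return hits


def motif_read_frequencies(
    reads: Iterable[str], motifs: List[str], min_copies: int = 4
) -> Dict[str, float]:
    """Count reads carrying each motif and divide by the total read count.

    The result is one row of the per-sample table used as model features.
    """
    counts: Counter = Counter()
    total = 0
    for read in reads:
        total += 1
        counts.update(classify_read(read, motifs, min_copies))
    return {m: (counts[m] / total if total else 0.0) for m in motifs}


if __name__ == "__main__":
    demo_reads = [
        "ACGT" + "TTAGGG" * 5,   # telomeric repeat, forward strand
        "CCCTAA" * 6,            # telomeric repeat, reverse strand
        "ACGTACGTACGTACGTACGT",  # no repeat
        "TGAGGG" * 4 + "AC",     # variant repeat
    ]
    print(motif_read_frequencies(demo_reads, MOTIFS))
    # → {'TTAGGG': 0.5, 'TGAGGG': 0.25, 'TCAGGG': 0.0}
```

Applied to one sample's reads, `motif_read_frequencies` yields one row of the per-sample table; stacking the rows of all samples gives a feature matrix for a linear model. Searching for tandem copies as a plain substring tolerates a shifted repeat phase at the cost of at most one copy, which keeps the classifier a simple linear scan per motif.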

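A linear predictor of the kind described in the abstract can be fitted by ordinary least squares. The sketch below is a hedged, dependency-free illustration that solves the normal equations; it assumes each sample is described by its motif read frequencies and that the response is the measured telomere length. The demo numbers are synthetic, not GoNL data, and the function names are hypothetical.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("design matrix is singular (e.g. a constant motif column)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_predictor(
    features: Sequence[Sequence[float]], lengths: Sequence[float]
) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y.

    features: one row per sample (e.g. motif read frequencies);
    lengths: measured telomere length per sample.
    Returns [intercept, coef_1, ..., coef_p].
    """
    X = [[1.0, *map(float, row)] for row in features]  # prepend intercept column
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, lengths)) for i in range(p)]
    return _solve(xtx, xty)


def predict(beta: Sequence[float], row: Sequence[float]) -> float:
    """Apply a fitted predictor to one sample's feature row."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


if __name__ == "__main__":
    # Synthetic, exactly linear data: length = 2 + 3*f1 - 1*f2 (not GoNL values).
    feats = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
    lengths = [2, 5, 1, 4, 7]
    beta = fit_linear_predictor(feats, lengths)
    print([round(b, 6) for b in beta])  # → [2.0, 3.0, -1.0]
```

Solving the normal equations is adequate for a handful of motifs; with many correlated features, `numpy.linalg.lstsq` or a QR-based solver is numerically safer.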

Finally, we compare our predictions with those obtained with the TelSeq algorithm. The TelSeq estimates show a strong effect of sequencing batch. Moreover, we demonstrate that our method delivers estimates more strongly associated with individuals' age.
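The abstract does not say how the association with age was measured. As an assumption for illustration only, the sketch below compares sets of telomere length estimates by their Pearson correlation with age; negative values are expected because telomeres shorten with age, and a larger absolute value indicates a stronger association. Method names and values are synthetic placeholders, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def age_association(
    ages: Sequence[float], estimates: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Correlation with age for each named set of telomere length estimates.

    Telomeres shorten with age, so negative values are expected; a larger
    absolute value means a stronger association.
    """
    return {name: pearson(ages, est) for name, est in estimates.items()}


if __name__ == "__main__":
    ages = [20, 30, 40, 50, 60]
    result = age_association(
        ages,
        {
            "method_a": [8.0, 7.5, 7.0, 6.5, 6.0],  # synthetic values
            "method_b": [7.0, 7.8, 6.2, 7.1, 6.0],  # synthetic values
        },
    )
    strongest = max(result, key=lambda k: abs(result[k]))
    print(strongest)  # → method_a
```

A fuller comparison would likely also account for sequencing batch, for example by adding batch indicators as covariates; the abstract does not specify how this was handled.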