
PrintReads timing

Dear GATK Team,

While monitoring the INFO log output of my PrintReads command, I noticed that the last contig to be processed (in my case human chromosome Y) took significantly longer than any of the previous contigs. What is also strange is that the estimated remaining time (last column in the log) was already down to about one minute at the end of the previous contig (chrX); although chrY is relatively short, it accounted for almost one third of the total run time of the PrintReads command. I ran the same command three times to make sure it was not a momentary slowdown of our cluster, and the slowdown always occurred in the last contig (twice chrY and once the chrUn contigs of the human genome). Have you observed this, too? Is this expected, or can it be avoided to speed up execution?

The following two commands were used (this was the step after BaseRecalibrator); I am attaching the log output files.

         java -Xmx4g -jar GATK/2.4.9/GenomeAnalysisTK.jar -T PrintReads -R hsapiens_coordsort_v37.fa \
             --input_file rmdup.bam -BQSR rmdup.grp -o recal.bam -L hg19_chromosomes.bed

         java -Xmx4g -jar GATK/2.4.9/GenomeAnalysisTK.jar -T PrintReads -R hsapiens_coordsort_v37.fa \
             --input_file rmdup.bam -BQSR rmdup.grp -o recal.bam

Many thanks for your comments and suggestions,


Best Answer


  • Carneiro (Charlestown, MA), Member

    What is the coverage profile of your BAM? Could it be that most of your reads are aligned to Y? It is a very repetitive chromosome with lots of potential for alignment nightmares. Could you post the number of reads for each chromosome in your BAM? That would help us understand the problem.
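
One possible way to collect the per-chromosome read counts asked for above is to summarize the output of `samtools idxstats`, which prints one tab-separated line per reference sequence (name, length, mapped reads, unmapped reads) for an indexed BAM. The sketch below is only an illustration: the function names `reads_per_contig` and `format_report` are hypothetical, and the sample numbers are invented to show the format, not taken from the BAM in this thread.

```python
def reads_per_contig(idxstats_text):
    """Parse text in `samtools idxstats` format into per-contig records.

    Expected columns per line (tab-separated):
    contig name, contig length, mapped reads, unmapped reads.
    """
    rows = []
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name = fields[0]
        length, mapped, unmapped = (int(x) for x in fields[1:])
        # The trailing "*" line (reads with no reference) has length 0,
        # so a density cannot be computed for it.
        per_mb = mapped / (length / 1e6) if length > 0 else None
        rows.append({
            "contig": name,
            "length": length,
            "mapped": mapped,
            "unmapped": unmapped,
            "mapped_per_mb": per_mb,
        })
    return rows


def format_report(rows):
    """Return lines of: contig, mapped reads, share of all mapped reads, reads per Mb."""
    total = sum(r["mapped"] for r in rows)
    lines = []
    for r in rows:
        share = 100.0 * r["mapped"] / total if total else 0.0
        if r["mapped_per_mb"] is None:
            density = "n/a"
        else:
            density = f"{r['mapped_per_mb']:.1f}"
        lines.append(f"{r['contig']}\t{r['mapped']}\t{share:.1f}%\t{density}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    sample = "chrX\t1000000\t500\t0\nchrY\t500000\t2000\t10\n*\t0\t0\t30\n"
    print(format_report(reads_per_contig(sample)))
```

To use it on real data, the output of `samtools idxstats rmdup.bam` (the BAM must be indexed) could be saved to a file and its contents passed to `reads_per_contig`. A contig whose share of mapped reads or reads-per-Mb value is far above the others would be consistent with the pile-up on chrY suggested above.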
