
PrintReads timing

Dear GATK Team,

When monitoring the INFO logging of my PrintReads command, I noticed that the last contig to be processed (in my case human chromosome Y) took significantly longer than each of the previous contigs. What is also strange is that the estimated remaining time (last column in the log) was already down to about one minute at the end of the previous contig (chrX). Although chrY is relatively short, it took up almost one third of the total run time of the PrintReads command. I repeated the same command three times to make sure it was not a momentary slowdown of our cluster, and the slowdown always occurred in the last contig (twice in chrY, and once in the chrUn contigs of the human genome). Have you observed this too? Is this expected? Or can it be avoided (and thus speed up the execution)?

The following two commands were used (this was the step after BaseRecalibrator); I am attaching the log files.

         java -Xmx4g -jar GATK/2.4.9/GenomeAnalysisTK.jar -T PrintReads -R hsapiens_coordsort_v37.fa \
         --input_file rmdup.bam -BQSR rmdup.grp -o recal.bam -L hg19_chromosomes.bed

         java -Xmx4g -jar GATK/2.4.9/GenomeAnalysisTK.jar -T PrintReads -R hsapiens_coordsort_v37.fa \
         --input_file rmdup.bam -BQSR rmdup.grp -o recal.bam

Many thanks for your comments and suggestions,


Best Answer


  • Carneiro (Charlestown, MA)

    What is the coverage profile of your BAM? Could it be that most of your reads are aligned to Y? It is a very repetitive chromosome with lots of potential for alignment nightmares. Could you post the number of reads for each chromosome in your BAM? That would help us understand the problem.
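    One way to gather the per-chromosome read counts asked for above is `samtools idxstats`, which reads the BAM index and prints one tab-separated line per contig (name, length, mapped reads, unmapped reads), ending with a `*` line for unplaced reads. The sketch below assumes samtools is installed and the BAM is indexed; the `read_share` helper is hypothetical, not part of samtools or GATK. It adds each contig's share of all mapped reads and its reads per Mb, so a short contig like Y that attracts a disproportionate pile of reads stands out.

```shell
#!/bin/sh
# Hypothetical helper: summarize `samtools idxstats` output per contig.
# Input columns (tab-separated): contig name, length, mapped reads, unmapped reads.
# Output columns: contig, mapped reads, % of all mapped reads, mapped reads per Mb.
read_share() {
  awk -F'\t' '
    {
      name[NR] = $1; len[NR] = $2; mapped[NR] = $3
      total += $3
    }
    END {
      for (i = 1; i <= NR; i++) {
        pct = (total > 0) ? 100 * mapped[i] / total : 0
        # The final "*" line (unplaced reads) has length 0, so guard the division.
        per_mb = (len[i] > 0) ? mapped[i] * 1e6 / len[i] : 0
        printf "%s\t%s\t%.2f%%\t%.1f\n", name[i], mapped[i], pct, per_mb
      }
    }'
}

# Usage (assumes the BAM has an index, e.g. created with `samtools index rmdup.bam`):
#   samtools idxstats rmdup.bam | read_share
```

    Reads per Mb is included alongside the raw count because chrY is short: a modest absolute count can still mean a very deep, repetitive pileup that is slow to process.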
