PrintReads gets stuck
Hi GATK team!
I am doing somatic variant calling on RNA-seq data (paired-end Illumina reads), following the GATK Best Practices. I have normal and tumor samples, and my pipeline runs from the FASTQ files to the final recalibrated BAM, the step right before MuTect2. Of my 30 samples, 29 run fine, but for one of them the last step, PrintReads, gets stuck close to the end.
It seems to progress fine at the beginning:
INFO 21:18:07,884 ReadShardBalancer$1 - Done loading BAM index data
INFO 21:18:37,868 ProgressMeter - chr1:632281 200216.0 30.0 s 2.5 m 0.0% 42.5 h 42.5 h
INFO 21:19:37,870 ProgressMeter - chr1:19978481 701190.0 90.0 s 2.1 m 0.6% 4.0 h 4.0 h
INFO 21:20:07,872 ProgressMeter - chr1:37612921 1001193.0 120.0 s 119.0 s 1.2% 2.9 h 2.8 h
INFO 21:20:37,874 ProgressMeter - chr1:61289244 1201197.0 2.5 m 2.1 m 1.9% 2.2 h 2.2 h
INFO 21:21:07,875 ProgressMeter - chr1:91387373 1525198.0 3.0 m 118.0 s 2.8% 106.0 m 103.0 m
and then it gets stuck at a specific position:
INFO 05:02:07,536 ProgressMeter - chr21:8579818 1.31038633E8 7.7 h 3.5 m 86.3% 9.0 h 73.5 m
INFO 05:04:34,643 ProgressMeter - chr21:8579818 1.31038633E8 7.8 h 3.6 m 86.3% 9.0 h 73.9 m
INFO 05:07:07,032 ProgressMeter - chr21:8579818 1.31038633E8 7.8 h 3.6 m 86.3% 9.1 h 74.3 m
INFO 05:09:18,063 ProgressMeter - chr21:8579818 1.31038633E8 7.9 h 3.6 m 86.3% 9.1 h 74.7 m
(The last 208 log entries are exactly like the ones above: stuck at chr21:8579818, 86.3% progress.)
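For anyone checking a similar log, here is a minimal sketch of how the stalled position could be pulled out automatically instead of by eye. It assumes the ProgressMeter line layout shown above (timestamp, locus, reads processed); `find_stalls` and `min_repeats` are names made up for this illustration and are not part of GATK.

```python
import re
from itertools import groupby

# Matches lines such as:
# INFO 05:02:07,536 ProgressMeter - chr21:8579818 1.31038633E8 7.7 h ...
# Assumption: the layout is the one shown in the log excerpts above.
PROGRESS = re.compile(
    r"^INFO\s+(\d{2}:\d{2}:\d{2}),\d+\s+ProgressMeter\s+-\s+(\S+:\d+)\s+(\S+)"
)


def find_stalls(lines, min_repeats=3):
    """Return one tuple per run of consecutive ProgressMeter entries that
    report the same locus and the same number of processed reads.

    Each tuple is (locus, reads_processed, first_time, last_time, count).
    """
    entries = []
    for line in lines:
        match = PROGRESS.match(line.strip())
        if match:
            timestamp, locus, reads = match.groups()
            entries.append((timestamp, locus, float(reads)))

    stalls = []
    # Group consecutive entries whose locus and read count did not change.
    for (locus, reads), run in groupby(entries, key=lambda e: (e[1], e[2])):
        run = list(run)
        if len(run) >= min_repeats:
            stalls.append((locus, reads, run[0][0], run[-1][0], len(run)))
    return stalls


if __name__ == "__main__":
    import sys

    # Usage: python find_stalls.py gatk_printreads.log  (hypothetical file name)
    with open(sys.argv[1]) as handle:
        for stall in find_stalls(handle):
            print(stall)
```

A run is reported only when both the locus and the read count are unchanged, so a slow but advancing region is not flagged as a stall.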
At first I thought it might be a memory issue, but I am now running with 1 TB of RAM and the job simply runs out of time (the maximum walltime on my server is 24 h, though I doubt that allowing more time would let it finish).
The thing is, this is not the biggest sample (in terms of reads or file size), and all the others finish in less than 12 h.
I am using GenomeAnalysisTK-3.8-0, and this is my command line for that part:
INFO 21:18:06,716 HelpFormatter - Program Args: -T PrintReads -R /reference/GRCh38.p7.genome.fa -I /RNAseq_alignments/sample_dir/sample_rg_added_sorted.marked_duplicates.split.bam -BQSR /RNAseq_alignments/sample_dir/sample_rg_added_sorted.marked_duplicates.split.bam-realigned_recal_data.table -o /RNAseq_alignments/sample_dir/sample_recal_reads.bam
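To reproduce the hang without waiting for the whole genome, one option might be to rerun the same arguments restricted to a window around the stalled locus using the engine's `-L` interval argument. The sketch below only builds the command as an argument list. The jar location, the window `chr21:8500000-8600000`, the output path, and the helper name `restrict_to_region` are all assumptions for illustration.

```python
import shlex


def restrict_to_region(program_args, region, output_bam,
                       jar="GenomeAnalysisTK.jar"):
    """Build a GATK 3 command line from a logged 'Program Args' string,
    limited to one interval and writing to a separate output file.

    Assumptions: `jar` points at the GATK 3 jar, and `region` and
    `output_bam` are chosen by the user; none of them come from the log.
    """
    args = shlex.split(program_args)

    # Point -o at a new file so the test run does not overwrite real output.
    if "-o" in args and args.index("-o") + 1 < len(args):
        args[args.index("-o") + 1] = output_bam
    else:
        args += ["-o", output_bam]

    # -L restricts processing to the given interval.
    args += ["-L", region]
    return ["java", "-jar", jar] + args


if __name__ == "__main__":
    logged = (
        "-T PrintReads -R /reference/GRCh38.p7.genome.fa "
        "-I /RNAseq_alignments/sample_dir/"
        "sample_rg_added_sorted.marked_duplicates.split.bam "
        "-BQSR /RNAseq_alignments/sample_dir/"
        "sample_rg_added_sorted.marked_duplicates.split.bam-realigned_recal_data.table "
        "-o /RNAseq_alignments/sample_dir/sample_recal_reads.bam"
    )
    # Hypothetical window and output name for a quick test run.
    cmd = restrict_to_region(logged, "chr21:8500000-8600000",
                             "chr21_window_test.bam")
    print(" ".join(shlex.quote(part) for part in cmd))
```

If the restricted run also hangs, the problem is most likely tied to the reads in that window rather than to memory or total run time.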
Any help, please?