Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

DP is not reporting filtered read numbers

sezimmersezimmer Bronx, New YorkMember

Hi Everyone,

I am using GATK version 3.3 and ran haplotype caller with -mmq 30 and -mbq 20. The output of one of the positions in the VCF is below.

chr13 58264978 . C . . . GT:AD:DP:GQ:PL 0/0:22,1:23:54:0,54,655

I used IGV to validate this position. IGV reports, like GATK does that there are 23 total reads covering that position. Although, there are also 4 reads that have a phred score of less than 20 at the position above. The DP statistic seems to be recording the number of unfiltered reads, but not the filtered ones. Do you know why this is?

Thank you.

Best,
Sam

Tagged:

Best Answers

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi Sam, I'm not sure I understand what you're saying. Are you saying that the 4 reads with lower values are being counted toward the total depth, or that they're not?

  • sezimmersezimmer Bronx, New YorkMember

    Thanks for the quick reply. I am saying that the 4 reads with lower values are being counted toward the total depth. However, I don't believe they should be counted. The DP is 23. However, due to the 4 low quality reads I believe the actual value of the DP should be 19 instead. Is that better?

    Thanks.

    Sam

  • sezimmersezimmer Bronx, New YorkMember

    Thanks for the clarification. I think the GATK page: https://broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_Coverage.php actually says that at the sample level the DP value is the count of reads that passed the caller's internal quality control metrics. At the site level the DP value is the unfiltered depth over all samples. This may be worth clarifying.

    I also have another related question if you don't mind. Since I actually would really like to get the number of filtered reads at a specific position, I am going to parse the bam file that is generated by the "bamout' parameter in haplotype caller. However, when I did this, the bam file was empty, but the VCF file is definitely not. There are a 1,011 variants found in the VCF file. Judging by the output file, it seems like the program never actually finished running and the bam file was not produced. Do you know why that might be? I am using GATK version 3.3. Below is the output.

    Thank you very much for your help.

    Best,
    Sam

    java -jar -Xmx5g /public/apps/GenomeAnalysisTK/3.3-0/java.1.7.0_67/GenomeAnalysisTK.jar -T HaplotypeCaller -R /
    home/sezimmer/variantCallingPipeline/referenceFiles/fasta/mm9.fa -I /home/sezimmer/variantCallingPipeline/varia
    ntCallingD4/dedupFolder/A1.indel.recal.dedup.reordered.bam --dbsnp /home/sezimmer/variantCallingPipeline/refere
    nceFiles/dbsnp_128.mm9.reorder.vcf --genotyping_mode DISCOVERY -mmq 30 -mbq 20 -stand_call_conf 20.0 -stand_em
    it_conf 20.0 -ERC BP_RESOLUTION --variant_index_type LINEAR --variant_index_parameter 128000 -L chr14 -nt 1 -nc
    t 1 -o /home/sezimmer/variantCallingPipeline/variantCallingD4/variants/gatk/A1.allBases.chr14.vcf

    INFO 00:37:08,214 HelpFormatter - ----------------------------------------------------------------------------

    INFO 00:37:08,216 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:0
    7:22
    INFO 00:37:08,216 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 00:37:08,216 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 00:37:08,219 HelpFormatter - Program Args: -T HaplotypeCaller -R /home/sezimmer/variantCallingPipeline/re
    ferenceFiles/fasta/mm9.fa -I /home/sezimmer/variantCallingPipeline/variantCallingD4/dedupFolder/A1.indel.recal.
    dedup.reordered.bam --dbsnp /home/sezimmer/variantCallingPipeline/referenceFiles/dbsnp_128.mm9.reorder.vcf --ge
    notyping_mode DISCOVERY -mmq 30 -mbq 20 -stand_call_conf 20.0 -stand_emit_conf 20.0 -ERC BP_RESOLUTION --varian
    t_index_type LINEAR --variant_index_parameter 128000 -L chr14 -nt 1 -nct 1 -o /home/sezimmer/variantCallingPipe
    line/variantCallingD4/variants/gatk/A1.allBases.chr14.vcf -bamout /home/sezimmer/variantCallingPipeline/variant
    CallingD4/variants/gatk/A1.allBases.chr14.bam -forceActive -disableOptimizations
    INFO 00:37:08,222 HelpFormatter - Executing as [email protected] on Linux 2.6.32-431.23.3.el6.657g0001.x86_64 amd64
    ; Java HotSpot(TM) 64-Bit Server VM 1.7.0_67-b01.
    INFO 00:37:08,222 HelpFormatter - Date/Time: 2016/01/08 00:37:08

    INFO 00:37:08,222 HelpFormatter - ----------------------------------------------------------------------------

    INFO 00:37:08,222 HelpFormatter - ----------------------------------------------------------------------------

    INFO 00:37:09,040 GenomeAnalysisEngine - Strictness is SILENT
    INFO 00:37:09,098 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250
    INFO 00:37:09,105 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 00:37:09,145 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.04
    INFO 00:37:09,187 HCMappingQualityFilter - Filtering out reads with MAPQ < 30
    INFO 00:37:09,643 IntervalUtils - Processing 125194864 bp from intervals
    INFO 00:37:09,714 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
    INFO 00:37:10,170 GenomeAnalysisEngine - Done preparing for traversal
    INFO 00:37:10,170 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 00:37:10,171 ProgressMeter - | processed | time | per 1M | |
    total | remaining
    INFO 00:37:10,171 ProgressMeter - Location | active regions | elapsed | active regions | completed | ru
    ntime | runtime
    INFO 00:37:10,179 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model co
    nfidence output
    INFO 00:37:10,179 HaplotypeCaller - All sites annotated with PLs forced to true for reference-model confidence
    output
    INFO 00:37:10,388 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
    INFO 00:37:40,194 ProgressMeter - chr14:2898651 0.0 30.0 s 49.6 w 2.3% 2
    1.6 m 21.1 m
    Using AVX accelerated implementation of PairHMM
    INFO 00:37:41,903 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file
    INFO 00:37:41,904 VectorLoglessPairHMM - Using vectorized implementation of PairHMM
    INFO 00:38:10,201 ProgressMeter - chr14:5070113 0.0 60.0 s 99.3 w 4.0% 2
    4.7 m 23.7 m
    INFO 00:38:40,203 ProgressMeter - chr14:8065551 0.0 90.0 s 148.9 w 6.4% 2
    3.3 m 21.8 m
    INFO 00:39:20,204 ProgressMeter - chr14:9116139 0.0 2.2 m 215.0 w 7.3% 2
    9.8 m 27.6 m
    INFO 00:39:50,204 ProgressMeter - chr14:10579551 0.0 2.7 m 264.6 w 8.5% 3
    1.6 m 28.9 m
    INFO 00:40:20,205 ProgressMeter - chr14:12387143 0.0 3.2 m 314.2 w 9.9% 3
    2.0 m 28.8 m
    INFO 00:41:00,206 ProgressMeter - chr14:13674951 0.0 3.8 m 380.3 w 10.9% 3
    5.1 m 31.3 m
    INFO 00:41:40,206 ProgressMeter - chr14:15375351 0.0 4.5 m 446.5 w 12.3% 3
    6.6 m 32.1 m
    INFO 00:42:10,207 ProgressMeter - chr14:16138388 0.0 5.0 m 496.1 w 12.9% 3
    8.8 m 33.8 m
    INFO 00:42:40,214 ProgressMeter - chr14:17228815 0.0 5.5 m 545.7 w 13.8% 4
    0.0 m 34.5 m
    INFO 00:43:10,214 ProgressMeter - chr14:17256700 0.0 6.0 m 595.3 w 13.8% 4
    3.5 m 37.5 m
    INFO 00:43:40,215 ProgressMeter - chr14:18827464 0.0 6.5 m 644.9 w 15.0% 4
    3.2 m 36.7 m
    INFO 00:44:20,215 ProgressMeter - chr14:20008400 0.0 7.2 m 711.1 w 16.0% 4
    4.8 m 37.7 m
    INFO 00:44:50,216 ProgressMeter - chr14:21143503 0.0 7.7 m 760.7 w 16.9% 4
    5.4 m 37.7 m
    INFO 00:45:30,216 ProgressMeter - chr14:21544009 0.0 8.3 m 826.8 w 17.2% 4
    8.4 m 40.1 m
    INFO 00:46:00,217 ProgressMeter - chr14:22187905 0.0 8.8 m 876.4 w 17.7% 4
    9.8 m 41.0 m
    INFO 00:46:40,218 ProgressMeter - chr14:23896316 0.0 9.5 m 942.5 w 19.1% 4
    9.8 m 40.3 m
    INFO 00:47:10,218 ProgressMeter - chr14:25228901 0.0 10.0 m 992.1 w 20.2% 4
    9.6 m 39.6 m
    INFO 00:47:40,219 ProgressMeter - chr14:26482654 0.0 10.5 m 1041.7 w 21.2% 4
    9.6 m 39.1 m
    INFO 00:48:20,219 ProgressMeter - chr14:27516112 0.0 11.2 m 1107.9 w 22.0% 5
    0.8 m 39.6 m
    INFO 00:48:50,302 ProgressMeter - chr14:27776523 0.0 11.7 m 1157.6 w 22.2% 5
    2.6 m 40.9 m
    INFO 00:49:20,303 ProgressMeter - chr14:28289504 0.0 12.2 m 1207.2 w 22.6% 5
    3.8 m 41.7 m
    INFO 00:50:00,304 ProgressMeter - chr14:29871704 0.0 12.8 m 1273.4 w 23.9% 5
    3.8 m 41.0 m
    INFO 00:50:30,304 ProgressMeter - chr14:30913644 0.0 13.3 m 1323.0 w 24.7% 5
    4.0 m 40.7 m
    INFO 00:51:00,305 ProgressMeter - chr14:31628200 0.0 13.8 m 1372.6 w 25.3% 5
    4.8 m 40.9 m
    INFO 00:51:40,305 ProgressMeter - chr14:31919818 0.0 14.5 m 1438.7 w 25.5% 5
    6.9 m 42.4 m
    INFO 00:52:10,306 ProgressMeter - chr14:32104251 0.0 15.0 m 1488.3 w 25.6% 5
    8.5 m 43.5 m
    INFO 00:52:50,307 ProgressMeter - chr14:32592410 0.0 15.7 m 1554.5 w 26.0% 6
    0.2 m 44.5 m
    INFO 00:53:20,307 ProgressMeter - chr14:33151011 0.0 16.2 m 1604.1 w 26.5% 6
    1.1 m 44.9 m
    INFO 00:53:50,308 ProgressMeter - chr14:33854200 0.0 16.7 m 1653.7 w 27.0% 6
    1.6 m 45.0 m
    INFO 00:54:30,308 ProgressMeter - chr14:34879422 0.0 17.3 m 1719.8 w 27.9% 6
    2.2 m 44.9 m
    INFO 00:55:00,309 ProgressMeter - chr14:35413396 0.0 17.8 m 1769.4 w 28.3% 6
    3.0 m 45.2 m
    INFO 00:55:40,310 ProgressMeter - chr14:36939351 0.0 18.5 m 1835.5 w 29.5% 6
    2.7 m 44.2 m

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Is that the complete output? I agree it looks like the run didn't finish. Can you tell if it just froze?

  • sezimmersezimmer Bronx, New YorkMember

    That is all the output. I ran the job on a SGE cluster last night, and this morning, the job was no longer running. Does that help?

  • sezimmersezimmer Bronx, New YorkMember

    You are probably right. My poor job, it never had a chance.

Sign In or Register to comment.