Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Attention:
We will be out of the office on October 14, 2019, due to the U.S. holiday. We will return to monitoring the forum on October 15.

low DP value after using GATK 4.0.12 Haplotypecaller

yb87625yb87625 Member

Dear GATK team,

I am using GATK 4.0.12 Haplotypecaller for calling the variants from the data of Targeted Amplicon sequencing.
however, the DP value of the .vcf is much lower than the value in BAM file viewing with IGV.

May i ask you how to figure out this issue?

thanks.

Answers

  • bshifawbshifaw admin Member, Broadie, Moderator admin

    Hi @yb87625 ,

    Mind posting the command you used along with IGV screenshots of the bamout of the affected area. Use this article as reference.

    Here are some related posts, check if you see any commonalities :
    forum 3356
    forum 13680
    git 5434

  • yb87625yb87625 Member
    edited May 14

    Thanks @bshifaw
    I used the command for variants calling as following:

    java -jar -Xmx24G gatk -package-4.0.12.0-local.jar  HaplotypeCaller \
        -R ${reference} \
        -I ${outputdir}/BQSR/${SM}.sorted.recal.bam \
        --dbsnp ${gatk_bundle}/dbsnp_138.hg19.vcf \
        -L ${bedfile} \
        -stand-call-conf 30 \
        -mbq 20 \
        --output ${outputdir}/vcf/${SM}_raw.snps.indels.vcf \
    

    and one example result of otherinfo in BAM file after called as following:

    0.5 620.77  94  chr17   41228617    rs80356932  G   A   620.77  PASS    AC=1;AF=0.500;AN=2;BaseQRankSum=5.144;DB;DP=95;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=6.60;ReadPosRankSum=0.049;SOR=0.172    GT:AD:DP:GQ:PL  0/1:66,28:94:99:649,0,1808
    

    while using IGV for checking, it shows

    please tell me what's the problem it is
    thanks a lot.

    Post edited by bshifaw on
  • bshifawbshifaw admin Member, Broadie, Moderator admin

    @yb87625

    I spoke with the dev team, here is what they suggested.

    This may be due to duplicate marking (Since this is targeted amplicon sequencing there can be a large number of reads that would be marked as duplicates). Were your reads marked for duplicates with MarkDuplicates on this data before running HaplotypeCaller? If not, please do so. If they were, set IGV to filter duplicates (View->Preferences->Alignments->"Filter duplicate reads"), and see if the depth observed in IGV then agrees better with what you see in the VCF.

  • yb87625yb87625 Member
    edited May 15

    @bshifaw
    1. The data i posted didn't ran the "MarkDuplicates", and If ran, the depth will be much lower in the VCF, the maximum depth of each reads reach 99.
    2. actually, the pic. of IGV I showed have already filter the duplicate reads.

  • bshifawbshifaw admin Member, Broadie, Moderator admin

    Hi @yb87625 ,
    Miscommunication on my part, MarkDuplicates isn't necessary. Your process sounds fine.

    Dev:

    Other things to look for:
    1. It's possible many of the reads are being filtered because of low mapping quality or low base quality at the variant site. The user could look for this.
    2. They could also run HC just over the active region shown, and the printout to the screen will tell how many reads were filtered out and for what reasons.

  • yb87625yb87625 Member

    @bshifaw
    would you please tell me more detail about how to look for the two points?
    thanks a lot.

  • bshifawbshifaw admin Member, Broadie, Moderator admin
    1. Check the post terminal output, the tool will display the number of reads being filtered as a log summary.
    2. Use -L to specify intervals on which you want to run the tool, then check the amount of filtration thats occuring at that interval from the log summary.
  • yb87625yb87625 Member

    @bshifaw
    I did the Haplotypecaller for the same input bamfile with GATK3.8.1 and GATK4.1, and the output vcf files have many differences, including the amount of variants called and the info of them.
    I wonder whether some wrong setting with script in GATK4.
    Would you please help me to have a look at the commands?
    Thank you very much.

    GATK3.8:
    $GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R ${reference} \
    -I ${inputbam}/s1991.sorted.bam \
    --dbsnp ${gatk_bundle}/dbsnp_138.hg19.vcf \
    -L ${bedfile} \
    -stand_call_conf 30 -mbq 20 \
    --out ${outputdir}/vcf/s1991_raw.snps.indels.vcf \
    -dt NONE

    result of GATK3.8 :

    GATK4:
    $gatk HaplotypeCaller \
    -R ${reference} \
    -I ${inputbam}/s1991.sorted.bam \
    --dbsnp ${gatk_bundle}/dbsnp_138.hg19.vcf \
    -L ${bedfile} \
    -stand-call-conf 30 -mbq 20 \
    --output ${outputdir}/s1991_BQSR_GATK4.raw.snps.indels.vcf

    results of GATK4:

  • bshifawbshifaw admin Member, Broadie, Moderator admin

    Your commands look fine, I'll check with the dev team if this is abnormal behavior.

  • yb87625yb87625 Member

    @bshifaw
    sorry, the results above should be changed.
    the first one is from GATK4 and the second one is from GATK3.8.
    thanks.

  • bshifawbshifaw admin Member, Broadie, Moderator admin

    Hi @yb87625 ,

    The dev team will need some additional info, please share an IGV screenshot with the bamout file. You can produce the bamout file by running your command with -bamout argument. This will display haplotypecallers locally aligned reads used to call the variants. You can use these instructions to create the bamout here.

  • yb87625yb87625 Member

    @bshifaw
    May I ask you which region I should called in the bamout file. the whole region or the any specific position?

  • yb87625yb87625 Member

    @bshifaw
    please find the IGV screenshot about the bamout file of GATK3.8 and GATK4.1.2 as following.
    Screen Shot 2019-05-29 at 11.22.54 AM

    the first one are from GATK3, then GATK4, and the input sorted bamfile are the last.
    And I didnot do the BQSR before haplotypecalling this time.
    and the commands are as following.
    GATK3: java -jar -Xmx8G /usr/GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R ${reference} \
    -I ${inputbam}/${SM}.sorted.bam\
    --dbsnp ${gatk_bundle}/dbsnp_138.hg19.vcf \
    -L ${bedfile} \
    -stand_call_conf 30 -mbq 20 \
    -o ${outputdir}/${SM}_raw.snps.indels.vcf \
    -bamout ${outputdir}/gatk3.8_bamout.bam \
    -dt NONE "

    GATK4: gatk HaplotypeCaller \
    --reference ${reference} \
    --input ${inputbam}/${SM}.sorted.bam\
    --dbsnp ${gatk_bundle}/dbsnp_138.hg19.vcf \
    -L ${bedfile} \
    -stand-call-conf 30 -mbq 20 \
    --output ${outputdir}/${SM}_raw.snps.indels.vcf \
    -bamout ${outputdir}/gatk4_bamout.bam "

    thanks a lot.

  • bshifawbshifaw admin Member, Broadie, Moderator admin
    edited June 4

    Hi @yb87625

    I talked to the dev team and they mentioned

    For your gatk3.8 command, looks like you've added the CommandLineGATK parameter -dt NONE which would result in a high DP.

    --downsampling_type / -dt
    Type of read downsampling to employ at a given locus
    

    GATK 4 automatically downsamples input reads and there aren't CommandLineGATK parameters in GATK4 but you may be able to get similar results by setting --max-reads-per-alignment-start to 0 in HaplotypeCaller.

    --max-reads-per-alignment-start
    Maximum number of reads to retain per alignment start position. Reads above this threshold will be downsampled. Set to 0 to disable.
    

    here

    Edit: Corrected what -dt does

    Post edited by bshifaw on
  • yb87625yb87625 Member
    edited June 4

    @bshifaw
    it seems some misunderstanding.
    The previous result shows that GATK3.8 has higher DP but GATK4 has lower.

  • bshifawbshifaw admin Member, Broadie, Moderator admin
    edited June 4

    My mistake, I meant to say
    -dt NONEin gatk3 would provide a high DP, since its preventing the tool from downsampling. gatk4 doesn't have this parameter but you can attempt to prevent haplotypecaller from downsampling by using --max-reads-per-alignment-start

  • yb87625yb87625 Member

    @bshifaw
    Thank you very much.

Sign In or Register to comment.