HaplotypeCaller BP_RESOLUTION: more AD values than alleles called

My intention is to find the different bases called at a particular chromosome position, irrespective of whether the site is classified as a SNP or a bad base. I used the command below:
java -jar 3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R Reference.fa -I Sample1.bam -o Sample1.BPR.vcf -ERC BP_RESOLUTION -L 1

I got the result I intended, but I am confused by some of the records, for example:

1 1222274 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:92,1:93:99:0,120,1800
1 8333303 . A AG,AT,G,T,<NON_REF> 0 . BaseQRankSum=0.913;ClippingRankSum=-0.141;DP=57;ExcessHet=3.0103;MLEAC=0,0,0,0,0;MLEAF=0.00,0.00,0.00,0.00,0.00;MQRankSum=0.445;RAW_MQ=205200.00;ReadPosRankSum=-0.85 GT:AD:DP:GQ:PL:SB 0/0:39,2,5,2,2,0:50:36:0,79,1151,36,1041,1086,88,985,919,1611,88,985,919,1546,1611,110,942,900,957,957,928:10,29,3,8

Why are there more AD values than alleles called? Please note that I am working with an RNA-seq dataset after BQSR.
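
One way to check how the AD values line up with the alleles is to pair them in order: AD lists the REF depth first, then one depth per ALT allele. The sketch below is only an illustration, not GATK code. The helper names are hypothetical, and it assumes the ALT column of a BP_RESOLUTION record ends with the symbolic `<NON_REF>` allele that HaplotypeCaller writes in GVCF modes. It also checks the PL field, which for a diploid sample should hold n(n+1)/2 values for n alleles.

```python
def alleles_of(ref, alt):
    """Return REF followed by every ALT allele, in VCF column order."""
    return [ref] + [a for a in alt.split(",") if a not in ("", ".")]


def pair_ad(ref, alt, ad):
    """Pair each AD value with its allele (REF first, then the ALT alleles)."""
    alleles = alleles_of(ref, alt)
    depths = [int(x) for x in ad.split(",")]
    if len(alleles) != len(depths):
        raise ValueError(f"{len(depths)} AD values for {len(alleles)} alleles")
    return dict(zip(alleles, depths))


def expected_pl_count(n_alleles):
    """Number of diploid genotypes over n alleles: n(n+1)/2."""
    return n_alleles * (n_alleles + 1) // 2


# Values taken from the record at position 8333303; <NON_REF> is assumed
# to be the last ALT allele.
counts = pair_ad("A", "AG,AT,G,T,<NON_REF>", "39,2,5,2,2,0")
pl = ("0,79,1151,36,1041,1086,88,985,919,1611,88,985,919,"
      "1546,1611,110,942,900,957,957,928").split(",")

print(counts)
print(len(pl), expected_pl_count(len(counts)))
```

Under that assumption the six AD values map onto A, AG, AT, G, T and `<NON_REF>`, and the 21 PL values match the 21 genotypes expected for six alleles.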

Best Answers


  • bassu (UAE, Member)


    Thank you for the response. My concerns are the following:

    1) If the AD value 92 is for "A", what is the other base with a count of 1? Using IGV I found it to be "G". Why is it not shown in the ALT column?

    1 1222274 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:92,1:93:99:0,120,1800

    2) Why are there two-base ALT alleles for a single chromosome position, for example AG and AT? These were not seen in IGV either.

    1 8333303 . A AG,AT,G,T,<NON_REF> 0 . GT:AD:DP:GQ:PL:SB 0/0:39,2,5,2,2,0:50:36:0,79,1151,36,1041,1086,88,985,919,1611,88,985,919,1546,1611,110,942,900,957,957,928:10,29,3,8

  • bassu (UAE, Member)
    edited November 2017


    I ran GenotypeGVCFs on the combined GVCF (more than 100 samples) using the command below:

    gatk -T GenotypeGVCFs -R  REFERENCE \
    -L MT \
    --variant ALL_SAMPLEScleaned.aligned.chrom.MT.g.vcf.gz \
    -stand_call_conf 30 \
    -o ALL_SAMPLES_GT_MT.g.vcf.gz

    The VCF file looks like the following (most samples truncated for viewing):
    MT 16184 . C T,* 245600.79 . AC=4,6;AF=0.014,0.021;AN=292;BaseQRankSum=2.82;ClippingRankSum=0.931;DP=267366;ExcessHet=0.0000;FS=2.107;InbreedingCoeff=0.5408;MLEAC=4,6;MLEAF=0.014,0.021;MQ=23.07;MQRankSum=0.378;QD=27.27;ReadPosRankSum=2.34;SOR=0.459 GT:AD:DP:GQ:PGT:PID:PL 0/0:1561,0,0:1561:99:.:.:0,120,1800,120,1800,1800 ./.:1505,0,0:1505 0/0:2021,0,0:2021:0:.:.:0,0,18042,0,18042,1804 0/0:2067,0,0:2067:99:.:.:0,120,1800,120,1800,1800 1/1:99,1689,0:1789:99:.:.:45065,3225,0,45298,5049,4711 0/0:2067,0,0:2067:0:.:.:0,0,46379,0,46379,46379 0/0:2554,0,0:2554:99:.:.:0,120,1800,120,1800,1800 0/0:2523,0,0:2523:99:.:.:0,120,1800,120,1800,1800 0/0:2070,0,0:2070:99:.:.:0,120,1800,120,1800,1800 0/0:1679,0,0:1679:0:.:.:0,0,34004,0,34004,34004 0/0:1684,0,0:1684:0:.:.:0,0,33011,0,33011,33011 0/0:2029,0,0:2029:99:.:.:0,120,1800,120,1800,1800 2/2:45,0,1845:1890:99:.:.:49079,49194,49631,5023,5460,0

    My questions are as follows:
    1) What is the ALT allele "*"?
    2) The genotype of sample 2 is called "./.", which I understand stands for missing data. So how come the AD is 1505?
    3) The genotype of the last sample is called 2/2. What are the bases in that case?

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)


    1,3) Have a look at this dictionary entry.

    2) It is possible that all the reads are uninformative or have low mapping or base qualities. Take a look at the GVCF record for that sample; it should show GQ=0, which means the tool is not confident in any genotype.
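
For questions 1 and 3, the indices in GT can be turned back into bases by indexing into REF followed by the ALT alleles. The sketch below is a minimal illustration, not GATK code: `decode_sample` is a hypothetical helper, and it assumes the FORMAT keys of the MT 16184 record shown in the question. In the VCF specification, `*` marks an allele that is missing because of an overlapping upstream deletion, so it is not a base.

```python
def decode_sample(ref, alt, sample, fmt="GT:AD:DP:GQ:PGT:PID:PL"):
    """Return (called alleles, per-allele depths) for one sample column."""
    alleles = [ref] + alt.split(",")
    # Trailing FORMAT fields may be dropped (e.g. "./.:1505,0,0:1505");
    # zip stops at the shorter sequence, so those keys are simply absent.
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    indices = fields["GT"].replace("|", "/").split("/")
    called = [alleles[int(i)] if i != "." else "." for i in indices]
    depths = dict(zip(alleles, (int(x) for x in fields["AD"].split(","))))
    return "/".join(called), depths


# Sample columns copied from the MT 16184 record (REF=C, ALT=T,*).
last_sample = decode_sample(
    "C", "T,*", "2/2:45,0,1845:1890:99:.:.:49079,49194,49631,5023,5460,0")
no_call = decode_sample("C", "T,*", "./.:1505,0,0:1505")
hom_alt = decode_sample(
    "C", "T,*", "1/1:99,1689,0:1789:99:.:.:45065,3225,0,45298,5049,4711")

print(last_sample)
print(no_call)
print(hom_alt)
```

Read this way, the 2/2 sample is `*/*` with 1845 of its reads supporting the `*` allele. The `./.` sample still reports a depth of 1505 on the reference allele, because a no-call reflects low genotype confidence and not an absence of reads.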

