HaplotypeCaller BP_RESOLUTION: more AD values than alleles called

My intention is to find the different bases called at a particular chromosome location, irrespective of whether it is assigned as a SNP or a bad base. I used the command below:
java -jar 3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R Reference.fa -I Sample1.bam -o Sample1.BPR.vcf -ERC BP_RESOLUTION -L 1

I got the output I intended, but I am confused by the result. For example:

1 1222274 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:92,1:93:99:0,120,1800
1 8333303 . A AG,AT,G,T,<NON_REF> 0 . BaseQRankSum=0.913;ClippingRankSum=-0.141;DP=57;ExcessHet=3.0103;MLEAC=0,0,0,0,0;MLEAF=0.00,0.00,0.00,0.00,0.00;MQRankSum=0.445;RAW_MQ=205200.00;ReadPosRankSum=-0.85 GT:AD:DP:GQ:PL:SB 0/0:39,2,5,2,2,0:50:36:0,79,1151,36,1041,1086,88,985,919,1611,88,985,919,1546,1611,110,942,900,957,957,928:10,29,3,8

Why are there more AD values than alleles called? Please note that I am working with an RNA-seq dataset after BQSR.
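In a GVCF written with -ERC BP_RESOLUTION (or -ERC GVCF), HaplotypeCaller appends the symbolic `<NON_REF>` allele to the ALT column, and AD holds one depth for REF plus one for each ALT allele, `<NON_REF>` included. A minimal Python sketch of that pairing follows. It assumes the ALT list of the second record ends in `<NON_REF>`, and the helper name `pair_ad_with_alleles` is hypothetical, not part of GATK:

```python
def pair_ad_with_alleles(ref, alt_field, ad_field):
    """Pair each AD value with its allele: REF first, then ALTs in order."""
    alleles = [ref] + alt_field.split(",")
    depths = [int(x) for x in ad_field.split(",")]
    if len(alleles) != len(depths):
        # A mismatch usually means an allele was lost from the ALT column.
        raise ValueError(f"{len(alleles)} alleles but {len(depths)} AD values")
    return list(zip(alleles, depths))


# Values copied from the second record in the question (ALT assumed to end in <NON_REF>).
pairs = pair_ad_with_alleles("A", "AG,AT,G,T,<NON_REF>", "39,2,5,2,2,0")
for allele, depth in pairs:
    print(allele, depth)
```

Applied to the first record (`A`, `<NON_REF>`, `92,1`), the same helper pairs 92 with A and 1 with `<NON_REF>`.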

Best Answers


  • bassubassu UAEMember


    Thank you for the response. My concerns are the following:

    1) If the AD value 92 is for "A", what is the other base with a count of 1? Using IGV I found it to be "G". Why is it not shown in the ALT column?

    1 1222274 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:92,1:93:99:0,120,1800

    2) Why are there two-base ALT alleles at a single chromosome position, for example AG and AT? These were also not seen in IGV.

    1 8333303 . A AG,AT,G,T,<NON_REF> 0 . GT:AD:DP:GQ:PL:SB 0/0:39,2,5,2,2,0:50:36:0,79,1151,36,1041,1086,88,985,919,1611,88,985,919,1546,1611,110,942,900,957,957,928:10,29,3,8
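On question 2: in VCF notation, a two-base ALT such as AG against a one-base REF A describes an insertion (a G inserted after the reference A), not two bases at one position. A rough Python sketch of that length-based reading is shown here. It assumes simple left-anchored alleles, and `classify_alt` is a hypothetical helper, not a GATK function:

```python
def classify_alt(ref, alt):
    """Rough allele classification from REF/ALT string lengths."""
    if alt.startswith("<") and alt.endswith(">"):
        return "symbolic"           # e.g. <NON_REF>
    if alt == "*":
        return "spanning deletion"  # allele removed by an upstream deletion
    if len(alt) == len(ref):
        return "SNP" if len(ref) == 1 else "MNP"
    return "insertion" if len(alt) > len(ref) else "deletion"


# ALT alleles from the second record (ALT assumed to end in <NON_REF>).
labels = {alt: classify_alt("A", alt) for alt in ["AG", "AT", "G", "T", "<NON_REF>"]}
for alt, label in labels.items():
    print(alt, label)
```

Under these assumptions AG and AT come out as insertions, while G and T are SNPs.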

  • bassubassu UAEMember
    edited November 2017


    I ran GenotypeGVCFs on the combined GVCF (more than 100 samples) using the command below:

    gatk -T GenotypeGVCFs -R  REFERENCE \
    -L MT \
    --variant ALL_SAMPLEScleaned.aligned.chrom.MT.g.vcf.gz \
    -stand_call_conf 30 \
    -o ALL_SAMPLES_GT_MT.g.vcf.gz

    The VCF file looks like the following (most samples truncated for viewing):
    MT 16184 . C T,* 245600.79 . AC=4,6;AF=0.014,0.021;AN=292;BaseQRankSum=2.82;ClippingRankSum=0.931;DP=267366;ExcessHet=0.0000;FS=2.107;InbreedingCoeff=0.5408;MLEAC=4,6;MLEAF=0.014,0.021;MQ=23.07;MQRankSum=0.378;QD=27.27;ReadPosRankSum=2.34;SOR=0.459 GT:AD:DP:GQ:PGT:PID:PL 0/0:1561,0,0:1561:99:.:.:0,120,1800,120,1800,1800 ./.:1505,0,0:1505 0/0:2021,0,0:2021:0:.:.:0,0,18042,0,18042,1804 0/0:2067,0,0:2067:99:.:.:0,120,1800,120,1800,1800 1/1:99,1689,0:1789:99:.:.:45065,3225,0,45298,5049,4711 0/0:2067,0,0:2067:0:.:.:0,0,46379,0,46379,46379 0/0:2554,0,0:2554:99:.:.:0,120,1800,120,1800,1800 0/0:2523,0,0:2523:99:.:.:0,120,1800,120,1800,1800 0/0:2070,0,0:2070:99:.:.:0,120,1800,120,1800,1800 0/0:1679,0,0:1679:0:.:.:0,0,34004,0,34004,34004 0/0:1684,0,0:1684:0:.:.:0,0,33011,0,33011,33011 0/0:2029,0,0:2029:99:.:.:0,120,1800,120,1800,1800 2/2:45,0,1845:1890:99:.:.:49079,49194,49631,5023,5460,0

    My questions are as follows:
    1) What is the ALT allele "*"?
    2) The genotype for sample 2 is called "./.", which I understand stands for missing data. So how come the AD is 1505?
    3) The genotype called for the last sample is 2/2. What are the bases in that case?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator


    1,3) Have a look at this dictionary entry.

    2) It is possible that all the reads are uninformative or have low mapping/base qualities. You can take a look at the GVCF record for that sample. It should show GQ=0, which means the tool is not confident in any genotype.
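To illustrate questions 1 and 3: GT indices point into the allele list, with 0 for REF and 1, 2, ... for the ALT alleles in order, so with ALT `T,*` a 2/2 genotype means both alleles are `*`. In the VCF specification, `*` marks an allele that is missing because an upstream deletion spans the site. A small Python sketch of the lookup, with a hypothetical helper name `genotype_alleles`:

```python
def genotype_alleles(ref, alt_field, gt):
    """Translate GT indices into allele strings; '.' (no call) becomes None."""
    alleles = [ref] + alt_field.split(",")
    sep = "|" if "|" in gt else "/"  # phased or unphased genotype
    return [None if idx == "." else alleles[int(idx)] for idx in gt.split(sep)]


# REF, ALT and GT values taken from the MT 16184 record above.
last_sample = genotype_alleles("C", "T,*", "2/2")
hom_alt_sample = genotype_alleles("C", "T,*", "1/1")
missing_sample = genotype_alleles("C", "T,*", "./.")
print(last_sample, hom_alt_sample, missing_sample)
```

So the last sample carries no base at this position on either chromosome copy, and the deletion itself should be reported in a record at an earlier position.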

