HaplotypeCaller BP_RESOLUTION: more AD values than alleles called

My intention is to find the different bases called at a particular chromosome location, irrespective of whether the site is assigned as a SNP or a bad base. I used the command below:
java -jar 3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R Reference.fa -I Sample1.bam -o Sample1.BPR.vcf -ERC BP_RESOLUTION -L 1

I got the result I intended, but I am confused by the output. For example:

1 1222274 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:92,1:93:99:0,120,1800
1 8333303 . A AG,AT,G,T,<NON_REF> 0 . BaseQRankSum=0.913;ClippingRankSum=-0.141;DP=57;ExcessHet=3.0103;MLEAC=0,0,0,0,0;MLEAF=0.00,0.00,0.00,0.00,0.00;MQRankSum=0.445;RAW_MQ=205200.00;ReadPosRankSum=-0.85 GT:AD:DP:GQ:PL:SB 0/0:39,2,5,2,2,0:50:36:0,79,1151,36,1041,1086,88,985,919,1611,88,985,919,1546,1611,110,942,900,957,957,928:10,29,3,8

Why are there more AD values than alleles called? Please note that I am working with an RNA-seq dataset after BQSR.
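
For readers with the same question, here is a minimal Python sketch of how the AD values line up with the alleles. It assumes the ALT column ends with the symbolic `<NON_REF>` allele that HaplotypeCaller writes in its `-ERC` modes (it can vanish from records pasted into a web page, because it looks like an HTML tag). The function name `pair_ad_with_alleles` is hypothetical, and the INFO column of the second record is replaced with `.` for brevity.

```python
def pair_ad_with_alleles(record_line):
    """Map each AD count to its allele for a single-sample VCF record."""
    fields = record_line.split()
    ref, alt = fields[3], fields[4]
    # FORMAT keys (column 9) describe the colon-separated sample values (column 10).
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    # AD is ordered REF first, then every ALT allele in the order listed.
    alleles = [ref] + ([] if alt == "." else alt.split(","))
    depths = [int(x) for x in sample["AD"].split(",")]
    if len(alleles) != len(depths):
        raise ValueError("AD has %d values for %d alleles" % (len(depths), len(alleles)))
    # PL holds one value per unordered diploid genotype: n * (n + 1) / 2.
    n = len(alleles)
    if len(sample["PL"].split(",")) != n * (n + 1) // 2:
        raise ValueError("PL count does not match the number of alleles")
    return dict(zip(alleles, depths))


# Assumption: <NON_REF> restored in ALT; INFO shortened to "." in record_b.
record_a = "1 1222274 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:92,1:93:99:0,120,1800"
record_b = ("1 8333303 . A AG,AT,G,T,<NON_REF> 0 . . GT:AD:DP:GQ:PL:SB "
            "0/0:39,2,5,2,2,0:50:36:0,79,1151,36,1041,1086,88,985,919,1611,"
            "88,985,919,1546,1611,110,942,900,957,957,928:10,29,3,8")

print(pair_ad_with_alleles(record_a))
print(pair_ad_with_alleles(record_b))
```

Under that assumption, the counts match the alleles one to one: the single non-reference read at 1:1222274 is counted under `<NON_REF>`, not under a named base, and the 21 PL values at 1:8333303 correspond to the 21 possible genotypes of six alleles.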

Answers

  • bassu UAE Member

    Sheila,

    Thank you for the response. My concerns are the following:

    1) If the AD value 92 is for "A", what is the other base, which has a count of 1? Using IGV I found it to be "G". Why is it not shown in the ALT column?

    1 1222274 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:92,1:93:99:0,120,1800

    2) Why are there two-base ALT alleles (for example AG and AT) at a single chromosome position? These were not seen in IGV either.

    1 8333303 . A AG,AT,G,T,<NON_REF> 0 . GT:AD:DP:GQ:PL:SB 0/0:39,2,5,2,2,0:50:36:0,79,1151,36,1041,1086,88,985,919,1611,88,985,919,1546,1611,110,942,900,957,957,928:10,29,3,8

  • bassu UAE Member
    edited November 2017

    Hi,

    I ran GenotypeGVCFs on the combined gVCF (more than 100 samples) using the command below:

    gatk -T GenotypeGVCFs -R  REFERENCE \
    -L MT \
    --variant ALL_SAMPLEScleaned.aligned.chrom.MT.g.vcf.gz \
    -stand_call_conf 30 \
    -o ALL_SAMPLES_GT_MT.g.vcf.gz
    

    The VCF file looks like the following (most samples truncated for viewing):
    MT 16184 . C T,* 245600.79 . AC=4,6;AF=0.014,0.021;AN=292;BaseQRankSum=2.82;ClippingRankSum=0.931;DP=267366;ExcessHet=0.0000;FS=2.107;InbreedingCoeff=0.5408;MLEAC=4,6;MLEAF=0.014,0.021;MQ=23.07;MQRankSum=0.378;QD=27.27;ReadPosRankSum=2.34;SOR=0.459 GT:AD:DP:GQ:PGT:PID:PL 0/0:1561,0,0:1561:99:.:.:0,120,1800,120,1800,1800 ./.:1505,0,0:1505 0/0:2021,0,0:2021:0:.:.:0,0,18042,0,18042,1804 0/0:2067,0,0:2067:99:.:.:0,120,1800,120,1800,1800 1/1:99,1689,0:1789:99:.:.:45065,3225,0,45298,5049,4711 0/0:2067,0,0:2067:0:.:.:0,0,46379,0,46379,46379 0/0:2554,0,0:2554:99:.:.:0,120,1800,120,1800,1800 0/0:2523,0,0:2523:99:.:.:0,120,1800,120,1800,1800 0/0:2070,0,0:2070:99:.:.:0,120,1800,120,1800,1800 0/0:1679,0,0:1679:0:.:.:0,0,34004,0,34004,34004 0/0:1684,0,0:1684:0:.:.:0,0,33011,0,33011,33011 0/0:2029,0,0:2029:99:.:.:0,120,1800,120,1800,1800 2/2:45,0,1845:1890:99:.:.:49079,49194,49631,5023,5460,0

    My questions are as follows:
    1) What is the ALT allele "*"?
    2) The genotype of sample 2 is called "./.", which I understand stands for missing data. So how can its AD be 1505?
    3) The genotype of the last sample is called 2/2. Which bases does that correspond to?

  • Sheila, Broad Institute, Member, Broadie, Moderator

    @bassu
    Hi,

    1,3) Have a look at this dictionary entry.

    2) It is possible that all the reads are uninformative or have low mapping or base qualities. You can take a look at the GVCF record for that sample. It should show GQ=0, which means the tool is not confident in any genotype.

    -Sheila
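
As a rough illustration of questions 1 and 3, here is a minimal Python sketch that turns GT indices into alleles for the MT 16184 record (REF `C`, ALT `T,*`). GT indices count from 0 for REF and then run through the ALT alleles in order; in the VCF specification, `*` marks an allele that is missing because of an overlapping (spanning) deletion. The function name `decode_genotype` is hypothetical and not part of GATK.

```python
def decode_genotype(ref, alt, gt):
    """Translate a GT string such as '2/2' into the alleles it refers to."""
    # Index 0 is REF; indices 1..n are the ALT alleles in the order listed.
    alleles = [ref] + alt.split(",")
    separator = "|" if "|" in gt else "/"
    calls = []
    for index in gt.split(separator):
        # "." means no call for that allele copy (missing genotype).
        calls.append(None if index == "." else alleles[int(index)])
    return calls


# Genotypes taken from the MT 16184 record: REF=C, ALT=T,*
for gt in ("0/0", "1/1", "2/2", "./."):
    print(gt, decode_genotype("C", "T,*", gt))
```

Read this way, 2/2 does not name a base: both allele copies are `*`, meaning the site is covered by a deletion that starts upstream. A `./.` call decodes to no alleles at all, even though AD and DP are still reported for that sample.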
