
How to get heterozygous SNPs with HaplotypeCaller?

Hi,

I am new to GATK and am trying to call SNPs from paired-end data in the mosquito. The mosquito genome is highly polymorphic.
I would like a VCF file that shows, for each position, all the possible alleles for a SNP. However, when I look at my VCF file I see only one allele listed for each SNP, even though the site is often heterozygous.

Example:
R 86 . T A 73.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.000;ClippingRankSum=0.000;DP=8;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=53.84;MQRankSum=-2.369;QD=9.22;ReadPosRankSum=0.992;SOR=0.368 GT:AD:DP:GQ:PL 0/1:5,3:8:99:102,0,188

At this position, I can see a heterozygous SNP in IGV: some reads carry A and others carry T, like the reference. Is it possible to get this information in the VCF file?

This is my command line:
java -Xmx8g -jar GenomeAnalysisTK.jar -nct 4 -T HaplotypeCaller -R ../GENOME/Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4.fa -I ../RESULTS/NJ3-5302_2016-09-30/MAPPING_NJ3-5302.sorted.bam -o ../RESULTS/NJ3-5302_2016-09-30/test.vcf -mbq 25 -gt_mode DISCOVERY -L 2R:1-500000

Thx,
Nicolas

Answers

  • nkaspric (France), Member

    I am answering my own question :)

    For people who need this information: http://gatkforums.broadinstitute.org/gatk/discussion/1268/what-is-a-vcf-and-how-should-i-interpret-it

    The VCF file does report heterozygosity :smile:
    5. How the genotype and other sample-level information is represented

    The sample-level information contained in the VCF (also called "genotype fields") may look a bit complicated at first glance, but they're actually not that hard to interpret once you understand that they're just sets of tags and values.

    Let's take a look at three of the records shown earlier, simplified to just show the key genotype annotations:

    1 873762 . T G [CLIPPED] GT:AD:DP:GQ:PL 0/1:173,141:282:99:255,0,255
    1 877664 rs3828047 A G [CLIPPED] GT:AD:DP:GQ:PL 1/1:0,105:94:99:255,255,0
    1 899282 rs28548431 C T [CLIPPED] GT:AD:DP:GQ:PL 0/1:1,3:4:26:103,0,26

    Looking at that last column, here is what the tags mean:

    GT : The genotype of this sample at this site.
    For a diploid organism, the GT field indicates the two alleles carried by the sample, encoded by a 0 for the REF allele, 1 for the first ALT allele, 2 for the second ALT allele, etc. When there's a single ALT allele (by far the more common case), GT will be either:

    0/0 - the sample is homozygous reference
    0/1 - the sample is heterozygous, carrying 1 copy of each of the REF and ALT alleles
    1/1 - the sample is homozygous alternate

    In the three sites shown in the example above, NA12878 is observed with the allele combinations T/G, G/G, and C/T respectively.
    For non-diploids, the same pattern applies; in the haploid case there will be just a single value in GT; for polyploids there will be more, e.g. 4 values for a tetraploid organism.
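
To make the GT decoding above concrete, here is a minimal Python sketch that reads one VCF record and reports the alleles the sample carries, along with the per-allele read depths from AD. It is an illustration only, not part of GATK: the function name `decode_sample` is hypothetical, it assumes a single diploid sample with GT in the FORMAT column, and it splits on any whitespace because the records pasted in this thread are space-separated (real VCF files are tab-separated).

```python
def decode_sample(vcf_line, sample_index=0):
    """Decode GT (and AD, if present) for one sample of a VCF record.

    Hypothetical helper for illustration; assumes a diploid sample.
    """
    fields = vcf_line.split()  # handles tabs or the spaces seen in pasted records
    ref, alt = fields[3], fields[4]
    alleles = [ref] + alt.split(",")  # index 0 = REF, 1 = first ALT, 2 = second ALT, ...

    # Pair the FORMAT keys (GT:AD:DP:...) with this sample's values.
    keys = fields[8].split(":")
    values = fields[9 + sample_index].split(":")
    sample = dict(zip(keys, values))

    # GT uses "/" for unphased and "|" for phased genotypes.
    indices = sample["GT"].replace("|", "/").split("/")
    called = [alleles[int(i)] if i != "." else "." for i in indices]

    if "." in called:
        zygosity = "no-call"
    elif len(set(called)) > 1:
        zygosity = "heterozygous"
    elif called[0] == ref:
        zygosity = "homozygous reference"
    else:
        zygosity = "homozygous alternate"

    # AD lists read counts in the same order as the alleles (REF first).
    depths = {}
    if "AD" in sample:
        depths = {a: int(d) for a, d in zip(alleles, sample["AD"].split(",")) if d != "."}

    return {"genotype": "/".join(called), "zygosity": zygosity, "depths": depths}


# The record from the original question.
record = (
    "R 86 . T A 73.77 . AC=1;AF=0.500;AN=2;DP=8 "
    "GT:AD:DP:GQ:PL 0/1:5,3:8:99:102,0,188"
)
result = decode_sample(record)
print(result["genotype"], result["zygosity"], result["depths"])
# prints: T/A heterozygous {'T': 5, 'A': 3}
```

For the record in the question, this shows that the VCF already encodes what IGV displays: the sample is heterozygous T/A, with 5 reads supporting the reference T and 3 supporting the alternate A.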

    Nicolas

  • Sheila (Broad Institute), Member, Broadie, Moderator

    @nkaspric
    Hi Nicolas,

    I am happy you figured it out yourself! :smile:

    -Sheila
