If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra

Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Using GATK: create a F0 SNP library and then genotype F2 sample using it

nkaspricnkaspric franceMember

Hello GATK community,

I would like your comments/suggestions for my strategy.

I have F0 samples with two different phenotype.
I have F2 samples with unknown phenotype.
I would like to create a library with the F0 genotypes and then genotype my F2 samples using the previously created library.

I already pre-processed BAM files (I have all raw data if required).

Create genotype library with F0 samples:

  • GATK HaplotypeCaller for both F0 phenotype samples : java -Xmx30g -jar GenomeAnalysisTK_3-8.jar -nct 16 -T HaplotypeCaller -R GENOME --emitRefConfidence GVCF -I INPUT.bam -o OUTPUT.g.vcf

  • Merge the results: java -Xmx16g -jar GenomeAnalysisTK_3-8.jar -nt 16 -T GenotypeGVCFs -R GENOME --variant F0Variant1.g.vcf --variant F0Variant2.g.vcf -o Results_Merge_F0.vcf

  • then i used a homemade script to select only position with homozygous genotype and different genotype between both F0 phenotype samples (like 1/1 for a F0 sample and 0/0 for the other one): Results_Merge_F0_filtered.vcf

Genotype F2 sample with the library:

  • GATK HaplotypeCaller : java -Xmx30g -jar GenomeAnalysisTK_3-8.jar -nct 16 -T HaplotypeCaller -R GENOME --emitRefConfidence GVCF -I INPUT.bam -o OUTPUT.g.vcf -L $4 Results_Merge_F0_filtered.vcf

  • then i used a homemade script to identify genotype related to one (or the other) F0 phenotype.

At this last step i mostly got homozygous SNP for my F2 samples...
I should get around 25% phenotype1 -- 25% phenotype2 -- 50% phenotype 1/2
I miss something but I don't know where.


  • SheilaSheila admin Broad InstituteMember, Broadie, Moderator admin


    I think the issue is that you are using only the homozygous sites from F0 as the intervals file when running HaplotypeCaller on the F2 samples. It looks like you only selected the homozygous sites from F0 samples, right?


  • nkaspricnkaspric franceMember


    The main objectif is to create an input file for QTL mapping search.

    My library (-L Results_Merge_F0_filtered.vcf) is only composed of homozygous SNP from F0 samples. For these F0 samples, I only want the position where they have different genotype and only homozygous.
    I were expected to get heterozygous genotype for the F2 samples. Maybe I don't use the good settings/practice.


  • nkaspricnkaspric franceMember

    **Here are some details **

    • I use this type of vcf file for -L option:


    2R_dna:chromosome_chromosome:AgamP4:2R:1:61545105:1 251927 . T TATTAGTGTATGCC,TATTAGTGTATGCCTGG 4610.16 . AC=2,2;AF=0.500,0.500;AN=4;DP=106;ExcessHet=3.0103;FS=0.000;MLEAC=2,2;MLEAF=0.500,0.500;MQ=32.63;QD=24.47;SOR=0.921 GT:AD:DP:GQ:PL 2/2:0,0,53:53:99:2359,2360,2360,159,160,0 1/1:0,52,0:52:99:2287,156,0,2288,157,2289
    2R_dna:chromosome_chromosome:AgamP4:2R:1:61545105:1 816448 . TGACA T 2071.38 . AC=2;AF=0.500;AN=4;DP=113;ExcessHet=0.7918;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=23.17;QD=32.52;SOR=3.033 GT:AD:DP:GQ:PGT:PID:PL 1/1:0,47:47:99:1|1:816445_C_CAATA:2115,141,0 0/0:66,0:66:99:.:.:0,120,1800

    • I hopped to get this type of results for my F2 samples ;

    2R_dna:chromosome_chromosome:AgamP4:2R:1:61545105:1 816448 . TGACA T 2071.38 . AC=2;AF=0.500;AN=4;DP=113;ExcessHet=0.7918;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=23.17;QD=32.52;SOR=3.033 GT:AD:DP:GQ:PGT:PID:PL 1/2:15,21:36:........

    BUT I mostly had 1/1 or 2/2 or 0/0....and really a little of heterozygous (5%).

  • SheilaSheila admin Broad InstituteMember, Broadie, Moderator admin

    Hi Nico,

    Okay. That is odd. Can you post some example VCF records for sites that are homozygous in the F0 and homozygous in the F2? Can you also post some VCF records that are homozygous in F0 and heterozygous in F2?


  • nkaspricnkaspric franceMember
    edited March 2018

    Hi Sheila,

    VCF record for the F0 library (two F0 genotype) :
    2R_dna 2528480 . T G 6623.42 . AC=2;AF=0.500;AN=4;BaseQRankSum=1.77;ClippingRankSum=0.00;DP=394;ExcessHet=0.7918;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=30.89;MQRankSum=-1.461e+00;QD=31.10;ReadPosRankSum=-3.830e-01;SOR=0.531 GT:AD:DP:GQ:PL 0/0:181,0:181:99:0,120,1800 1/1:1,212:213:99:6658,628,0

    2R_dna 2816450 . GCC G,GC 6168.16 . AC=2,2;AF=0.500,0.500;AN=4;BaseQRankSum=0.690;ClippingRankSum=0.00;DP=198;ExcessHet=3.0103;FS=0.000;MLEAC=2,2;MLEAF=0.500,0.500;MQ=30.20;MQRankSum=0.827;QD=32.81;ReadPosRankSum=1.52;SOR=1.150 GT:AD:DP:GQ:PL 2/2:0,3,130:133:99:4089,4001,4064,396,281,0 1/1:1,54,0:55:99:2115,155,0,2117,162,2124

    3L_dna 38240596 . C A 6001.42 . AC=2;AF=0.500;AN=4;BaseQRankSum=1.67;ClippingRankSum=0.00;DP=325;ExcessHet=0.7918;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=34.89;MQRankSum=-1.946e+00;QD=29.60;ReadPosRankSum=-7.350e-01;SOR=0.663 GT:AD:DP:GQ:PGT:PID:PL 1/1:7,142:149:99:1|1:38240596_C_A:6036,137,0 0/0:176,0:176:99:.:.:0,120,1800

    3L_dna 38245905 . G A 3769.42 . AC=2;AF=0.500;AN=4;BaseQRankSum=0.594;ClippingRankSum=0.00;DP=359;ExcessHet=0.7918;FS=16.777;MLEAC=2;MLEAF=0.500;MQ=24.04;MQRankSum=-2.349e+00;QD=27.92;ReadPosRankSum=2.11;SOR=3.621 GT:AD:DP:GQ:PGT:PID:PL 0/0:221,0:221:99:.:.:0,120,1800 1/1:2,133:135:99:1|1:38245905_G_A:3804,391,0

    VCF record for HET F2 random sample :
    2R_dna 2517917 . A T, 319.77 . BaseQRankSum=-2.236;ClippingRankSum=0.000;DP=19;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-3.555;RAW_MQ=22793.00;ReadPosRankSum=1.800 GT:AD:DP:GQ:PL:SB 0/1:10,9,0:19:99:348,0,1281,378,1308,1686:2,8,4,5

    2R_dna 2528480 . T G, 291.77 . BaseQRankSum=-2.400;ClippingRankSum=0.000;DP=24;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-3.774;RAW_MQ=31871.00;ReadPosRankSum=0.134 GT:AD:DP:GQ:PL:SB 0/1:12,12,0:24:99:320,0,475,356,511,867:5,7,4,8

    3L_dna 39402586 . A T, 718.79 . BaseQRankSum=0.477;ClippingRankSum=0.000;DP=20;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.169;RAW_MQ=21438.00;ReadPosRankSum=1.425 GT:AD:DP:GQ:PL:SB 0/1:1,19,0:20:23:747,0,23,750,81,831:1,0,13,6

    VCF record for HOM F2 random sample :
    2R_dna 2816450 . GC G, 305.73 . DP=11;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQ=10291.00 GT:AD:DP:GQ:PL:SB 1/1:0,11,0:11:33:343,33,0,343,33,343:0,0,6,5

    3L_dna 38240596 . C A, 542.77 . DP=13;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQ=12959.00 GT:AD:DP:GQ:PL:SB 1/1:0,13,0:13:39:571,39,0,571,39,571:0,0,11,2

    3L_dna 38245905 . G . . END=38245905 GT:DP:GQ:MIN_DP:PL 0/0:22:66:22:0,66,887

    Issue · Github
    by shlee

    Issue Number
    Last Updated
    Closed By
  • shleeshlee ✭✭✭✭✭ CambridgeMember, Broadie ✭✭✭✭✭

    Hi @nkaspric,

    Thanks for posting the records. Sheila is away for a workshop and can follow up with you next week.

  • SheilaSheila admin Broad InstituteMember, Broadie, Moderator admin

    Hi Nico,

    I see. I kind of see what may be happening. Some of the sites which seem to be Mendelian violations have low GQ and low coverage. But, first, it looks like you only have the GVCFs and not the final VCF. Can you run GenotypeGVCFs on your GVCFs? Have a look at this article for more information.

    Also, this may be a good case where running the Genotype Refinement workflow will help.


  • nkaspricnkaspric franceMember

    Hi Sheila,

    Thank you for these advices, I will give you a feed back soon :)

Sign In or Register to comment.