To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at

Using GATK: create a F0 SNP library and then genotype F2 sample using it

Hello GATK community,

I would like your comments/suggestions for my strategy.

I have F0 samples with two different phenotype.
I have F2 samples with unknown phenotype.
I would like to create a library with the F0 genotypes and then genotype my F2 samples using the previously created library.

I already pre-processed BAM files (I have all raw data if required).

Create genotype library with F0 samples:

  • GATK HaplotypeCaller for both F0 phenotype samples : java -Xmx30g -jar GenomeAnalysisTK_3-8.jar -nct 16 -T HaplotypeCaller -R GENOME --emitRefConfidence GVCF -I INPUT.bam -o OUTPUT.g.vcf

  • Merge the results: java -Xmx16g -jar GenomeAnalysisTK_3-8.jar -nt 16 -T GenotypeGVCFs -R GENOME --variant F0Variant1.g.vcf --variant F0Variant2.g.vcf -o Results_Merge_F0.vcf

  • then i used a homemade script to select only position with homozygous genotype and different genotype between both F0 phenotype samples (like 1/1 for a F0 sample and 0/0 for the other one): Results_Merge_F0_filtered.vcf

Genotype F2 sample with the library:

  • GATK HaplotypeCaller : java -Xmx30g -jar GenomeAnalysisTK_3-8.jar -nct 16 -T HaplotypeCaller -R GENOME --emitRefConfidence GVCF -I INPUT.bam -o OUTPUT.g.vcf -L $4 Results_Merge_F0_filtered.vcf

  • then i used a homemade script to identify genotype related to one (or the other) F0 phenotype.

At this last step i mostly got homozygous SNP for my F2 samples...
I should get around 25% phenotype1 -- 25% phenotype2 -- 50% phenotype 1/2
I miss something but I don't know where.


  • SheilaSheila Broad InstituteMember, Broadie, Moderator


    I think the issue is that you are using only the homozygous sites from F0 as the intervals file when running HaplotypeCaller on the F2 samples. It looks like you only selected the homozygous sites from F0 samples, right?


  • nkaspricnkaspric franceMember


    The main objectif is to create an input file for QTL mapping search.

    My library (-L Results_Merge_F0_filtered.vcf) is only composed of homozygous SNP from F0 samples. For these F0 samples, I only want the position where they have different genotype and only homozygous.
    I were expected to get heterozygous genotype for the F2 samples. Maybe I don't use the good settings/practice.


  • nkaspricnkaspric franceMember

    **Here are some details **

    • I use this type of vcf file for -L option:


    2R_dna:chromosome_chromosome:AgamP4:2R:1:61545105:1 251927 . T TATTAGTGTATGCC,TATTAGTGTATGCCTGG 4610.16 . AC=2,2;AF=0.500,0.500;AN=4;DP=106;ExcessHet=3.0103;FS=0.000;MLEAC=2,2;MLEAF=0.500,0.500;MQ=32.63;QD=24.47;SOR=0.921 GT:AD:DP:GQ:PL 2/2:0,0,53:53:99:2359,2360,2360,159,160,0 1/1:0,52,0:52:99:2287,156,0,2288,157,2289
    2R_dna:chromosome_chromosome:AgamP4:2R:1:61545105:1 816448 . TGACA T 2071.38 . AC=2;AF=0.500;AN=4;DP=113;ExcessHet=0.7918;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=23.17;QD=32.52;SOR=3.033 GT:AD:DP:GQ:PGT:PID:PL 1/1:0,47:47:99:1|1:816445_C_CAATA:2115,141,0 0/0:66,0:66:99:.:.:0,120,1800

    • I hopped to get this type of results for my F2 samples ;

    2R_dna:chromosome_chromosome:AgamP4:2R:1:61545105:1 816448 . TGACA T 2071.38 . AC=2;AF=0.500;AN=4;DP=113;ExcessHet=0.7918;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=23.17;QD=32.52;SOR=3.033 GT:AD:DP:GQ:PGT:PID:PL 1/2:15,21:36:........

    BUT I mostly had 1/1 or 2/2 or 0/0....and really a little of heterozygous (5%).

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    Hi Nico,

    Okay. That is odd. Can you post some example VCF records for sites that are homozygous in the F0 and homozygous in the F2? Can you also post some VCF records that are homozygous in F0 and heterozygous in F2?


  • nkaspricnkaspric franceMember
    edited March 15

    Hi Sheila,

    VCF record for the F0 library (two F0 genotype) :
    2R_dna 2528480 . T G 6623.42 . AC=2;AF=0.500;AN=4;BaseQRankSum=1.77;ClippingRankSum=0.00;DP=394;ExcessHet=0.7918;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=30.89;MQRankSum=-1.461e+00;QD=31.10;ReadPosRankSum=-3.830e-01;SOR=0.531 GT:AD:DP:GQ:PL 0/0:181,0:181:99:0,120,1800 1/1:1,212:213:99:6658,628,0

    2R_dna 2816450 . GCC G,GC 6168.16 . AC=2,2;AF=0.500,0.500;AN=4;BaseQRankSum=0.690;ClippingRankSum=0.00;DP=198;ExcessHet=3.0103;FS=0.000;MLEAC=2,2;MLEAF=0.500,0.500;MQ=30.20;MQRankSum=0.827;QD=32.81;ReadPosRankSum=1.52;SOR=1.150 GT:AD:DP:GQ:PL 2/2:0,3,130:133:99:4089,4001,4064,396,281,0 1/1:1,54,0:55:99:2115,155,0,2117,162,2124

    3L_dna 38240596 . C A 6001.42 . AC=2;AF=0.500;AN=4;BaseQRankSum=1.67;ClippingRankSum=0.00;DP=325;ExcessHet=0.7918;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=34.89;MQRankSum=-1.946e+00;QD=29.60;ReadPosRankSum=-7.350e-01;SOR=0.663 GT:AD:DP:GQ:PGT:PID:PL 1/1:7,142:149:99:1|1:38240596_C_A:6036,137,0 0/0:176,0:176:99:.:.:0,120,1800

    3L_dna 38245905 . G A 3769.42 . AC=2;AF=0.500;AN=4;BaseQRankSum=0.594;ClippingRankSum=0.00;DP=359;ExcessHet=0.7918;FS=16.777;MLEAC=2;MLEAF=0.500;MQ=24.04;MQRankSum=-2.349e+00;QD=27.92;ReadPosRankSum=2.11;SOR=3.621 GT:AD:DP:GQ:PGT:PID:PL 0/0:221,0:221:99:.:.:0,120,1800 1/1:2,133:135:99:1|1:38245905_G_A:3804,391,0

    VCF record for HET F2 random sample :
    2R_dna 2517917 . A T, 319.77 . BaseQRankSum=-2.236;ClippingRankSum=0.000;DP=19;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-3.555;RAW_MQ=22793.00;ReadPosRankSum=1.800 GT:AD:DP:GQ:PL:SB 0/1:10,9,0:19:99:348,0,1281,378,1308,1686:2,8,4,5

    2R_dna 2528480 . T G, 291.77 . BaseQRankSum=-2.400;ClippingRankSum=0.000;DP=24;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-3.774;RAW_MQ=31871.00;ReadPosRankSum=0.134 GT:AD:DP:GQ:PL:SB 0/1:12,12,0:24:99:320,0,475,356,511,867:5,7,4,8

    3L_dna 39402586 . A T, 718.79 . BaseQRankSum=0.477;ClippingRankSum=0.000;DP=20;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.169;RAW_MQ=21438.00;ReadPosRankSum=1.425 GT:AD:DP:GQ:PL:SB 0/1:1,19,0:20:23:747,0,23,750,81,831:1,0,13,6

    VCF record for HOM F2 random sample :
    2R_dna 2816450 . GC G, 305.73 . DP=11;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQ=10291.00 GT:AD:DP:GQ:PL:SB 1/1:0,11,0:11:33:343,33,0,343,33,343:0,0,6,5

    3L_dna 38240596 . C A, 542.77 . DP=13;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQ=12959.00 GT:AD:DP:GQ:PL:SB 1/1:0,13,0:13:39:571,39,0,571,39,571:0,0,11,2

    3L_dna 38245905 . G . . END=38245905 GT:DP:GQ:MIN_DP:PL 0/0:22:66:22:0,66,887

Sign In or Register to comment.