Difference in PL, DP values while running GATK 3.7 HaplotypeCaller on the same sample in two runs

Krithika_Subramanian (Bangalore), Member
edited April 8 in Ask the GATK team

A few months ago we ran GATK 3.7 HaplotypeCaller on a sample to produce a GVCF file. Recently we re-ran the same sample with the same GATK 3.7 HaplotypeCaller parameters and found that the DP and PL values differ for many sites when the two output GVCF files are compared.

Command line used for both runs:

          java -Xmx32g -Djava.io.tmpdir=Temp/ -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fa -I sample.bam -nct 24 --dbsnp dbsnp138.vcf --genotyping_mode DISCOVERY --minPruning 2 -newQual -stand_call_conf 30 --emitRefConfidence GVCF -variant_index_type LINEAR -variant_index_parameter 128000 -L chr1 -G none -l INFO -log sample.log -o sample_chr1.g.vcf.gz
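To confirm that both GVCFs really came from the same GATK build and arguments, their provenance headers can be compared. This is a minimal sketch, assuming GATK 3.x writes a header line beginning with `##GATKCommandLine` that carries `Version=` and `CommandLineOptions="..."` fields; the helper names `gatk_provenance` and `read_provenance` are hypothetical, not part of GATK.

```python
import gzip
import re
from typing import Dict, Iterable

# Assumption: GATK 3.x records its invocation in a header line such as
# ##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=...,CommandLineOptions="...">
HEADER_PREFIX = "##GATKCommandLine"
VERSION_RE = re.compile(r"Version=([^,>]+)")
OPTIONS_RE = re.compile(r'CommandLineOptions="([^"]*)"')


def gatk_provenance(lines: Iterable[str]) -> Dict[str, str]:
    """Return Version and CommandLineOptions from the first GATK command-line header."""
    for line in lines:
        if not line.startswith("#"):
            break  # reached the data lines without finding a provenance header
        if line.startswith(HEADER_PREFIX):
            version = VERSION_RE.search(line)
            options = OPTIONS_RE.search(line)
            return {
                "version": version.group(1) if version else "",
                "options": options.group(1) if options else "",
            }
    return {}


def read_provenance(path: str) -> Dict[str, str]:
    """Read only the header of a plain or gzip/bgzip-compressed (G)VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return gatk_provenance(handle)
```

Comparing `read_provenance(old_path)` with `read_provenance(new_path)` (hypothetical paths to the two runs' outputs) shows whether the build and options match; path-valued options such as input, output, or temp directories may legitimately differ between runs.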

Sample differences between the two files, extracted with the diff command:

F1: line from the GVCF file generated a few months ago
F2: line from the GVCF file generated recently
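Because `diff` compares lines as text, a record-level comparison makes it easier to separate changed values from added or removed records. This is a minimal sketch, assuming tab-delimited, single-sample GVCFs (plain or gzip/bgzip-compressed); `parse_records`, `read_gvcf`, `compare_shared` and `only_in` are hypothetical helpers, not GATK tools.

```python
import gzip
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Key = Tuple[str, int]  # (CHROM, POS)


def parse_records(lines: Iterable[str]) -> Dict[Key, dict]:
    """Parse single-sample GVCF data lines into a dict keyed by (CHROM, POS)."""
    records: Dict[Key, dict] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual, _filt, info, fmt, sample = fields[:10]
        records[(chrom, int(pos))] = {
            "ref": ref,
            "alt": alt,
            "qual": qual,
            "info": info,
            # Pair FORMAT keys (e.g. GT:DP:GQ:MIN_DP:PL) with this sample's values.
            "sample": dict(zip(fmt.split(":"), sample.split(":"))),
        }
    return records


def read_gvcf(path: str) -> Dict[Key, dict]:
    """Read a plain or gzip/bgzip-compressed GVCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_records(handle)


def compare_shared(
    old: Dict[Key, dict], new: Dict[Key, dict], fields=("DP", "PL")
) -> Iterator[Tuple[Key, str, Optional[str], Optional[str]]]:
    """Yield (key, field, old_value, new_value) where a shared record's field differs."""
    for key in sorted(old.keys() & new.keys()):
        for field in fields:
            a, b = old[key]["sample"].get(field), new[key]["sample"].get(field)
            if a != b:
                yield key, field, a, b


def only_in(a: Dict[Key, dict], b: Dict[Key, dict]) -> List[Key]:
    """Positions that have a record in a but not in b."""
    return sorted(a.keys() - b.keys())
```

`compare_shared(old, new)` lists value changes like those in Change 1, while `only_in(new, old)` and `only_in(old, new)` correspond to Changes 2 and 3. Keying on (CHROM, POS) alone is a simplification that assumes at most one record per position in each file.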

Change 1 observed: DP and PL values (and with them GQ, and QUAL at variant sites) differ between the two output GVCF files

       F1 chr1    1510162    .    A    <NON_REF>    .    .    END=1510162    GT:DP:GQ:MIN_DP:PL    0/0:46:12:46:0,12,1425
       F2 chr1    1510162    .    A    <NON_REF>    .    .    END=1510162    GT:DP:GQ:MIN_DP:PL    0/0:45:9:45:0,9,1380


        F1 chr1    6941045    .    C    <NON_REF>    .    .    END=6941080    GT:DP:GQ:MIN_DP:PL    0/0:14:0:7:0,0,139
        F2 chr1    6941045    .    C    <NON_REF>    .    .    END=6941080    GT:DP:GQ:MIN_DP:PL    0/0:15:0:7:0,0,139


        F1 chr1    45683203    rs34100486    CTTTT    C,<NON_REF>    177.60    .    DB;MLEAC=1,0;MLEAF=0.500,0.00    GT:GQ:PL:SB    0/1:22:185,0,22,188,37,225:1,0,3,2
        F2 chr1    45683203    rs34100486    CTTTT    C,<NON_REF>    168.60    .    DB;MLEAC=1,0;MLEAF=0.500,0.00    GT:GQ:PL:SB    0/1:22:176,0,22,179,37,215:1,0,3,2   
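In GATK output, GQ is the difference between the second-smallest and the smallest PL (PLs are normalized so the smallest is 0), capped at 99, so a shift in PL between runs shows up directly in GQ: 0,12,1425 gives GQ 12 and 0,9,1380 gives GQ 9 in the records above. A small sketch that recomputes GQ and normalized genotype likelihoods from a PL string; the helper names are hypothetical.

```python
from typing import List

GQ_CAP = 99  # GATK caps reported GQ at 99


def parse_pl(pl: str) -> List[int]:
    """Turn a VCF PL string such as '0,12,1425' into integers."""
    return [int(x) for x in pl.split(",")]


def gq_from_pl(pl: str) -> int:
    """GQ = second-smallest PL minus smallest PL, capped at GQ_CAP."""
    values = sorted(parse_pl(pl))
    return min(values[1] - values[0], GQ_CAP)


def relative_likelihoods(pl: str) -> List[float]:
    """Undo the phred scaling (PL = -10*log10 L) and normalize so the values sum to 1."""
    values = parse_pl(pl)
    lowest = min(values)
    raw = [10 ** (-(p - lowest) / 10) for p in values]
    total = sum(raw)
    return [r / total for r in raw]


for label, pl in [("F1", "0,12,1425"), ("F2", "0,9,1380")]:
    probs = relative_likelihoods(pl)
    print(label, gq_from_pl(pl), [round(p, 3) for p in probs])
```

With these PLs the 0/0 genotype carries roughly 94% (F1) versus 89% (F2) of the normalized likelihood, and `gq_from_pl("0,102,1284")` returns 99 because of the cap, matching the GQ on the 15357650 record listed under Change 2.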

Change 2 observed: 29 records are present in the recent run's GVCF output but absent from the previous run's GVCF output.
A few of the records added in the recent run's GVCF output:

        F2 chr1    15357649    .    G    <NON_REF>    .    .    END=15357649    GT:DP:GQ:MIN_DP:PL    0/0:41:94:41:0,94,1235
        F2 chr1    15357650    .    A    <NON_REF>    .    .    END=15357650    GT:DP:GQ:MIN_DP:PL    0/0:39:99:39:0,102,1284 

Change 3 observed: 10 records are present in the previous run's GVCF output but absent from the recent run's GVCF output.
A few of the records present only in the previous run's GVCF output:

         F1 chr1    9282514    .    C    CTCCCCCTCCTCCTTGTCTCCTCCTCCCTCTCCCCCT,<NON_REF>    274.01    .    MLEAC=2,0;MLEAF=1.00,0.00    GT:GQ:PL:SB    1/1:20:288,20,0,289,21,290:0,0,0,3
         F1 chr1    9282515    .    T    <NON_REF>    .    .    END=9282515    GT:DP:GQ:MIN_DP:PL    0/0:37:0:37:0,0,820
         F1 chr1    27014608    .    T    <NON_REF>    .    .    END=27014608    GT:DP:GQ:MIN_DP:PL    0/0:35:91:35:0,91,1388
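A position that has its own record in only one file may still be covered in the other file: a GVCF reference block spans from POS through the END value in INFO, so a site with its own record in one run may fall inside a longer block in the other. A minimal sketch that checks coverage, assuming tab-delimited GVCF lines; `record_intervals` and `is_covered` are hypothetical helpers, and a variant record is treated as spanning its REF allele.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) on one chromosome


def record_intervals(lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Collect the genomic span of every GVCF data record, grouped by chromosome."""
    spans: Dict[str, List[Interval]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and header lines
        chrom, pos, _id, ref, _alt, _qual, _filt, info = line.rstrip("\n").split("\t")[:8]
        start = int(pos)
        end = start + len(ref) - 1  # a variant record spans its REF allele
        for item in info.split(";"):
            if item.startswith("END="):
                end = int(item[len("END="):])  # reference blocks state their last base in END
        spans.setdefault(chrom, []).append((start, end))
    return spans


def is_covered(spans: Dict[str, List[Interval]], chrom: str, pos: int) -> bool:
    """True if pos lies inside any record span on chrom (simple linear scan)."""
    return any(start <= pos <= end for start, end in spans.get(chrom, []))
```

For example, `is_covered(record_intervals(new_lines), "chr1", 27014608)`, with `new_lines` standing for the recent run's GVCF lines, would show whether that position from Change 3 falls inside a record of the recent run. A linear scan is fine for a few dozen positions; a sorted-interval search would suit genome-wide checks better.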

Could you please explain why two runs of HaplotypeCaller give different results, and what these differences between the two output GVCF files mean? Can they affect the variant calling (joint genotyping) that will be done later with all samples together?
