
Mismatch between the number of variants in the input and output of GenotypeGVCFs

I am merging a few hundred samples into a project-level VCF. My steps were:

a) ran CombineGVCFs on one set of gVCFs (giving pVCF1), then ran CombineGVCFs on another set of gVCFs (giving pVCF2)
b) ran GenotypeGVCFs on pVCF1 and pVCF2
c) ran VQSR on the GenotypeGVCFs output.

I found variants in the GenotypeGVCFs output that are not in pVCF1 or pVCF2, and they all pass the variant filters (VQSRTrancheSNP99.80to99.90 or VQSRTrancheSNP99.70to99.80). I am confused about why I am getting these results.

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)
    Can you please post some example records?
  • yyee (Member)

    chr22 11800146 . A G 26.52 . AC=4;AF=0.012;AN=336;BaseQRankSum=0.712;ClippingRankSum=0.00;DP=3174;ExcessHet=3.1446;FS=42.932;InbreedingCoeff=-0.0306;MLEAC=3;MLEAF=8.929e-03;MQ=54.86;MQRankSum=-1.465e+00;QD=0.37;ReadPosRankSum=-1.286e+00;SOR=5.549 GT:AD:DP:GQ:PL 0/0:45,0:45:99:0,114,1800 0/0:33,0:33:90:0,90,1350 0/0:30,0:30:90:0,90,993 ./.:0,0:0:.:0,0,0 0/0:18,0:18:54:0,54,626 ./.:0,0:0:.:0,0,0 0/0:3,0:3:0:0,0,35 0/0:31,0:31:90:0,90,1108 0/0:26,0:26:72:0,72,1080 0/0:38,0:38:99:0,103,1290 0/0:26,0:26:72:0,72,1080 0/0:2,0:2:0:0,0,10 0/1:7,2:9:29:29,0,196 0/0:19,0:19:22:0,22,591 0/0:21,0:21:39:0,39,585 0/0:17,0:17:37:0,37,522 ./.:0,0:0:.:0,0,0 0/0:19,0:19:54:0,54,810 0/0:33,0:33:93:0,93,1395 0/0:20,0:20:60:0,60,705 ./.:0,0:0:.:0,0,0 0/0:18,0:18:48:0,48,720 0/0:65,0:65:99:0,120,1800 0/0:15,0:15:39:0,39,580/0:11,0:11:33:0,33,362 0/0:30,0:30:55:0,55,833 0/0:21,0:21:60:0,60,690 0/0:11,0:11:33:0,33,362 0/0:15,0:15:45:0,45,471 0/0:17,0:17:16:0,16,451 0/0:2,0:2:6:0,6,70

  • yyee (Member)

    Corresponding VQSR output:

    chr22 11800146 . A G 26.52 VQSRTrancheSNP99.80to99.90 AC=4;AF=0.012;AN=336;BaseQRankSum=0.712;ClippingRankSum=0.00;DP=3174;ExcessHet=3.1446;FS=42.932;InbreedingCoeff=-0.0306;MLEAC=3;MLEAF=8.929e-03;MQ=54.86;MQRankSum=-1.465e+00;QD=0.37;ReadPosRankSum=-1.286e+00;SOR=5.549;VQSLOD=-2.072e+01;culprit=SOR GT:AD:DP:GQ:PL 0/0:45,0:45:99:0,114,1800 0/0:33,0:33:90:0,90,1350 0/0:30,0:30:90:0,90,993 ./.:0,0:0:.:0,0,0 0/0:18,0:18:54:0,54,626 ./.:0,0:0:.:0,0,0 0/0:3,0:3:0:0,0,35 0/0:31,0:31:90:0,90,1108 0/0:26,0:26:72:0,72,1080 0/0:38,0:38:99:0,103,1290 0/0:26,0:26:72:0,72,1080 0/0:2,0:2:0:0,0,10 0/1:7,2:9:29:29,0,196 0/0:19,0:19:22:0,22,591 0/0:21,0:21:39:0,39,585 0/0:17,0:17:37:0,37,522 ./.:0,0:0:.:0,0,0 0/0:19,0:19:54:0,54,810 0/0:33,0:33:93:0,93,1395 0/0:20,0:20:60:0,60,705 ./.:0,0:0:.:0,0,0
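For readers following the record above: in the VCF format, the FILTER column holds PASS when a site passed all filters, and otherwise names the filter(s) it failed. The sketch below pulls the site-level columns out of the posted record so the FILTER value and the VQSR annotations can be read directly. The function name `parse_site_columns` is made up for illustration, and the record is cut down to its first eight columns (the per-sample genotypes are left out).

```python
def parse_site_columns(record: str) -> dict:
    """Split the first eight VCF columns of one record into a dict.

    Splits on any whitespace, since the record posted in the thread
    is space-separated rather than tab-separated.
    """
    fields = record.split()
    chrom, pos, _vid, ref, alt, qual, flt, info = fields[:8]
    info_map = {}
    for item in info.split(";"):
        # Flag-style INFO keys have no "=", so their value is "".
        key, _, value = item.partition("=")
        info_map[key] = value
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "qual": float(qual),
        "filter": flt,
        "info": info_map,
    }


# First eight columns of the VQSR output record posted above.
record = (
    "chr22 11800146 . A G 26.52 VQSRTrancheSNP99.80to99.90 "
    "AC=4;AF=0.012;AN=336;BaseQRankSum=0.712;ClippingRankSum=0.00;"
    "DP=3174;ExcessHet=3.1446;FS=42.932;InbreedingCoeff=-0.0306;"
    "MLEAC=3;MLEAF=8.929e-03;MQ=54.86;MQRankSum=-1.465e+00;QD=0.37;"
    "ReadPosRankSum=-1.286e+00;SOR=5.549;VQSLOD=-2.072e+01;culprit=SOR"
)

site = parse_site_columns(record)
passed = site["filter"] == "PASS"
vqslod = float(site["info"]["VQSLOD"])
culprit = site["info"]["culprit"]
print(site["chrom"], site["pos"], site["filter"], passed, vqslod, culprit)
```

Checking `filter == "PASS"` this way is a quick test of whether a site carrying a tranche label is really being counted as passing.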

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @yyee
    Hi,

    I am confused. Which records are from pVCF1 and pVCF2? Which records are from GenotypeGVCFs? It will be a lot easier for us if you can highlight the inconsistencies between the three outputs.

    Thanks,
    Sheila
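One way to highlight such inconsistencies is to list the positions that have a variant record in the GenotypeGVCFs output but not in either combined gVCF. The following is a minimal sketch, not a GATK tool: the function names and the toy records are hypothetical, sites are keyed by (CHROM, POS) only, and gVCF reference blocks (records whose only ALT is `<NON_REF>`) are deliberately skipped rather than expanded over their END range.

```python
from typing import Iterable, List, Set, Tuple

Site = Tuple[str, int]


def variant_sites(vcf_lines: Iterable[str]) -> Set[Site]:
    """Collect (CHROM, POS) for records that carry a real ALT allele."""
    sites: Set[Site] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split(None, 5)  # only the first five columns matter
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        # Ignore the symbolic <NON_REF> allele and missing ALT (".").
        real_alts = [a for a in alt.split(",") if a not in (".", "<NON_REF>")]
        if real_alts:
            sites.add((chrom, pos))
    return sites


def sites_only_in_output(
    output_lines: Iterable[str], *input_vcfs: Iterable[str]
) -> List[Site]:
    """Return output variant sites missing from every input, sorted."""
    seen_in_inputs: Set[Site] = set()
    for lines in input_vcfs:
        seen_in_inputs |= variant_sites(lines)
    return sorted(variant_sites(output_lines) - seen_in_inputs)


# Toy records (hypothetical, not taken from the thread).
pvcf1 = [
    "chr22\t100\t.\tA\tG,<NON_REF>\t.\t.\t.",
    "chr22\t150\t.\tT\t<NON_REF>\t.\t.\tEND=250",  # reference block
]
pvcf2: List[str] = []
genotyped = [
    "##fileformat=VCFv4.2",
    "chr22\t100\t.\tA\tG\t50\t.\t.",
    "chr22\t200\t.\tC\tT\t50\t.\t.",
]

missing = sites_only_in_output(genotyped, pvcf1, pvcf2)
print(missing)
```

Because the functions take any iterable of lines, an open text file handle can be passed in place of the lists. A position reported here may still lie inside a reference block of a combined gVCF (as position 200 does in the toy data), so it is worth checking the surrounding block before concluding the site is absent from the input.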
