Mismatch between the number of variants in the input and output of GenotypeGVCFs

I am merging a few hundred samples into a project-level VCF. My steps were:

a) Ran CombineGVCFs on one set of gVCFs (pVCF1), then ran CombineGVCFs on another set of gVCFs (pVCF2)
b) Ran GenotypeGVCFs on pVCF1 and pVCF2
c) Ran VQSR on the GenotypeGVCFs output.

What I found is that some variants appear in the GenotypeGVCFs output but not in pVCF1 or pVCF2, and they all pass the variant filters (VQSRTrancheSNP99.80to99.90 or VQSRTrancheSNP99.70to99.80). I am confused about why I am getting these results.
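A quick way to check how the records are spread across filter labels is to tally the FILTER column (column 7) of the VQSR output. In the VCF format a record has passed filtering only when FILTER is PASS; a value such as VQSRTrancheSNP99.80to99.90 names the tranche filter that flagged the site. The sketch below is a minimal illustration, not part of the original workflow: the function name `tally_filters` is made up, and the second example record is hypothetical.

```python
from collections import Counter

def tally_filters(vcf_lines):
    """Count records per FILTER value (column 7 of a VCF data line)."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()  # VCF is tab-separated; split() also handles spaces
        counts[fields[6]] += 1
    return counts

# Truncated records for illustration only; the second one is hypothetical.
example = [
    "##fileformat=VCFv4.2",
    "chr22 11800146 . A G 26.52 VQSRTrancheSNP99.80to99.90 AC=4",
    "chr22 11800200 . C T 500.00 PASS AC=10",
]

for label, n in sorted(tally_filters(example).items()):
    print(label, n)
```

For a real file, pass an open file handle instead of the list, for example `tally_filters(open(path))` for an uncompressed VCF.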

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    Can you please post some example records?
  • chr22 11800146 . A G 26.52 . AC=4;AF=0.012;AN=336;BaseQRankSum=0.712;ClippingRankSum=0.00;DP=3174;ExcessHet=3.1446;FS=42.932;InbreedingCoeff=-0.0306;MLEAC=3;MLEAF=8.929e-03;MQ=54.86;MQRankSum=-1.465e+00;QD=0.37;ReadPosRankSum=-1.286e+00;SOR=5.549 GT:AD:DP:GQ:PL 0/0:45,0:45:99:0,114,1800 0/0:33,0:33:90:0,90,1350 0/0:30,0:30:90:0,90,993 ./.:0,0:0:.:0,0,0 0/0:18,0:18:54:0,54,626 ./.:0,0:0:.:0,0,0 0/0:3,0:3:0:0,0,35 0/0:31,0:31:90:0,90,1108 0/0:26,0:26:72:0,72,1080 0/0:38,0:38:99:0,103,1290 0/0:26,0:26:72:0,72,1080 0/0:2,0:2:0:0,0,10 0/1:7,2:9:29:29,0,196 0/0:19,0:19:22:0,22,591 0/0:21,0:21:39:0,39,585 0/0:17,0:17:37:0,37,522 ./.:0,0:0:.:0,0,0 0/0:19,0:19:54:0,54,810 0/0:33,0:33:93:0,93,1395 0/0:20,0:20:60:0,60,705 ./.:0,0:0:.:0,0,0 0/0:18,0:18:48:0,48,720 0/0:65,0:65:99:0,120,1800 0/0:15,0:15:39:0,39,580/0:11,0:11:33:0,33,362 0/0:30,0:30:55:0,55,833 0/0:21,0:21:60:0,60,690 0/0:11,0:11:33:0,33,362 0/0:15,0:15:45:0,45,471 0/0:17,0:17:16:0,16,451 0/0:2,0:2:6:0,6,70

  • Corresponding VQSR output:

    chr22 11800146 . A G 26.52 VQSRTrancheSNP99.80to99.90 AC=4;AF=0.012;AN=336;BaseQRankSum=0.712;ClippingRankSum=0.00;DP=3174;ExcessHet=3.1446;FS=42.932;InbreedingCoeff=-0.0306;MLEAC=3;MLEAF=8.929e-03;MQ=54.86;MQRankSum=-1.465e+00;QD=0.37;ReadPosRankSum=-1.286e+00;SOR=5.549;VQSLOD=-2.072e+01;culprit=SOR GT:AD:DP:GQ:PL 0/0:45,0:45:99:0,114,1800 0/0:33,0:33:90:0,90,1350 0/0:30,0:30:90:0,90,993 ./.:0,0:0:.:0,0,0 0/0:18,0:18:54:0,54,626 ./.:0,0:0:.:0,0,0 0/0:3,0:3:0:0,0,35 0/0:31,0:31:90:0,90,1108 0/0:26,0:26:72:0,72,1080 0/0:38,0:38:99:0,103,1290 0/0:26,0:26:72:0,72,1080 0/0:2,0:2:0:0,0,10 0/1:7,2:9:29:29,0,196 0/0:19,0:19:22:0,22,591 0/0:21,0:21:39:0,39,585 0/0:17,0:17:37:0,37,522 ./.:0,0:0:.:0,0,0 0/0:19,0:19:54:0,54,810 0/0:33,0:33:93:0,93,1395 0/0:20,0:20:60:0,60,705 ./.:0,0:0:.:0,0,0

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @yyee
    Hi,

    I am confused. Which records are from pVCF1 and pVCF2? Which records are from GenotypeGVCFs? It will be a lot easier for us if you can highlight the inconsistencies between the three outputs.

    Thanks,
    Sheila
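One way to surface the inconsistencies asked about here is to list the sites that appear as variant records in the genotyped VCF but not in either combined gVCF. The following is a minimal sketch under stated assumptions: the function names and all example records except the chr22:11800146 site are hypothetical, and a record whose only ALT allele is `<NON_REF>` is treated as a reference block rather than a variant.

```python
def variant_sites(vcf_lines):
    """Return the set of (CHROM, POS) for records with a real ALT allele."""
    sites = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, _ref, alt = line.split()[:5]
        # Ignore the symbolic <NON_REF> allele and missing ALT values.
        alts = [a for a in alt.split(",") if a not in ("<NON_REF>", ".")]
        if alts:
            sites.add((chrom, int(pos)))
    return sites

def sites_only_in_output(output_lines, *input_line_sets):
    """Sites with a variant record in the output but in none of the inputs."""
    seen = set()
    for lines in input_line_sets:
        seen |= variant_sites(lines)
    return sorted(variant_sites(output_lines) - seen)

# Hypothetical, truncated records for illustration only.
pvcf1 = ["chr22 11800100 . C T,<NON_REF> . . ."]
pvcf2 = ["chr22 11800101 . G <NON_REF> . . END=11800200"]
genotyped = [
    "chr22 11800100 . C T 99 . .",
    "chr22 11800146 . A G 26.52 . AC=4",
]

print(sites_only_in_output(genotyped, pvcf1, pvcf2))
```

Because the sketch skips `<NON_REF>`-only records, a site it reports may still fall inside a reference block of a gVCF, as in the example above. That makes the output a list of records to look up and post side by side, not proof of an error.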