
Missing variants from vcf to gvcf

Hello,
I work with complete Y-chromosome sequences from NGS. I'm creating a multisample GVCF from 24 single-sample VCFs. After creating the multisample GVCF, I realized that variants are missing for 11 samples. As the example shows, columns 10 to 20 shouldn't give 0, since this variant is present in the single-sample VCFs of those samples. What might be going wrong?

Y 28670117 . T C 9746.79 . AC=12;AF=1.00;AN=12;DP=239;FS=0.000;MLEAC=12;MLEAF=1.00;MQ=59.41;QD=31.70;SOR=0.894 GT:AD:DP:GQ:PL .:0,0 .:0,0 .:0,0 .:0,0 .:0,0 .:0,0 .:0,0 .:0,0 .:0,0 .:0,0 .:0,0 1:0,10:10:99:322,0 1:0,20:20:99:853,0 1:0,22:22:99:916,0 1:0,25:25:99:1023,0 1:0,25:25:99:1041,0 1:0,18:18:99:749,0 1:0,36:36:99:1418,0 1:0,12:12:99:523,0 1:0,30:30:99:310,0 1:0,9:9:99:294,0 1:0,3:3:99:105,0 1:0,28:28:99:1217,0 .:0,0
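For readers who want to find the affected samples without counting columns by eye, here is a minimal Python sketch that lists the sample columns whose genotype is a no-call (`.`) in a record like the one above. The function name `no_call_columns` is hypothetical, and the example record is the one from this post with the sample columns trimmed to five for brevity. It assumes whitespace-separated fields, as pasted here; a real VCF is tab-separated, which `split()` also handles.

```python
from typing import List


def no_call_columns(record: str) -> List[int]:
    """Return the 1-based column numbers of samples whose GT is missing."""
    fields = record.split()
    # Column 9 is FORMAT; locate GT within it (normally the first key).
    gt_index = fields[8].split(":").index("GT")
    missing = []
    # Sample columns start at column 10.
    for col, sample in enumerate(fields[9:], start=10):
        parts = sample.split(":")
        gt = parts[gt_index] if gt_index < len(parts) else "."
        # Treat haploid "." and diploid "./." or ".|." as no-calls.
        if all(allele == "." for allele in gt.replace("|", "/").split("/")):
            missing.append(col)
    return missing


# Record from the post, trimmed to three no-call and two called samples.
record = (
    "Y 28670117 . T C 9746.79 . "
    "AC=12;AF=1.00;AN=12;DP=239;FS=0.000;MLEAC=12;MLEAF=1.00;"
    "MQ=59.41;QD=31.70;SOR=0.894 GT:AD:DP:GQ:PL "
    ".:0,0 .:0,0 .:0,0 1:0,10:10:99:322,0 1:0,20:20:99:853,0"
)
print(no_call_columns(record))  # [10, 11, 12]
```

Mapping the returned column numbers to the sample names on the `#CHROM` header line shows which input files contributed the no-calls.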

This is the command that I used:

java -jar /home/GATK/GenomeAnalysisTK.jar -R /home/hgref_human_b37_ChrY/human_g1k_v37_decoy.fasta -T GenotypeGVCFs -o S.genotypeGVCF.vcf -allSites --variant sample1.haplotypecallerGVCF.g.vcf --variant sample2.haplotypecallerGVCF.g.vcf --variant allsamples.haplotypecallerGVCF.g.vcf > S.genotypeGVCF.log 2>&1

Thanks!

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @chauchino
    Hi,

    Can you post a few records from the GVCF that are not present in the final VCF?

    Thanks,
    Sheila

  • Hi Sheila, thanks for your response.
    I ran the commands again with only the problematic samples.

    Some variants present in the single-sample VCF of one of the samples:

    Y 22263573 dbsnp.137:rs199905717 C G . PASS . GT 1

    Y 22263585 dbsnp.137:rs200028495 C T . PASS . GT 1

    Y 22266595 dbsnp.100:rs2704728 T A . VQLOW . GT 1

    Y 22267120 dbsnp.100:rs2690791 G T . VQLOW . GT 1

    Y 22268472 dbsnp.131:rs75615887 G T . PASS . GT 1

    The same variants, lost in the multisample GVCF:

    Y 22263573 dbsnp.137:rs199905717 C G, . . MLEAC=.;MLEAF=. GT:AD .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 . . . . .

    Y 22263585 dbsnp.137:rs200028495 C T, . . MLEAC=.;MLEAF=. GT:AD .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 . . . . .

    Y 22266595 dbsnp.100:rs2704728 T A, . . MLEAC=.;MLEAF=. GT:AD .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 . . . . .

    Y 22267120 dbsnp.100:rs2690791 G T, . . MLEAC=.;MLEAF=. GT:AD .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 . . . . .

    Y 22268472 dbsnp.131:rs75615887 G T, . . MLEAC=.;MLEAF=. GT:AD .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 . . . . .

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @chauchino
    Hi,

    Is this Y 22263573 dbsnp.137:rs199905717 C G . PASS . GT 1 the entire GVCF record? Or is it from a single sample VCF? I am a little confused by your wording, as we produce single-sample GVCFs with HaplotypeCaller in GVCF mode then produce a multi-sample VCF with GenotypeGVCFs.

    Sheila

  • This is a single-sample VCF record: Y 22263573 dbsnp.137:rs199905717 C G . PASS . GT 1

    This is the multisample GVCF record that includes the single-sample VCF above: Y 22263573 dbsnp.137:rs199905717 C G, . . MLEAC=.;MLEAF=. GT:AD .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)

    @chauchino Can you show the single-sample GVCF records for the lines you're concerned about? Also, what version are you using? Have you tried using GATK4?
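One quick way to check whether an input file is a GVCF at all, before looking for individual records, is to test for the symbolic `<NON_REF>` allele that HaplotypeCaller writes in the ALT column when run in GVCF mode. The sketch below is only an illustration: `looks_like_gvcf` is a hypothetical helper, it assumes whitespace- or tab-separated records, and the second example line is a made-up GVCF-style record, not one taken from this thread.

```python
def looks_like_gvcf(lines) -> bool:
    """Return True if any record lists <NON_REF> among its ALT alleles."""
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        # ALT is the fifth column; alleles are comma-separated.
        if len(fields) > 4 and "<NON_REF>" in fields[4].split(","):
            return True
    return False


# Single-sample record quoted in this thread: a plain VCF record.
plain = ["Y 22263573 dbsnp.137:rs199905717 C G . PASS . GT 1"]

# Hypothetical GVCF-style record, invented for illustration only.
gvcf_like = ["Y 22263573 . C G,<NON_REF> 50 . . GT:AD 1:0,10,0"]

print(looks_like_gvcf(plain))      # False
print(looks_like_gvcf(gvcf_like))  # True
```

To check a file on disk, pass an open file handle, for example `looks_like_gvcf(open("sample1.haplotypecallerGVCF.g.vcf"))`. A `False` result would suggest the file is a regular VCF rather than a HaplotypeCaller GVCF.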
