Missing variants from vcf to gvcf

Hello,
I work with complete Y chromosome sequences from NGS data. I'm creating a multisample GVCF from 24 single-sample VCFs. Once I had created the multisample GVCF, I realized that variants are missing for 11 samples. As seen in the example below, columns 10 to 20 shouldn't show a missing genotype with 0 reads, since this variant is present in the single-sample VCFs of those samples. What might be going wrong?

Y 28670117 . T C 9746.79 . AC=12;AF=1.00;AN=12;DP=239;FS=0.000;MLEAC=12;MLEAF=1.00;MQ=59.41;QD=31.70;SOR=0.894 GT:AD:DP:GQ:PL .:0,0 .:0,0 .:0,0 .:0,0 .:0,0 .:0,0 .:0,0 .:0,0 .:0,0 .:0,0 .:0,0 1:0,10:10:99:322,0 1:0,20:20:99:853,0 1:0,22:22:99:916,0 1:0,25:25:99:1023,0 1:0,25:25:99:1041,0 1:0,18:18:99:749,0 1:0,36:36:99:1418,0 1:0,12:12:99:523,0 1:0,30:30:99:310,0 1:0,9:9:99:294,0 1:0,3:3:99:105,0 1:0,28:28:99:1217,0 .:0,0
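
To see exactly which sample columns are uncalled in a record like the one above, a small Python sketch can help. It assumes whitespace-separated fields and nine fixed VCF columns before the samples; the function name `missing_gt_columns` and the shortened record are illustrative only and are not part of GATK.

```python
def missing_gt_columns(vcf_line):
    """Return the 1-based column numbers of samples whose GT is missing."""
    fields = vcf_line.split()
    format_keys = fields[8].split(":")      # column 9 is FORMAT
    gt_index = format_keys.index("GT")
    missing = []
    for col, sample in enumerate(fields[9:], start=10):
        values = sample.split(":")
        gt = values[gt_index] if gt_index < len(values) else "."
        # Treat '.', './.' and '.|.' (or an empty GT) as missing.
        if set(gt.replace("/", "").replace("|", "")) <= {"."}:
            missing.append(col)
    return missing


# Shortened, hypothetical record with four samples (columns 10 to 13).
record = ("Y 28670117 . T C 9746.79 . AC=2 GT:AD:DP:GQ:PL "
          ".:0,0 1:0,10:10:99:322,0 .:0,0 1:0,20:20:99:853,0")
print(missing_gt_columns(record))
```

Running this on every record of the multisample file and grouping by column would show whether the same samples are uncalled at every site.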

This is the command that I used:

java -jar /home/GATK/GenomeAnalysisTK.jar -R /home/hgref_human_b37_ChrY/human_g1k_v37_decoy.fasta -T GenotypeGVCFs -o S.genotypeGVCF.vcf -allSites --variant sample1.haplotypecallerGVCF.g.vcf --variant sample2.haplotypecallerGVCF.g.vcf --variant allsamples.haplotypecallerGVCF.g.vcf > S.genotypeGVCF.log 2>&1

Thanks!

Answers

  • Sheila, Broad Institute. Member, Broadie, Moderator, admin

    @chauchino
    Hi,

    Can you post a few records from the GVCF that are not present in the final VCF?

    Thanks,
    Sheila

  • Hi Sheila, thanks for your response.
    I ran the commands again with only the affected samples.

    Some of the variants present in the single-sample VCF of one of those samples:

    Y 22263573 dbsnp.137:rs199905717 C G . PASS . GT 1

    Y 22263585 dbsnp.137:rs200028495 C T . PASS . GT 1

    Y 22266595 dbsnp.100:rs2704728 T A . VQLOW . GT 1

    Y 22267120 dbsnp.100:rs2690791 G T . VQLOW . GT 1

    Y 22268472 dbsnp.131:rs75615887 G T . PASS . GT 1

    The same variants, lost in the multisample GVCF:

    Y 22263573 dbsnp.137:rs199905717 C G, . . MLEAC=.;MLEAF=. GT:AD .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 . . . . .

    Y 22263585 dbsnp.137:rs200028495 C T, . . MLEAC=.;MLEAF=. GT:AD .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 . . . . .

    Y 22266595 dbsnp.100:rs2704728 T A, . . MLEAC=.;MLEAF=. GT:AD .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 . . . . .

    Y 22267120 dbsnp.100:rs2690791 G T, . . MLEAC=.;MLEAF=. GT:AD .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 . . . . .

    Y 22268472 dbsnp.131:rs75615887 G T, . . MLEAC=.;MLEAF=. GT:AD .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 . . . . .
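
A side-by-side check like the one above can be scripted. The Python sketch below collects the sites where a sample has a called genotype in its single-sample VCF and reports those where the same sample's column in the multisample file is uncalled. It assumes whitespace-separated records with GT as the first FORMAT key; the function names and the inline records are hypothetical.

```python
def called_sites(lines, sample_col=9):
    """Map (CHROM, POS) -> GT for records where the sample has a called GT."""
    sites = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank lines and headers
        f = line.split()
        gt = f[sample_col].split(":")[0]  # GT assumed to be the first key
        if gt and "." not in gt:
            sites[(f[0], int(f[1]))] = gt
    return sites


def lost_sites(single_lines, joint_lines, joint_col):
    """Sites called in the single-sample file but not in the joint column."""
    single = called_sites(single_lines)
    joint = called_sites(joint_lines, joint_col)
    return sorted(site for site in single if site not in joint)


# Hypothetical, shortened records (0-based column 9 is the first sample).
single = [
    "Y 22263573 rs199905717 C G . PASS . GT 1",
    "Y 22268472 rs75615887 G T . PASS . GT 1",
]
joint = [
    "Y 22263573 rs199905717 C G . . MLEAC=. GT:AD .:0,0,0 1:0,9,0",
    "Y 22268472 rs75615887 G T 50 . AC=2 GT:AD 1:0,9,0 1:0,8,0",
]
print(lost_sites(single, joint, joint_col=9))
```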

  • Sheila, Broad Institute. Member, Broadie, Moderator, admin

    @chauchino
    Hi,

    Is this Y 22263573 dbsnp.137:rs199905717 C G . PASS . GT 1 the entire GVCF record? Or is it from a single sample VCF? I am a little confused by your wording, as we produce single-sample GVCFs with HaplotypeCaller in GVCF mode then produce a multi-sample VCF with GenotypeGVCFs.

    Sheila
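
Because the difference between a single-sample VCF and a single-sample GVCF matters here, one rough way to tell them apart is to look for the `<NON_REF>` symbolic allele, which HaplotypeCaller writes in the ALT column of every record in GVCF mode. The Python sketch below is only a heuristic with a hypothetical function name, and it assumes whitespace-separated records.

```python
def looks_like_gvcf(lines):
    """True if every data record lists <NON_REF> among its ALT alleles."""
    saw_record = False
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                       # skip blank lines and headers
        saw_record = True
        alt_alleles = line.split()[4].split(",")
        if "<NON_REF>" not in alt_alleles:
            return False
    return saw_record


# Hypothetical records: a plain variant call and two GVCF-style records.
plain_vcf = ["Y 22263573 rs199905717 C G . PASS . GT 1"]
gvcf = [
    "Y 22263570 . A <NON_REF> . . END=22263572 GT:DP:GQ:MIN_DP:PL 0:12:99:10:0,450",
    "Y 22263573 . C G,<NON_REF> 300 . DP=10 GT:AD:DP:GQ:PL 1:0,10,0:10:99:322,0,322",
]
print(looks_like_gvcf(plain_vcf), looks_like_gvcf(gvcf))
```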

  • This is from a single-sample VCF: Y 22263573 dbsnp.137:rs199905717 C G . PASS . GT 1

    This is from the multisample GVCF that includes the single-sample VCF above: Y 22263573 dbsnp.137:rs199905717 C G, . . MLEAC=.;MLEAF=. GT:AD .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0 .:0,0,0

  • Geraldine_VdAuwera, Cambridge, MA. Member, Administrator, Broadie, admin

    @chauchino Can you show the single-sample GVCF records for the lines you're concerned about? Also, what version are you using? Have you tried using GATK4?
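
When pulling single-sample GVCF records for specific positions, note that a GVCF reference block covers the interval from POS to the END value in its INFO field, so a site may sit inside a block rather than on a line of its own. A minimal Python sketch, assuming whitespace-separated records and using a hypothetical function name and made-up records:

```python
def records_covering(lines, chrom, pos):
    """Return the records whose span (POS..END, or the REF length) covers pos."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                          # skip blank lines and headers
        f = line.split()
        start = int(f[1])
        end = start + len(f[3]) - 1           # default span: the REF allele
        for item in f[7].split(";"):
            if item.startswith("END="):       # reference blocks carry END=
                end = int(item[4:])
        if f[0] == chrom and start <= pos <= end:
            hits.append(line)
    return hits


# Hypothetical GVCF records: one reference block and one variant site.
gvcf = [
    "Y 22263570 . A <NON_REF> . . END=22263572 GT:DP:GQ:MIN_DP:PL 0:12:99:10:0,450",
    "Y 22263573 . C G,<NON_REF> 300 . DP=10 GT:AD:DP:GQ:PL 1:0,10,0:10:99:322,0,322",
]
for rec in records_covering(gvcf, "Y", 22263571):
    print(rec)
```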
