CombineGVCFs loses Genotypes

dbeckerdbecker MunichMember ✭✭

Hi,

I want to merge the g.vcf files I get from HaplotypeCaller using CombineGVCFs. When I do that, the called genotypes vanish. The g.vcf of one sample before merging:

NC_000001       13273   .       G       C,<NON_REF>     1712.77 .       
BaseQRankSum=0.663;ClippingRankSum=0.000;DP=140;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.797;RAW_MQ=216005.00;ReadPosRankSum=-0.178     
GT:AD:DP:GQ:PL:SB      
 0/1:67,73,0:140:99:1741,0,1495,1941,1714,3655:32,35,38,35

and the same variant in the merged file (It's the second sample):

NC_000001       13273   .       G       C,<NON_REF>     .       .      
BaseQRankSum=0.663;ClippingRankSum=0.00;DP=361;ExcessHet=3.01;MQRankSum=-1.797e+00;RAW_MQ=394720.00;ReadPosRankSum=-1.780e-01   
GT:AD:DP:GQ:MIN_DP:PL:SB        
./.:.:70:99:38:0,114,1190,114,1190,1190 
./.:67,73,0:140:99:.:1741,0,1495,1941,1714,3655:32,35,38,35    
./.:.:61:99:35:0,105,1028,105,1028,1028
./.:0,37,0:37:99:.:1183,111,0,1183,111,1183:0,0,23,14   
./.:0,75,0:75:99:.:2109,223,0,2109,223,2109:0,0,33,42  
./.:.:0:0:0:0,0,0,0,0,0 ./.:.:54:99:35:0,102,1268,102,1268,1268

The GATK command line is:

/opt/gatk/4.0.0.0/gatk --java-options -Xmx32G CombineGVCFs \
  -R GRCh38_latest_genomic_final.fa \
  -V 17450281-WholeExome-171218_NS500396_0299_AHYV7NBGX3_raw_variants.g.vcf \
  -V 17380470-WholeExome-171218_NS500396_0299_AHYV7NBGX3_raw_variants.g.vcf \
  -V 17470830-WholeExome-171218_NS500396_0299_AHYV7NBGX3_raw_variants.g.vcf \
  -V 17470788-WholeExome-171218_NS500396_0299_AHYV7NBGX3_raw_variants.g.vcf \
  -V 17370765-WholeExome-171218_NS500396_0299_AHYV7NBGX3_raw_variants.g.vcf \
  -V 17370767-WholeExome-171218_NS500396_0299_AHYV7NBGX3_raw_variants.g.vcf \
  -V 17370768-WholeExome-171218_NS500396_0299_AHYV7NBGX3_raw_variants.g.vcf \
  -O cohort.g.vcf

When I run ValidateVariants on the merged file, I get the following error:

A USER ERROR has occurred: 
Input /srv/nfs/ngsdata/GATK/171218_NS500396_0299_AHYV7NBGX3/_gatk/cohort.g.vcf 
fails strict validation: one or more of the ALT allele(s) for the record at position NC_000001:13273 are not observed at all in the sample genotypes of type

Any ideas?

Thanks and best regards,
Daniel

Answers

  • shleeshlee CambridgeMember, Broadie, Moderator admin

    Hi @dbecker,

    Genotype with GenotypeGVCFs. Given your small number of samples, you can skip CombineGVCFs and directly genotype all of your sample gvcfs with GenotypeGVCFs.
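
A minimal sketch of that genotyping step, assuming the combined file from the question (`cohort.g.vcf`) is used as input. As far as I know, GATK4's GenotypeGVCFs takes a single `-V` input, so the combined GVCF is passed here. The output name `cohort.vcf` and the `GATK` and `DRY_RUN` variables are hypothetical and only there for illustration; by default the script just prints the command.

```shell
#!/usr/bin/env bash
# Sketch only: joint-genotype the combined GVCF from the question.
# Assumptions: the gatk launcher is on PATH (or GATK points to it);
# cohort.vcf is a made-up output name.
GATK="${GATK:-gatk}"

cmd=("$GATK" --java-options "-Xmx32G" GenotypeGVCFs
     -R GRCh38_latest_genomic_final.fa
     -V cohort.g.vcf
     -O cohort.vcf)

# Show the command; execute it only when DRY_RUN=0 is set explicitly.
echo "${cmd[*]}"
if [ "${DRY_RUN:-1}" = "0" ]; then
  "${cmd[@]}"
fi
```

Run it with `DRY_RUN=0` to actually start GATK. The GT fields in the resulting `cohort.vcf` should then be filled in, since genotypes are assigned at this step rather than by CombineGVCFs.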

  • dbeckerdbecker MunichMember ✭✭

    Hi,

    So is it normal that the called genotypes vanish? Why does ValidateVariants give me an error then?

    Also, we have a much larger number of samples. I always merge the samples of my current run first and then merge the result into our global cohort. Is this a bad approach?

    Best,
    Daniel
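
On the vanishing genotypes: as far as I understand, a combined GVCF is an intermediate file, so `./.` in GT is expected there, while the per-sample likelihoods (PL) are carried through for GenotypeGVCFs to use. The sketch below takes the FORMAT string and the second sample column from the merged record in the question and shows that the PL values still favour the original 0/1 call. The variable names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: inspect one sample column of the merged record from the question.
format='GT:AD:DP:GQ:MIN_DP:PL:SB'
sample='./.:67,73,0:140:99:.:1741,0,1495,1941,1714,3655:32,35,38,35'

result=$(awk -v f="$format" -v s="$sample" 'BEGIN {
  # Pair each FORMAT key with the matching value of the sample column.
  nf = split(f, keys, ":"); split(s, vals, ":")
  for (i = 1; i <= nf; i++) field[keys[i]] = vals[i]

  # Find the position of the smallest PL (the most likely genotype).
  n = split(field["PL"], pl, ",")
  best = 1
  for (i = 2; i <= n; i++) if (pl[i] + 0 < pl[best] + 0) best = i

  # Report a 0-based index, matching the VCF genotype ordering.
  printf "GT=%s PL=%s best_PL_index=%d", field["GT"], field["PL"], best - 1
}')
echo "$result"
```

With alleles G, C and <NON_REF>, the VCF ordering of PL entries is 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, so index 1 corresponds to 0/1, the genotype HaplotypeCaller reported for this sample before merging.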

  • dbeckerdbecker MunichMember ✭✭

    Hi,

    I'm actually at the GATK workshop in Montreal at the moment and therefore have no access to my data.
    Every command I used is from GATK 4.0, but I'm pretty sure I missed the --validate-GVCF option. So I'll try that when I'm back in the office in April.

    Best,
    Daniel

  • shleeshlee CambridgeMember, Broadie, Moderator admin

    @dbecker, I hope you are enjoying the workshop.

  • dbeckerdbecker MunichMember ✭✭

    Hi,
    I no longer get the error when using the --validate-GVCF option. Thanks! Now I have a new problem, but I already found a bug report for it: https://github.com/broadinstitute/gatk/issues/4525. I'm looking forward to a fix.

    Best,
    Daniel
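
For reference, a sketch of what the validation call with the --validate-GVCF option might look like. The exact command was not posted in the thread, so the reference and input names are simply reused from the CombineGVCFs command in the question, and the `GATK` and `DRY_RUN` variables are hypothetical. By default the script only prints the command.

```shell
#!/usr/bin/env bash
# Sketch only: validate the combined file as a GVCF rather than as a regular VCF.
# Assumption: the gatk launcher is on PATH (or GATK points to it).
GATK="${GATK:-gatk}"

cmd=("$GATK" ValidateVariants
     -R GRCh38_latest_genomic_final.fa
     -V cohort.g.vcf
     --validate-GVCF)

# Show the command; execute it only when DRY_RUN=0 is set explicitly.
echo "${cmd[*]}"
if [ "${DRY_RUN:-1}" = "0" ]; then
  "${cmd[@]}"
fi
```

Without that option the file is checked under the strict rules for a regular VCF, which is presumably why the no-call genotypes next to the C and <NON_REF> alleles triggered the error shown in the question.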

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @dbecker
    Hi Daniel,

    The fix should be in soon. A developer is working on it currently.

    -Sheila
