CombineGVCFs loses Genotypes

dbecker (Munich, Member)

Hi,

I want to merge the g.vcf files I get from HaplotypeCaller using CombineGVCFs. When I do that, the called genotypes vanish. The g.vcf of one sample before merging:

NC_000001       13273   .       G       C,<NON_REF>     1712.77 .       
BaseQRankSum=0.663;ClippingRankSum=0.000;DP=140;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-1.797;RAW_MQ=216005.00;ReadPosRankSum=-0.178     
GT:AD:DP:GQ:PL:SB      
 0/1:67,73,0:140:99:1741,0,1495,1941,1714,3655:32,35,38,35

and the same variant in the merged file (it is the second sample):

NC_000001       13273   .       G       C,<NON_REF>     .       .      
BaseQRankSum=0.663;ClippingRankSum=0.00;DP=361;ExcessHet=3.01;MQRankSum=-1.797e+00;RAW_MQ=394720.00;ReadPosRankSum=-1.780e-01   
GT:AD:DP:GQ:MIN_DP:PL:SB        
./.:.:70:99:38:0,114,1190,114,1190,1190 
./.:67,73,0:140:99:.:1741,0,1495,1941,1714,3655:32,35,38,35    
./.:.:61:99:35:0,105,1028,105,1028,1028
./.:0,37,0:37:99:.:1183,111,0,1183,111,1183:0,0,23,14   
./.:0,75,0:75:99:.:2109,223,0,2109,223,2109:0,0,33,42  
./.:.:0:0:0:0,0,0,0,0,0
./.:.:54:99:35:0,102,1268,102,1268,1268

The GATK command line is:

/opt/gatk/4.0.0.0/gatk --java-options -Xmx32G CombineGVCFs \
  -R GRCh38_latest_genomic_final.fa \
  -V 17450281-WholeExome-171218_NS500396_0299_AHYV7NBGX3_raw_variants.g.vcf \
  -V 17380470-WholeExome-171218_NS500396_0299_AHYV7NBGX3_raw_variants.g.vcf \
  -V 17470830-WholeExome-171218_NS500396_0299_AHYV7NBGX3_raw_variants.g.vcf \
  -V 17470788-WholeExome-171218_NS500396_0299_AHYV7NBGX3_raw_variants.g.vcf \
  -V 17370765-WholeExome-171218_NS500396_0299_AHYV7NBGX3_raw_variants.g.vcf \
  -V 17370767-WholeExome-171218_NS500396_0299_AHYV7NBGX3_raw_variants.g.vcf \
  -V 17370768-WholeExome-171218_NS500396_0299_AHYV7NBGX3_raw_variants.g.vcf \
  -O cohort.g.vcf

When I run ValidateVariants on the merged file, I get the following error:

A USER ERROR has occurred: 
Input /srv/nfs/ngsdata/GATK/171218_NS500396_0299_AHYV7NBGX3/_gatk/cohort.g.vcf 
fails strict validation: one or more of the ALT allele(s) for the record at position NC_000001:13273 are not observed at all in the sample genotypes of type

Any ideas?

Thanks and best regards,
Daniel

Answers

  • shlee (Cambridge; Member, Broadie)

    Hi @dbecker,

    Genotype with GenotypeGVCFs. Given your small number of samples, you can skip CombineGVCFs and directly genotype all of your sample GVCFs with GenotypeGVCFs.
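
As a rough illustration of why the no-call GT values do not mean the evidence is gone: in the merged record quoted in the question, the second sample still carries its PL values even though GT is `./.`. The Python sketch below is not GATK code, and the helper names `parse_sample` and `pl_index_to_genotype` are made up for this example. It parses that sample column and reports the genotype with the lowest PL, using the diploid genotype ordering from the VCF specification. GenotypeGVCFs works jointly across samples, so treat this only as a per-sample approximation.

```python
# FORMAT string and second sample column, copied from the merged record in the question.
fmt = "GT:AD:DP:GQ:MIN_DP:PL:SB"
sample = "./.:67,73,0:140:99:.:1741,0,1495,1941,1714,3655:32,35,38,35"


def parse_sample(fmt, sample):
    """Pair each FORMAT key with the matching value in the sample column."""
    return dict(zip(fmt.split(":"), sample.split(":")))


def pl_index_to_genotype(index):
    """Map a PL index to a diploid genotype j/k (j <= k).

    The VCF specification orders diploid likelihoods so that genotype j/k
    sits at index k*(k+1)/2 + j.
    """
    k = 0
    while (k + 1) * (k + 2) // 2 <= index:
        k += 1
    j = index - k * (k + 1) // 2
    return f"{j}/{k}"


fields = parse_sample(fmt, sample)
pls = [int(value) for value in fields["PL"].split(",")]
best = pls.index(min(pls))  # the lowest PL marks the most likely genotype

print("GT in merged file:", fields["GT"])
print("Most likely genotype from PL:", pl_index_to_genotype(best))
```

For this sample the lowest PL points to 0/1 (alleles G and C), which is the genotype HaplotypeCaller reported in the single-sample g.vcf.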

  • dbecker (Munich, Member)

    Hi,

    So is it normal that the called genotypes vanish? Why is ValidateVariants giving me an error then?

    Also, we have a much larger number of samples. I always merge the samples from my current run first and then merge that result into our global cohort. Is this a bad approach?

    Best,
    Daniel

  • dbecker (Munich, Member)

    Hi,

    I'm actually at the GATK workshop in Montreal at the moment and therefore have no access to my data.
    Every command I used is from GATK 4.0, but I'm pretty sure I missed the --validate-GVCF option. So I'll try that when I'm back in the office in April.

    Best,
    Daniel

  • shlee (Cambridge; Member, Broadie)

    @dbecker, I hope you are enjoying the workshop.

  • dbecker (Munich, Member)

    Hi,
    I don't get the error anymore when using the --validate-GVCF option. Thanks! Now I have a new problem, but I already found a bug report for it: https://github.com/broadinstitute/gatk/issues/4525. I'm looking forward to a fix for that.

    Best,
    Daniel
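
The validation run described in this post can be sketched as follows. This is only a minimal example, assuming `gatk` is on the PATH and reusing the reference and output file names from the question. The function `validate_gvcf_command` is a hypothetical helper introduced here. The only flag beyond the standard `-R` and `-V` is `--validate-GVCF`, as named in the thread.

```python
import shutil
import subprocess


def validate_gvcf_command(reference, gvcf, gatk="gatk"):
    """Build a ValidateVariants call that applies GVCF-aware validation."""
    return [
        gatk, "ValidateVariants",
        "-R", reference,
        "-V", gvcf,
        "--validate-GVCF",  # option named in the thread
    ]


# File names taken from the CombineGVCFs command in the question.
cmd = validate_gvcf_command("GRCh38_latest_genomic_final.fa", "cohort.g.vcf")
print(" ".join(cmd))

# Only try to run the tool when a gatk executable is actually available.
if shutil.which(cmd[0]):
    subprocess.run(cmd, check=False)
```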

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @dbecker
    Hi Daniel,

    The fix should be in soon. A developer is working on it currently.

    -Sheila
