Difference between . and ./. for missing genotype in VCF created by GATK4 GenotypeGVCFs

What is the difference between . and ./. for missing genotypes in a VCF file? For example, in one VCF record I have these two sample genotypes:

GT:AD:DP:GQ:MMQ:PGT:PID:PL .:0,0:.:.:.:.:.:. ./.:0,0:0:.:.:.:.:0,0,0

The VCF record was produced with a recent version of GATK4 via GenotypeGVCFs.
The input GVCFs were created with GATK4 HaplotypeCaller.

The input samples are diploid.
There are no sex chromosomes.
So the explanation that . indicates a haploid genotype and ./. a diploid genotype does not make sense to me.

Is there any meaning behind the different . and ./. genotypes? Or is this somewhat arbitrary, and do . and ./. mean the same thing?
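Lining up each FORMAT key with the two sample values makes it easier to see where the two records differ. A minimal Python sketch using the two sample columns quoted above (the helper names `parse_sample` and `differing_fields` are hypothetical):

```python
def parse_sample(fmt: str, sample: str) -> dict[str, str]:
    """Map FORMAT keys to one sample's values."""
    keys = fmt.split(":")
    values = sample.split(":")
    # VCF allows trailing sample fields to be dropped; treat them as missing
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


def differing_fields(fmt: str, sample_a: str, sample_b: str) -> list[str]:
    """Return the FORMAT keys whose values differ between two samples."""
    a = parse_sample(fmt, sample_a)
    b = parse_sample(fmt, sample_b)
    return [key for key in a if a[key] != b[key]]


fmt = "GT:AD:DP:GQ:MMQ:PGT:PID:PL"
first = ".:0,0:.:.:.:.:.:."
second = "./.:0,0:0:.:.:.:.:0,0,0"
print(differing_fields(fmt, first, second))  # ['GT', 'DP', 'PL']
```

Besides GT, the first sample also has DP and PL set to missing, while the second reports DP=0 and PL=0,0,0. Both have AD=0,0, so neither has any read support at this site.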

Answers

  • SChaluvadi Member, Broadie, Moderator admin

    @WimS I found that "./." is used to indicate that a genotype could not be determined because there is not enough confidence in any single genotype. This can happen, for example, when there are insufficient reads for an individual sample. The VCF documentation shows that "." indicates the same for a haploid sample, whereas "./." is used for a diploid sample. Can you confirm that you set the ploidy when running your analyses? If so, I will need to check with the team why you might see a haploid designation for diploid samples.
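Following the VCF specification, the alleles in GT are separated by "/" (unphased) or "|" (phased), so the number of alleles gives the ploidy, and "." marks an allele that could not be called. A small Python sketch of that rule (the function name is illustrative, not part of GATK or any VCF library):

```python
import re


def genotype_ploidy(gt: str) -> tuple[int, bool]:
    """Return (ploidy, fully_missing) for a VCF GT string."""
    alleles = re.split(r"[/|]", gt)  # '/' = unphased, '|' = phased
    fully_missing = all(allele == "." for allele in alleles)
    return len(alleles), fully_missing


for gt in [".", "./.", "0/1", "./1"]:
    print(gt, genotype_ploidy(gt))
# . (1, True)
# ./. (2, True)
# 0/1 (2, False)
# ./1 (2, False)
```

Under this reading, "." is a missing haploid call and "./." a missing diploid call. "./1" is only partially missing, so it is not counted as fully missing.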

  • WimS Member ✭✭

    Hi @SChaluvadi Thank you for looking into this.

    Here are example HaplotypeCaller, GenomicsDBImport and GenotypeGVCFs commands that were run. I took them from the header of one of the sample GVCF files that was created and used, and from the bcbio commands log file.

    Many of these commands were run on multiple samples and reference genome regions. Sample ploidy was set for HaplotypeCaller via --sample-ploidy 2.

    ## HaplotypeCaller
    ##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller  --annotation MappingQualityRankSumTest --annotation MappingQualityZero --annotation QualByDepth --annotation ReadPosRankSumTest 
    --annotation RMSMappingQuality --annotation BaseQualityRankSumTest --annotation FisherStrand --annotation MappingQuality --annotation DepthPerAlleleBySample --annotation Coverage 
    --annotation ClippingRankSumTest --annotation DepthPerSampleHC --emit-ref-confidence GVCF --gvcf-gq-bands 10 --gvcf-gq-bands 20 --gvcf-gq-bands 30 --gvcf-gq-bands 40 --gvcf-gq-bands 60 
    --gvcf-gq-bands 80 --output work/bcbiotx/tmpOW4zMB/DA_1143_09-S01000184_0_293089.vcf.gz --intervals work/gatk-haplotype/S01000184/DA_1143_09-S01000184_0_293089-regions.bed
     --interval-set-rule INTERSECTION --input work/align/DA_1143_09/DA_1143_09-sort.bam --reference /seq/reference.fa  --annotation-group StandardAnnotation --annotation-group StandardHCAnnotation 
    --indel-size-to-eliminate-in-ref-model 10 --use-alleles-trigger false --disable-optimizations false --just-determine-active-regions false --dont-genotype false --dont-trim-active-regions false 
    --max-disc-ar-extension 25 --max-gga-ar-extension 300 --padding-around-indels 150 --padding-around-snps 20 --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false 
    --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --recover-dangling-heads false --do-not-recover-dangling-branches false --min-dangling-branch-length 4 --consensus false 
    --max-num-haplotypes-in-population 128 --error-correct-kmers false --min-pruning 2 --debug-graph-transformations false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20
    --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE 
    --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --debug false 
    --use-filtered-reads-for-annotations false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --capture-assembly-failure-bam false --error-correct-reads false 
    --do-not-run-physical-phasing false --min-base-quality-score 10 --smith-waterman JAVA --use-new-qual-calculator false --annotate-with-num-discovered-alleles false --heterozygosity 0.001 
    --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 10.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 
    --genotyping-mode DISCOVERY --contamination-fraction-to-filter 0.0 --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --min-assembly-region-size 50 
    --max-assembly-region-size 300 --assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --interval-padding 0 
    --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false 
    --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true 
    --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --help false --version false --showHidden false 
    --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --disable-tool-default-read-filters false --minimum-mapping-quality 20"
    ,Version=4.0.1.1,Date="September 14, 2018 12:05:12 PM CEST">
    
    
    ## GenomicsDBImport 
    [2018-09-17T20:41Z] unset JAVA_HOME && export PATH=/Tools/bcbio/1.0.8/anaconda/bin:$PATH && gatk-launch 
    --java-options '-Xms500m -Xmx28665m -XX:+UseSerialGC -Djava.io.tmpdir=work/joint/gatk-haplotype-joint/DA_1261/S01000184/bcbiotx/tmpIwISlQ' 
    GenomicsDBImport --reader-threads 10 --genomicsdb-workspace-path DA_1261-S01000184_294440_414585_genomicsdb -L S01000184:294441-414585 
     --variant work/gatk-haplotype/DA_1143_09.vcf.gz etc. 
    
    
    ## GenotypeGVCFs 
    [2018-09-17T20:42Z] unset JAVA_HOME && export PATH=/Tools/bcbio/1.0.8/anaconda/bin:$PATH && gatk-launch 
    --java-options '-Xms500m -Xmx28665m -XX:+UseSerialGC -Djava.io.tmpdir=work/bcbiotx/tmp_feqtv' GenotypeGVCFs -new-qual 
    --variant gendb://work/joint/gatk-haplotype-joint/DA_1261/S01000184/DA_1261-S01000184_294440_414585_genomicsdb 
    -R /seq/reference.fa 
    --output work/bcbiotx/tmpXPWZ1_/DA_1261-S01000184_294440_414585.vcf.gz -L S01000184:294441-414585
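One way to confirm the ploidy HaplotypeCaller ran with is to read it back from the ##GATKCommandLine line in each GVCF header. A hedged Python sketch, assuming the argument was written in the long form `--sample-ploidy N` (as in the header quoted above) and that the GVCF is bgzip-compressed, which Python's `gzip` module can read. The function names are hypothetical, and the example header string is a shortened version of the real one:

```python
import gzip
import re
from typing import Iterator, Optional

PLOIDY_RE = re.compile(r"--sample-ploidy\s+(\d+)")


def sample_ploidy_from_header(line: str) -> Optional[int]:
    """Return the --sample-ploidy value from a ##GATKCommandLine line, or None."""
    if not line.startswith("##GATKCommandLine"):
        return None
    match = PLOIDY_RE.search(line)
    return int(match.group(1)) if match else None


def ploidies_in_vcf_header(path: str) -> Iterator[int]:
    """Yield every --sample-ploidy value recorded in a (b)gzipped VCF/GVCF header."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # meta-information lines end at the #CHROM line
            ploidy = sample_ploidy_from_header(line)
            if ploidy is not None:
                yield ploidy


example = ('##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller '
           '--max-genotype-count 1024 --sample-ploidy 2 --genotyping-mode DISCOVERY",'
           'Version=4.0.1.1>')
print(sample_ploidy_from_header(example))  # 2
# For a real file: list(ploidies_in_vcf_header("work/gatk-haplotype/DA_1143_09.vcf.gz"))
```

A GenotypeGVCFs header line without the argument returns None, so only tools that explicitly recorded a ploidy are reported.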
    
  • SChaluvadi Member, Broadie, Moderator admin

    @WimS I double-checked with the GATK team, and they confirmed that "./." and "." denote a lack of evidence to assign a genotype at that location for diploid and haploid samples, respectively. Since you set --sample-ploidy 2, "./." is expected, but "." could be an edge-case bug in GATK. You can see that there is no evidence at those sites: the unfiltered allele depth (AD) is 0.

    Would you be willing to share your VCF files before and after the call to GenotypeGVCFs for our investigation?
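When preparing files to share, a quick scan can list the records where a sample in a diploid-only callset still gets the single "." genotype with no read support (AD summing to 0), as in the example from the question. A Python sketch that reads plain-text VCF lines. `expected_ploidy=2` reflects the diploid, no-sex-chromosome setup described in the thread. The record in the usage example is hypothetical: its position, REF/ALT and sample names are made up, and only its FORMAT and sample values come from the question.

```python
import re
from typing import Iterable, Iterator


def suspicious_missing_calls(
    vcf_lines: Iterable[str], expected_ploidy: int = 2
) -> Iterator[tuple[str, int, str, str, str]]:
    """Yield (chrom, pos, sample, GT, AD) for fully missing GTs whose
    allele count differs from expected_ploidy and whose AD sums to 0."""
    samples: list[str] = []
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        keys = fields[8].split(":")
        for name, column in zip(samples, fields[9:]):
            data = dict(zip(keys, column.split(":")))
            gt = data.get("GT", ".")
            alleles = re.split(r"[/|]", gt)
            if not all(a == "." for a in alleles) or len(alleles) == expected_ploidy:
                continue
            ad = data.get("AD", ".")
            # a missing AD ('.') is counted as zero depth
            depth = sum(int(x) for x in ad.split(",") if x != ".")
            if depth == 0:
                yield fields[0], int(fields[1]), name, gt, ad


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB"
record = "\t".join([
    "S01000184", "300000", ".", "A", "G", ".", ".", ".",
    "GT:AD:DP:GQ:MMQ:PGT:PID:PL",
    ".:0,0:.:.:.:.:.:.",
    "./.:0,0:0:.:.:.:.:0,0,0",
])
print(list(suspicious_missing_calls(["##fileformat=VCFv4.2", header, record])))
# [('S01000184', 300000, 'sampleA', '.', '0,0')]
```

This only flags candidate records. It does not explain why GenotypeGVCFs emitted them.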

  • WimS Member ✭✭

    Yes, I'll try to find some public data where we also see this issue.
    Then I'll send you a self-contained example that demonstrates the issue, from the GVCF files to the squared VCF.

  • SChaluvadi Member, Broadie, Moderator admin

    @WimS Great, thank you!
