GenotypeGVCFs: no records in VCF

kirill84 (Canada, Member)
edited July 2017 in Ask the GATK team

Dear GATK team,

I am having trouble calling genotypes on the *.g.vcf files produced by HaplotypeCaller in GVCF mode.
When I run GenotypeGVCFs (GATK 3.5), the resulting VCF file contains only the header and no records.
I have not had this problem before.

Could you advise on the possible cause of the issue and how to fix it?

Here are the command and the output:

java -Xmx12g -Djava.io.tmpdir=./tmp -jar GenomeAnalysisTK.jar \
    -T GenotypeGVCFs \
    -R reference.fa \
    --variant sample1.g.vcf \
    --variant sample2.g.vcf \
    --variant sample3.g.vcf \
    --variant sample4.g.vcf \
    --variant sample5.g.vcf \
    --variant sample6.g.vcf \
    --variant sample7.g.vcf \
    --variant sample7.g.vcf \
    --num_threads 4 \
    -o TEST.gt.vcf

Note: the sample*.g.vcf files do contain SNPs and indels.

##fileformat=VCFv4.2
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
...
...
...
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample1    sample2    sample3    sample4    sample5    sample6    sample7    sample8

Thank you!


Answers

  • shlee (Cambridge; Member, Broadie)

    Hi @kirill84,

    Can you try running the command without the --num_threads option and see if you still get an empty output? Also, be sure to use the latest release of GenotypeGVCFs (either v3.7 or GATK4-BETA) to rule out version-specific bugs that may have since been fixed.
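
One way to try the first suggestion is sketched below. It is only an illustration, not an official GATK recipe: the function names `rerun_without_threads` and `count_vcf_records` and the output file name in the usage comment are hypothetical, while the Java options, jar name, and reference file are copied from the command in the question. The first function rebuilds the GATK 3 command with `--num_threads` left out, and the second counts the non-header lines in a VCF so that an empty result is easy to spot.

```shell
# Re-run GenotypeGVCFs single-threaded (no --num_threads).
# Usage: rerun_without_threads OUTPUT.vcf INPUT1.g.vcf [INPUT2.g.vcf ...]
rerun_without_threads() {
    out="$1"
    shift
    # Turn each remaining argument FILE into the pair "--variant FILE".
    n=$#
    while [ "$n" -gt 0 ]; do
        set -- "$@" --variant "$1"
        shift
        n=$((n - 1))
    done
    java -Xmx12g -Djava.io.tmpdir=./tmp -jar GenomeAnalysisTK.jar \
        -T GenotypeGVCFs \
        -R reference.fa \
        "$@" \
        -o "$out"
}

# Print the number of records in a VCF, i.e. non-empty lines that do not
# start with '#'. grep exits with status 1 when the count is 0, so the
# exit status is masked with "|| true".
count_vcf_records() {
    grep -c '^[^#]' "$1" || true
}

# Example (hypothetical output name):
#   rerun_without_threads TEST.nothreads.gt.vcf sample*.g.vcf
#   count_vcf_records TEST.nothreads.gt.vcf
```

If the count is still 0 without `--num_threads`, multithreading is unlikely to be the cause, and repeating the run with a newer release is the next thing to check.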
