
GenotypeGVCFs: no records in VCF

kirill84 (Canada; Member)
edited July 2017 in Ask the GATK team

Dear GATK team,

I am having trouble calling genotypes on the GVCF files (*.g.vcf) produced by HaplotypeCaller in GVCF mode.
When I run GenotypeGVCFs (GATK 3.5), the resulting VCF file contains only the header and no records.
I had no such problem before.

Could you advise on the possible cause of this issue and how to fix it?

Here is the command and output:

java -Xmx12g  -Djava.io.tmpdir=./tmp -jar GenomeAnalysisTK.jar \
    -T GenotypeGVCFs \
    -R reference.fa \
    --variant  sample1.g.vcf \
    --variant  sample2.g.vcf \
    --variant  sample3.g.vcf \
    --variant  sample4.g.vcf \
    --variant  sample5.g.vcf \
    --variant  sample6.g.vcf \
    --variant  sample7.g.vcf \
    --variant  sample8.g.vcf \
    --num_threads 4 \
    -o TEST.gt.vcf

Note: the sample*.g.vcf input files do contain SNPs and indels.

##fileformat=VCFv4.2
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
...
...
...
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample1    sample2    sample3    sample4    sample5    sample6    sample7    sample8

Thank you!


Answers

  • shlee (Cambridge; Member, Broadie, Moderator)

    Hi @kirill84,

    Can you try running the command without the --num_threads option and see whether you still get an empty output? Also, be sure to use the latest release of GenotypeGVCFs, either v3.7 or GATK4-BETA, to rule out version-specific bugs that may have since been fixed.
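
    A minimal sketch of that check, assuming the same jar, reference, and GATK 3.x flags as in the question. The helper names (count_records, count_variant_sites, rerun_single_threaded) are hypothetical and only for illustration. The first helper reports how many records a VCF contains, the second counts GVCF records whose ALT column holds something other than a bare <NON_REF> (that is, candidate variant sites rather than reference blocks), and the third reruns GenotypeGVCFs without --num_threads.

```shell
# Count data records (non-empty lines that do not start with '#') in a VCF.
# Prints 0 for a header-only file; '|| true' hides grep's non-zero exit
# status when nothing matches.
count_records() {
    grep -c '^[^#]' "$1" || true
}

# Count GVCF records whose ALT column (field 5) is not just <NON_REF>,
# i.e. sites where HaplotypeCaller emitted a candidate variant allele.
count_variant_sites() {
    awk -F'\t' '!/^#/ && $5 != "<NON_REF>" { n++ } END { print n + 0 }' "$1"
}

# Rerun GenotypeGVCFs single-threaded (no --num_threads).
# Usage: rerun_single_threaded OUT.vcf --variant a.g.vcf --variant b.g.vcf
# Assumes GenomeAnalysisTK.jar and reference.fa are in the working directory,
# as in the original command.
rerun_single_threaded() {
    out=$1
    shift
    java -Xmx12g -Djava.io.tmpdir=./tmp -jar GenomeAnalysisTK.jar \
        -T GenotypeGVCFs \
        -R reference.fa \
        "$@" \
        -o "$out"
}
```

    For example, count_variant_sites sample1.g.vcf should print a non-zero number if the input really contains variant sites, and count_records on the output of rerun_single_threaded shows whether the single-threaded run still yields a header-only VCF.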
