Bug with VariantFiltration with missing genotype info column?

Hi there,

I noticed that when I combine GVCFs from multiple samples and genotype them, samples with missing genotypes are written as ./.:0,0:0.

1) VariantFiltration seems to fail when ./.:0,0:0 appears immediately after the GT:AD:DP:GQ:PL column, i.e., when the first sample has the missing genotype. If it happens for the 2nd or nth sample, it seems okay.

2) Why are the missing genotypes written as ./.:0,0:0 and not ./.:0,0:0:0:0? The GQ and PL fields don't have corresponding zeros. I am wondering whether other tools that take VCF files might also reject them because of this.

Thanks,
Deepthi

P.S. I am using the latest build, v3.4-46-gbc02625.

Answers

  • dr153 (Duke), Member
    edited October 2015

    Yes, I get an error message.

    Command:

    INFO 15:22:05,350 HelpFormatter - Program Args: -R /data/davelab/bix/resources/genomes/hg19/ucsc.hg19.fasta -T VariantFiltration --variant /data/davelab/projects/Xenomousie/xeno_mousie.gvcf.list.restricted.genotyped.vcf -o /data/davelab/projects/Xenomousie/xeno_mousie.gvcf.list.restricted.genotyped.filtereddpandmq.v2.vcf --filterExpression DP > 10 && MQ > 30 --filterName vcfqual

    The error message:

    ERROR MESSAGE: Line 409: there aren't enough columns for line chr (we expected 9 tokens, and saw 1 ), for input source:

    The line with the error is below:

    chr1 990517 . C T 518.51 . AC=12;AF=1.00;AN=12;DP=18;FS=0.000;MLEAC=12;MLEAF=1.00;MQ=60.00;MQ0=0;QD=28.81;SOR=1.609 GT:AD:DP:GQ:PL ./.:0,0:0 ./.:0,0:0 1/1:0,2:2:6:76,6,0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 1/1:0,2:2:6:74,6,0 ./.:0,0:0 1/1:0,2:2:6:49,6,0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 1/1:0,3:3:9:79,9,0 ./.:0,0:0 ./.:0,0 1/1:0,2:2:6:70,6,0 ./.:0,0:0 ./.:0,0:0 ./.:0,0:0 1/1:0,7:7:21:193,21,0 ./.:0,0:0

    Is this because the DP values are zero, or because the genotypes are ./.?

    2) Thanks for the VCF format clarification.
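For anyone hitting a similar "not enough columns" message, a quick check of the offending line can narrow things down. The sketch below is only an illustration and not part of GATK: `check_vcf_line` is a hypothetical helper that counts tab-separated columns and compares each sample's values against the FORMAT keys. It assumes the usual VCF layout of 8 fixed columns, then FORMAT, then one column per sample. A line that is delimited by spaces instead of tabs would be seen as a single token, which is one possible (unconfirmed) way to get a "saw 1" count like the one in the error above.

```python
def check_vcf_line(line, min_columns=9):
    """Return a list of human-readable problems found in one VCF data line."""
    problems = []
    columns = line.rstrip("\n").split("\t")
    if len(columns) < min_columns:
        # Too few tab-separated tokens; the rest of the line cannot be trusted.
        problems.append(
            f"expected at least {min_columns} tab-separated columns, saw {len(columns)}"
        )
        return problems

    format_keys = columns[8].split(":")
    for index, sample in enumerate(columns[9:], start=1):
        values = sample.split(":")
        if len(values) > len(format_keys):
            problems.append(f"sample {index} has more values than FORMAT keys")
        elif len(values) < len(format_keys):
            # Trailing fields were dropped for this sample.
            missing = format_keys[len(values):]
            problems.append(f"sample {index} is trimmed; missing {','.join(missing)}")
    return problems


# Shortened, made-up example line with two samples (not the full record above).
example = "\t".join([
    "chr1", "990517", ".", "C", "T", "518.51", ".", "DP=18",
    "GT:AD:DP:GQ:PL", "./.:0,0:0", "1/1:0,2:2:6:76,6,0",
])
print(check_vcf_line(example))                     # tab-delimited line
print(check_vcf_line(example.replace("\t", " ")))  # same line with spaces
```

A trimmed sample is reported here only as information; on its own it does not mean the file is invalid.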

  • dr153dr153 DukeMember

    This solved my issue: I reran gatk-genotyper with "-never_trim_vcf_format_field" and then ran the VariantFiltration tool. That seemed to work!
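To make the "trimming" concrete: the VCF specification allows trailing sample fields to be dropped, and uses "." as the missing value, so ./.:0,0:0 under GT:AD:DP:GQ:PL simply has no GQ or PL. The sketch below shows what an untrimmed sample column could look like. `pad_sample` is a hypothetical helper written for illustration, and the exact values GATK writes when trimming is disabled may differ from this padding.

```python
def pad_sample(format_field, sample_field, missing="."):
    """Pad a trimmed sample column so it has one value per FORMAT key."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if len(values) > len(keys):
        raise ValueError("sample has more values than FORMAT keys")
    # Append the missing-value marker for every dropped trailing field.
    padded = values + [missing] * (len(keys) - len(values))
    return ":".join(padded)


print(pad_sample("GT:AD:DP:GQ:PL", "./.:0,0:0"))           # ./.:0,0:0:.:.
print(pad_sample("GT:AD:DP:GQ:PL", "1/1:0,2:2:6:76,6,0"))  # unchanged
```

Rewriting a VCF this way is only a workaround for downstream tools that cannot read trimmed columns; regenerating the file with trimming disabled, as described above, keeps the output under the caller's control.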

  • Sheila (Broad Institute), Member, Broadie admin

    @dr153
    Hi,

    Thanks for reporting your solution!

    -Sheila
