Removing NON_REF tags from VCF

fazulur (hyderabad, Member)

Dear GATK Team,

I ran HaplotypeCaller in GVCF mode and extracted variants using gvcftools "extract_variants". The resulting VCF still has "NON_REF" tags, which were carried over from the g.vcf.

I want to generate both a g.vcf and a VCF for each sample. Instead of running HaplotypeCaller again to get the VCF, I am using gvcftools "extract_variants" to extract variants from the existing g.vcf, which is faster.

Please suggest how I can get rid of the NON_REF tags in the VCF file.

Thanks in advance,
Fazulur Rehaman

Answers

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @fazulur

    You can get rid of NON_REF variants using the SelectVariants tool's --exclude-non-variants option.

  • fazulur (hyderabad, Member)
    edited July 17

    Hi @bhanuGandham,

    Thanks a lot for your quick response.

    I tried the SelectVariants --exclude-non-variants option on the VCF (extracted from the g.vcf using gvcftools extract_variants):

    gatk SelectVariants -R hs37d5.fa -V test.vcf.gz --exclude-non-variants true -O test-non-ref-removed.vcf.gz

    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT test

    1 10177 rs201752861 A C,<NON_REF> 11.12 . BaseQRankSum=-0.241;DB;DP=14;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.480;RAW_MQ=13286.00;ReadPosRankSum=0.000 GT:AD:DP:GQ:PL:SB 0/1:6,3,0:9:39:39,0,156,57,163,221:4,2,2,1
    1 10583 rs58108140 G A,<NON_REF> 77.77 . BaseQRankSum=-1.150;DB;DP=11;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=1.150;RAW_MQ=24041.00;ReadPosRankSum=-0.703 GT:AD:DP:GQ:PL:SB 0/1:5,3,0:8:99:106,0,172,121,181,302:0,5,0,3

    I also tried extracting variants directly from the g.vcf using GATK SelectVariants:

    gatk SelectVariants -R hs37d5.fa -V test.g.vcf.gz --exclude-non-variants true -O test.vcf.gz

    In this case I still get "NON_REF" tags, as in the example above (e.g. "C,<NON_REF>").

    Please suggest how I can get rid of the NON_REF tags in the VCF.

    Thanks in advance,
    Fazulur Rehaman

  • fazulur (hyderabad, Member)

    Hi @bhanuGandham,

    Could you please let me know your suggestions on how to extract variants from g.vcf without "NON_REF" tags?

    Thanks In Advance
    Fazulur Rehaman

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @fazulur

    Try adding these options: --select-type-to-include SNP, --select-type-to-exclude NO_VARIATION, and --remove-unused-alternates (GATK4 spelling; the GATK3 form was --removeUnusedAlternates).
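
For illustration only, here is a minimal Python sketch of what removing the symbolic <NON_REF> allele from a record involves. It is not part of GATK or gvcftools, and the names strip_non_ref and PER_ALT_INFO_KEYS are hypothetical. It assumes tab-separated records, diploid genotypes, <NON_REF> as the last ALT allele, and that no sample is genotyped as <NON_REF>. It only trims the per-allele fields (ALT, AD, PL, and a hard-coded set of per-ALT INFO keys); it does not recompute GQ or renormalize PL.

```python
from typing import Optional

NON_REF = "<NON_REF>"
# Assumption: these INFO keys are declared Number=A (one value per ALT allele).
PER_ALT_INFO_KEYS = {"AC", "AF", "MLEAC", "MLEAF"}


def strip_non_ref(line: str) -> Optional[str]:
    """Return the record without <NON_REF>, or None for a reference-only record."""
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if NON_REF not in alts:
        return "\t".join(fields)  # nothing to strip
    if alts == [NON_REF]:
        return None  # reference block: no real variant to keep
    if alts[-1] != NON_REF:
        raise ValueError("sketch assumes <NON_REF> is the last ALT allele")

    non_ref_index = len(alts)        # allele index of <NON_REF> (REF is 0)
    kept_alts = alts[:-1]
    n_alleles = len(kept_alts) + 1   # REF plus the remaining ALT alleles
    fields[4] = ",".join(kept_alts)

    # INFO: drop the last value of every per-ALT key; leave flags and others alone.
    info_items = []
    for item in fields[7].split(";"):
        key, sep, value = item.partition("=")
        if sep and key in PER_ALT_INFO_KEYS:
            value = ",".join(value.split(",")[:len(kept_alts)])
        info_items.append(key + sep + value)
    fields[7] = ";".join(info_items)

    # Samples: AD has one value per allele; diploid PL has n*(n+1)/2 values,
    # and the genotypes involving the last allele are the trailing entries.
    if len(fields) > 9:
        keys = fields[8].split(":")
        n_genotypes = n_alleles * (n_alleles + 1) // 2
        for i in range(9, len(fields)):
            values = fields[i].split(":")
            for j, key in enumerate(keys[:len(values)]):
                if key == "GT":
                    called = values[j].replace("|", "/").split("/")
                    if str(non_ref_index) in called:
                        raise ValueError("genotype calls <NON_REF>")
                elif key == "AD":
                    values[j] = ",".join(values[j].split(",")[:n_alleles])
                elif key == "PL":
                    values[j] = ",".join(values[j].split(",")[:n_genotypes])
            fields[i] = ":".join(values)
    return "\t".join(fields)


# First record from the thread, rebuilt with tab separators.
record = "\t".join([
    "1", "10177", "rs201752861", "A", "C,<NON_REF>", "11.12", ".",
    "BaseQRankSum=-0.241;DB;DP=14;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;"
    "MQRankSum=0.480;RAW_MQ=13286.00;ReadPosRankSum=0.000",
    "GT:AD:DP:GQ:PL:SB", "0/1:6,3,0:9:39:39,0,156,57,163,221:4,2,2,1",
])
print(strip_non_ref(record))
```

For the record above, ALT becomes C, AD becomes 6,3 and PL becomes 39,0,156. A supported tool should be preferred for real data; the sketch only shows which fields are tied to the allele list.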
