SelectVariants by sample names file

I need to extract a subset of samples from a large vcf.gz file. The sample names are saved in a plain text file, one name per line. I used

-RF -sf my.sample.names.txt

but I keep getting an error.
Any suggestions? Thanks!
Answers

  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator)

    Hi @ysgz7

    Please post the version of GATK you are using, the exact command, and the entire error log.

  • ysgz7 (Member)
    GATK version: 4.1.2.0
    my command:

    ./gatk SelectVariants -V myfile.vcf.gz -O myoutput.vcf -RF -sf my.sample.names.txt

    ***********************************************************************

    A USER ERROR has occurred: Invalid argument 'my.sample.names.txt'.

    ***********************************************************************
    Thank you!
  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator)
    edited June 14

    @ysgz7

    You need to use -sn as the argument for sample names, as opposed to -sf.

  • ysgz7 (Member)
    That doesn't work either. As I understand it, -sn should be followed by the sample name(s), but I'm using a text file with about 100 names. Am I right?

    I was reading the argument table here: software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php
    --sample_file (-sf), default NA: File containing a list of samples to include

    Thank you!
  • cnorman (United States; Member, Broadie, Dev ✭✭)

    @ysgz7 You're running GATK v4.1.2.0, but referencing the GATK v3.8.0 documentation. If you want to use a filename with SelectVariants -sn in v4.1.2.0, the filename needs to end with a .args suffix. See this for more detail.
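
A minimal sketch of the approach described above, using the file names from this thread. The helper `args_name_for` is hypothetical and only derives a file name ending in .args from the original .txt name. The sketch prints the command instead of running it, so it can be checked first. It passes the full file name, including the .args suffix, to -sn, and it leaves out -RF on the assumption that no read filter is intended here.

```shell
#!/bin/sh
# Hypothetical helper: derive a file name ending in .args from a .txt sample list name.
args_name_for() {
  printf '%s.args\n' "${1%.txt}"
}

samples_txt="my.sample.names.txt"                 # one sample name per line
samples_args="$(args_name_for "$samples_txt")"   # my.sample.names.args

# Copy the list under the new name only if the original file is present.
if [ -f "$samples_txt" ]; then
  cp "$samples_txt" "$samples_args"
fi

# Print the command for review; remove the leading "echo" to run it.
# Assumption: -RF is omitted because no read filter is being requested.
echo ./gatk SelectVariants -V myfile.vcf.gz -O myoutput.vcf -sn "$samples_args"
```

Copying rather than renaming keeps the original .txt list intact in case it is used elsewhere.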

  • ysgz7 (Member)
    > @cnorman said:
    > @ysgz7 You're running GATK v4.1.2.0, but referencing the GATK v3.8.0 documentation. If you want to use a filename with SelectVariants -sn in v4.1.2.0, the filename needs to end with a .args suffix. See this for more detail.

    Thank you so much for your reply!

    I have now renamed my .txt file to my.sample.names.args, put it and myfile.vcf.gz in the gatk folder, and run the following command:
    ./gatk SelectVariants -V myfile.vcf.gz -O myoutput.vcf -RF -sn my.sample.names

    But I still get the error message:
    ***********************************************************************

    A USER ERROR has occurred: Invalid argument 'pan.sample.names'.

    ***********************************************************************

    What looks incorrect?

    Thank you!