Multiple VCF inputs to SelectVariants

agsm (Sweden, Member)

I am trying to use the GATK (3.6) SelectVariants tool with several VCF files as input, preferably all passed as --variant (-V). The key point is that these have to be separate files, not one multi-sample VCF. Supposedly SelectVariants can take a list of VCF files, but it is unclear (to me) how. I tried both -V input1.vcf -V input2.vcf ... and -V input1.vcf input2.vcf ..., and both throw an error. So how exactly can I provide a list of VCF files? Alternatively, which tool can I use to select variants from multiple VCF files? I do not want to use the --concordance option, because in some cases I want to select variants present in only a fraction of the input files, and I do not much care in which specific file a given variant shows up. I'm grateful for any hints! Thanks!
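
As a tool-independent illustration of the selection described above (keep a variant if it appears in at least some number of the separate input files, regardless of which ones), here is a minimal Python sketch. It is not a GATK feature and does not show how SelectVariants handles inputs. It assumes plain or gzip-compressed VCF text, treats (CHROM, POS, REF, one ALT allele) as the identity of a site, and does no normalization, so the same variant written differently in two files (for example, left-aligned in one and not the other) is counted as two different sites. The function names and the script name min_files.py are hypothetical. For a fraction f of N files, pass min_files = ceil(f * N).

```python
import gzip
import sys
from collections import Counter
from contextlib import ExitStack
from typing import Iterable, Iterator, Set, TextIO, Tuple

# Hypothetical site key: (CHROM, POS, REF, one ALT allele); no normalization is done.
Site = Tuple[str, int, str, str]


def iter_sites(lines: Iterable[str]) -> Iterator[Site]:
    """Yield one Site per ALT allele on each VCF data line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information (##) and header (#CHROM) lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            if allele != ".":  # '.' in ALT means no alternate allele at this site
                yield chrom, int(pos), ref, allele


def open_vcf(path: str) -> TextIO:
    """Open a plain-text or gzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def sites_in_min_files(vcfs: Iterable[Iterable[str]], min_files: int) -> Set[Site]:
    """Return the sites that occur in at least `min_files` of the input VCFs."""
    counts: Counter = Counter()
    for lines in vcfs:
        counts.update(set(iter_sites(lines)))  # a site counts once per file
    return {site for site, n in counts.items() if n >= min_files}


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical): python min_files.py MIN_FILES rep1.vcf rep2.vcf.gz ...
    with ExitStack() as stack:
        handles = [stack.enter_context(open_vcf(p)) for p in sys.argv[2:]]
        for site in sorted(sites_in_min_files(handles, int(sys.argv[1]))):
            print(*site, sep="\t")
```

Counting a set of sites per file, rather than every line, keeps a site that is repeated inside one file from being credited to several files.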

Answers

  • agsm (Sweden, Member)

    Hi Sheila,
    Thanks for your quick reply. I have found some scattered posts (here and on Biostars) suggesting that SelectVariants can take a list of VCF files, so I wanted to give it a go. Initially, I did not want to combine variants into one file, as that is apparently not good practice (http://gatkforums.broadinstitute.org/gatk/discussion/53/combining-variants-from-different-files-into-one).
    I'll combine the VCFs, as that seems to be the only way to do what I want (if I want to stay within the GATK framework), but just to be sure: after combining the VCF files, the sample-level annotations (FORMAT) can still be used for filtering. What about the variant-level annotations (INFO)?
    Thanks a lot for the clarification!
    A
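
The FORMAT versus INFO distinction in this question comes from the layout of a VCF data line: the eighth column (INFO) holds a single set of annotations for the site as a whole, while the ninth column (FORMAT) lists keys whose values are given separately in each sample column. The minimal Python sketch below, with hypothetical function names, splits one data line along those lines. It assumes the sample names are taken from the #CHROM header line (columns 10 onward), and it does not model how any merging tool reconciles conflicting INFO values coming from different input files.

```python
from typing import Dict, List, Tuple, Union

InfoValue = Union[str, bool]


def parse_info(info: str) -> Dict[str, InfoValue]:
    """Parse the site-level INFO column; flag keys (no '=') map to True."""
    if info == ".":
        return {}  # '.' means no INFO annotations for this site
    parsed: Dict[str, InfoValue] = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def parse_record(
    line: str, sample_names: List[str]
) -> Tuple[Dict[str, InfoValue], Dict[str, Dict[str, str]]]:
    """Split one VCF data line into site-level INFO and per-sample FORMAT values."""
    cols = line.rstrip("\n").split("\t")
    info = parse_info(cols[7])  # one dict for the whole site
    fmt_keys = cols[8].split(":") if len(cols) > 8 else []
    per_sample = {
        # zip stops early if trailing FORMAT fields were dropped from a sample column
        name: dict(zip(fmt_keys, col.split(":")))
        for name, col in zip(sample_names, cols[9:])
    }
    return info, per_sample
```

Filtering on a FORMAT key therefore means testing each sample's dictionary, while an INFO filter is a single test per site.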

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @agsm
    Hi A,

    Indeed we do not recommend combining VCFs. If you would like to analyze samples together, you should use the GVCF workflow. Have a look at this article.
    This article should convince you to use the GVCF workflow :smile:

    -Sheila

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    To clarify, whether combining VCFs is a good idea depends entirely on what they contain and how they relate to each other. There are many possible scenarios: for some it's fine (some of our variant comparison tools require it), while for others it's a bad idea (for example, assembling a cohort from individual callsets).

  • agsm (Sweden, Member)

    Hi Sheila and Geraldine,

    Thanks for your comments. I work with RNA-seq data and did not want to use a workflow that is not fully recommended for it (GVCF). It's not my own data, so I prefer to stick to your Best Practices guidelines. For my question, though, I think it's legit to combine the VCFs for downstream analysis, as I am only interested in the presence or absence of genotype calls in different samples (biological replicates of an experimental treatment).

    /Agata
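
For the presence/absence question, once the replicates are in one multi-sample VCF, each site can be reduced to one boolean per sample column. The minimal Python sketch below assumes that "present" means the sample's GT carries at least one called non-reference allele, so 0/0 and ./. both count as absent; if a no-call should be treated differently from a homozygous-reference call, carries_alt is the place to change that. The function names are hypothetical, the sketch reads uncompressed VCF text only, and it skips sites whose FORMAT has no GT key.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical site key: (CHROM, POS, REF, ALT as written in the file).
Site = Tuple[str, int, str, str]


def carries_alt(gt: str) -> bool:
    """True if a GT value such as '0/1', '1|1' or '0/2' has a called non-reference allele."""
    return any(allele not in ("0", ".") for allele in re.split(r"[/|]", gt))


def presence_matrix(lines: Iterable[str]) -> Tuple[List[str], Dict[Site, List[bool]]]:
    """Return the sample names and, per site, one presence flag per sample column."""
    samples: List[str] = []
    matrix: Dict[Site, List[bool]] = {}
    for line in lines:
        if line.startswith("##") or not line.strip():
            continue  # meta-information or blank line
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            continue
        keys = cols[8].split(":")
        if "GT" not in keys:
            continue  # no genotypes to inspect at this site
        gt_i = keys.index("GT")
        calls = []
        for sample_col in cols[9:]:
            parts = sample_col.split(":")
            # a bare '.' sample column may omit the GT field entirely
            calls.append(carries_alt(parts[gt_i] if gt_i < len(parts) else "."))
        matrix[(cols[0], int(cols[1]), cols[3], cols[4])] = calls
    return samples, matrix


def sites_in_min_samples(matrix: Dict[Site, List[bool]], min_samples: int) -> List[Site]:
    """Keep sites where at least `min_samples` samples carry a non-reference call."""
    return [site for site, calls in matrix.items() if sum(calls) >= min_samples]
```

Keeping the full boolean row per site, rather than only a count, leaves room to ask later which replicates agree.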
