Combine Variants on numerous vcf

mmaj Member
edited May 2013 in Ask the GATK team

Hi GATK team,

Is there a way to use CombineVariants on multiple VCF files without retyping "-V A.vcf -V B.vcf -V C.vcf etc." for each and every VCF file?
We usually have 50-100 VCFs to combine, so this is not a feasible option :)
Your DepthOfCoverage walker has the function I'm looking for: multiple BAMs can be passed as a single file holding the names of the BAM files to analyze, e.g. "-I bamnames.list".
However, a similar function appears to be missing in CombineVariants, which makes this walker very difficult to use if you have many VCFs to combine.

Thanks for a great toolkit!

Best Answer

Answers

  • mmaj Member
  • EADG Kiel Member ✭✭✭

    @Geraldine_VdAuwera , @mmaj

    I could be wrong but this feature is still not available. Would be nice to have it some day.

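    While waiting, you can use this little quick-and-dirty bash script. It reads a file listing one VCF path per line, writes a temporary script with one --variant argument per VCF, runs it, and deletes it again:

        #!/bin/bash
        # Arguments: InputList.vcf reference.fa Output.vcf
        if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]
        then
            echo "Parameter missing"
            echo "Usage InputList.vcf reference.fa Output.vcf"
            echo "$1 $2 $3"
            echo "Exit script"
            exit 1
        fi
        inputFile1=$1
        referencePath=$2
        outputName=$3
        gatkPath=/../GenomeAnalysisTK.jar

        # Write the CombineVariants command line into a temporary script.
        echo "#!/bin/bash" > mergeScript
        echo " java -Xmx12g -jar $gatkPath \\" >> mergeScript
        echo " -T CombineVariants \\" >> mergeScript
        echo " -R $referencePath \\" >> mergeScript
        echo " -o $outputName \\" >> mergeScript
        while read -r dataLine
        do
            # Tag each VCF with its file name minus the .vcf extension.
            name=`basename "$dataLine" .vcf`
            echo " --variant:$name $dataLine \\" >> mergeScript
        done < "$inputFile1"
        echo " -genotypeMergeOptions UNIQUIFY" >> mergeScript

        # Run the generated script, then clean up.
        chmod 755 mergeScript
        ./mergeScript
        rm mergeScript

    => Note the two backticks around basename "$dataLine" .vcf. Without them, name is not set to the output of the basename command.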
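
For what it's worth, the same idea can probably be done without writing a temporary script. Below is a minimal sketch, assuming a POSIX-compatible shell. The helper name with_variants is hypothetical (it is not part of GATK): it appends one "--variant:<name> <path>" pair per line of the list file to whatever command it is given, then runs that command.

```shell
#!/bin/bash
# with_variants LIST CMD [ARGS...]
# Hypothetical helper, not a GATK feature: runs CMD ARGS... followed by one
# "--variant:<name> <path>" pair for every non-empty line in LIST.
# The body runs in a subshell, so its variables do not leak to the caller.
with_variants() (
    list_file=$1
    shift
    [ -r "$list_file" ] || { echo "cannot read $list_file" >&2; exit 1; }
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue           # skip blank lines
        name=$(basename "$line" .vcf)        # /data/A.vcf -> A
        set -- "$@" "--variant:$name" "$line"
    done < "$list_file"
    "$@"                                     # run the assembled command
)

# Possible call, reusing the names from the script above:
#   with_variants InputList.vcf java -Xmx12g -jar "$gatkPath" \
#       -T CombineVariants -R reference.fa -o Output.vcf \
#       -genotypeMergeOptions UNIQUIFY
```

Putting echo in front of java in the call prints the assembled command line instead of running it, which is a cheap way to check the arguments first.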

  • mzabidi Member

    This worked for me.

    First, list the vcf files you want to combine:

            vcf_list=$(ls *vcf | while read l; do
              echo "-V "$l
            done)
    

    Then, call CombineVariants

        java -jar GenomeAnalysisTK.jar -T CombineVariants \
        -R hg19.sorted.fa \
        $vcf_list \
        -minN 2 \
        --setKey "null" \
        --filteredAreUncalled \
        --filteredrecordsmergetype KEEP_IF_ANY_UNFILTERED \
        -o normal_PON.vcf
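
One caveat with the $vcf_list variable above: it relies on the shell splitting an unquoted string on whitespace, so a path containing a space would be broken apart, and if no file matches, CombineVariants is still called with no inputs. A minimal sketch of a stricter variant follows, assuming a POSIX-compatible shell. The helper name with_vcfs_in is hypothetical and not part of GATK.

```shell
#!/bin/bash
# with_vcfs_in DIR CMD [ARGS...]
# Hypothetical helper, not a GATK feature: runs CMD ARGS... followed by
# "-V <file>" for every *.vcf file in DIR, and refuses to run CMD when
# DIR contains no .vcf files.
with_vcfs_in() (
    dir=$1
    shift
    found=0
    for f in "$dir"/*.vcf; do
        [ -e "$f" ] || continue      # an unmatched glob stays literal; skip it
        set -- "$@" -V "$f"          # each path stays a single argument
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no .vcf files found in $dir" >&2
        exit 1
    fi
    "$@"
)

# Possible call, reusing the options from the command above:
#   with_vcfs_in . java -jar GenomeAnalysisTK.jar -T CombineVariants \
#       -R hg19.sorted.fa -minN 2 --setKey "null" --filteredAreUncalled \
#       --filteredrecordsmergetype KEEP_IF_ANY_UNFILTERED -o normal_PON.vcf
```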
    
  • mzabidi Member

    Or write the -V arguments to a file and pass that file with the --arg_file option:

        ls *vcf | while read l; do
          echo "-V "$l
        done > PON_arg.list

        java -jar GenomeAnalysisTK.jar -T CombineVariants \
        -R hg19.sorted.fa \
        --arg_file PON_arg.list \
        -minN 3 \
        --setKey "null" \
        --filteredAreUncalled \
        --filteredrecordsmergetype KEEP_IF_ANY_UNFILTERED \
        -o normal_PON.vcf
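
If you generate the argument file often, it may be worth checking that it is not empty before handing it to GATK. Here is a minimal sketch, assuming a POSIX-compatible shell and the same one "-V <path>" per line layout as PON_arg.list above. The function name write_arg_file is hypothetical and not part of GATK.

```shell
#!/bin/bash
# write_arg_file DIR OUT
# Hypothetical helper, not a GATK feature: writes one "-V <path>" line to OUT
# for every *.vcf file in DIR. Returns non-zero if no .vcf file was found,
# so an empty argument file never reaches the CombineVariants call.
write_arg_file() (
    dir=$1
    out=$2
    for f in "$dir"/*.vcf; do
        [ -e "$f" ] || continue          # skip the literal pattern if nothing matched
        printf '%s %s\n' -V "$f"
    done > "$out"
    [ -s "$out" ]                        # succeed only if OUT is non-empty
)

# Possible use with the command above:
#   write_arg_file . PON_arg.list && java -jar GenomeAnalysisTK.jar \
#       -T CombineVariants -R hg19.sorted.fa --arg_file PON_arg.list \
#       -o normal_PON.vcf
```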
    