CombineVariants on numerous VCF files

mmaj Member
edited May 2013 in Ask the GATK team

Hi GATK team,

Is there a way to use CombineVariants on multiple VCF files without retyping "-V A.vcf -V B.vcf -V C.vcf etc." for every single VCF file?
We usually have 50-100 VCFs to combine, so this is not a feasible option :)
Your DepthOfCoverage walker has the function I'm looking for: multiple BAMs can be passed as a single file listing the BAM files to analyze, e.g. "-I bamnames.list".
However, a similar function appears to be missing in CombineVariants, which makes this walker very difficult to use if you have many VCFs to combine.

Thanks for a great toolkit!


Answers

  • EADG (Kiel), Member

    @Geraldine_VdAuwera, @mmaj

    I could be wrong, but this feature is still not available. It would be nice to have it some day.

    In the meantime, you can use this little quick-and-dirty bash script:
        #!/bin/bash
        # Usage: $0 InputList reference.fa Output.vcf
        # InputList is a text file with one VCF path per line.
        if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]
        then
            echo "Parameter missing"
            echo "Usage: $0 InputList reference.fa Output.vcf"
            echo "Got: $1 $2 $3"
            echo "Exit script"
            exit 1
        fi

        inputFile1=$1
        referencePath=$2
        outputName=$3
        gatkPath=/../GenomeAnalysisTK.jar   # set to the location of your GATK jar

        # Write a temporary script holding a single CombineVariants call
        echo "#!/bin/bash" > mergeScript
        echo " java -Xmx12g -jar $gatkPath \\" >> mergeScript
        echo " -T CombineVariants \\" >> mergeScript
        echo " -R $referencePath \\" >> mergeScript
        echo " -o $outputName \\" >> mergeScript
        # Add one --variant:<name> argument per VCF listed in the input file
        while read -r dataLine
        do
            name=$(basename "$dataLine" .vcf)
            echo " --variant:$name $dataLine \\" >> mergeScript
        done < "$inputFile1"
        echo " -genotypeMergeOptions UNIQUIFY" >> mergeScript

        # Run the generated script, then clean up
        chmod 755 mergeScript
        ./mergeScript
        rm mergeScript

    => Note: basename $dataLine .vcf must be wrapped in backticks (or $( ... )) so that its output is assigned to name; the forum strips the backticks.

  • This worked for me.

    First, build a list of -V arguments for the VCF files you want to combine:

            vcf_list=$(ls *vcf | while read l; do
              echo "-V "$l
            done)
    

    Then call CombineVariants:

        java -jar GenomeAnalysisTK.jar -T CombineVariants \
        -R hg19.sorted.fa \
        $vcf_list \
        -minN 2 \
        --setKey "null" \
        --filteredAreUncalled \
        --filteredrecordsmergetype KEEP_IF_ANY_UNFILTERED \
        -o normal_PON.vcf
    
  • Or write the -V arguments to a file and pass that file with the --arg_file option:

    ls *vcf | while read l; do
      echo "-V "$l
    done > PON_arg.list
    
    java -jar GenomeAnalysisTK.jar -T CombineVariants \
    -R hg19.sorted.fa \
    --arg_file PON_arg.list \
    -minN 3 \
    --setKey "null" \
    --filteredAreUncalled \
    --filteredrecordsmergetype KEEP_IF_ANY_UNFILTERED \
    -o normal_PON.vcf
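
The script in the first answer reads its first argument as a plain text file with one VCF path per line and derives each ROD name from the file name with basename. A minimal sketch of both steps as bash functions, assuming the VCFs sit in one directory and end in .vcf; make_vcf_list and variant_args are hypothetical helper names, not part of GATK:

```shell
#!/bin/bash
# Hypothetical helpers (not part of GATK) sketching the input side of the
# merge script from the first answer.

# make_vcf_list DIR OUT: write the path of every *.vcf in DIR, one per line,
# sorted so the merge order is reproducible.
make_vcf_list() {
    local dir=$1 out=$2
    find "$dir" -maxdepth 1 -name '*.vcf' | sort > "$out"
}

# variant_args LIST: print one "--variant:<name> <path>" argument per line,
# where <name> is the file name without its directory and .vcf suffix.
# $(...) captures basename's output, the step lost in the forum post.
variant_args() {
    local list=$1 dataLine name
    while IFS= read -r dataLine; do
        [ -n "$dataLine" ] || continue   # skip blank lines
        name=$(basename "$dataLine" .vcf)
        printf '%s\n' "--variant:$name $dataLine"
    done < "$list"
}
```

For example, make_vcf_list . vcf_list.txt would produce a file (vcf_list.txt is a hypothetical name) that can be passed as InputList, and variant_args vcf_list.txt previews the arguments the script would generate.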
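
The $vcf_list approach in the second answer relies on word splitting of an unquoted variable, so a VCF name containing a space would be split into two arguments. A sketch of the same idea with a bash array, assuming bash 3.1 or later (for +=); build_v_args and V_ARGS are hypothetical names:

```shell
#!/bin/bash
# build_v_args FILE...: fill the global array V_ARGS with "-V FILE" pairs.
# Each file name stays a single array element, even if it contains spaces.
build_v_args() {
    local f
    V_ARGS=()
    for f in "$@"; do
        V_ARGS+=(-V "$f")
    done
}

# With nullglob, an unmatched *.vcf expands to nothing instead of the
# literal string "*.vcf".
shopt -s nullglob
build_v_args *.vcf

# Print the collected arguments one per line for a quick check.
if [ "${#V_ARGS[@]}" -gt 0 ]; then
    printf '%s\n' "${V_ARGS[@]}"
fi
```

The array would then take the place of $vcf_list in the java command, written as "${V_ARGS[@]}" (with the double quotes) so each element is passed as one argument.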
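
The --arg_file variant in the third answer produces a text file of -V lines. A sketch that builds the same file from a glob instead of parsing ls output, and refuses to write an empty list so GATK is not started without inputs. write_v_arg_file is a hypothetical helper name, and the sketch assumes VCF file names contain no whitespace, since the argument file is presumably split into words when GATK reads it:

```shell
#!/bin/bash
# write_v_arg_file OUT FILE...: write one "-V FILE" line per input file to OUT.
# Returns non-zero, without touching OUT, when no input files are given.
write_v_arg_file() {
    local out=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "write_v_arg_file: no VCF files given" >&2
        return 1
    fi
    local f
    for f in "$@"; do
        printf '%s\n' "-V $f"
    done > "$out"
}

# Example call using the file name from the post; with nullglob an unmatched
# *.vcf expands to nothing, so the empty-input check above fires.
shopt -s nullglob
write_v_arg_file PON_arg.list *.vcf || echo "nothing to combine" >&2
```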
    