
Combine Variants on numerous vcf

mmaj Member
edited May 2013 in Ask the GATK team

Hi GATK team,

Is there a way to use CombineVariants on multiple VCF files without retyping "-V A.vcf -V B.vcf -V C.vcf etc." for each and every file?
We usually have 50-100 VCFs to combine, so this is not a feasible option :)
Your DepthOfCoverage walker has the function I'm looking for: multiple BAMs can be passed as a single file holding the names of the BAM files to analyze, e.g. "-I bamnames.list".
However, a similar function appears to be missing in CombineVariants, which makes this walker very difficult to use if you have many VCFs to combine.

Thanks for a great toolkit!

Answers

  • EADG Kiel Member

    @Geraldine_VdAuwera , @mmaj

    I could be wrong, but I believe this feature is still not available. It would be nice to have it some day.

    In the meantime, you can use this little quick-and-dirty bash script:

        #!/bin/bash
        # Arguments: <list of VCF paths, one per line> <reference.fa> <Output.vcf>
        if [ -n "$1" ] && [ -n "$2" ] && [ -n "$3" ]
        then
            echo ""
        else
            echo "Parameter missing"
            echo "Usage: InputList.vcf reference.fa Output.vcf"
            echo "$1 $2 $3"
            echo "Exit script"
            exit 1
        fi

        inputFile1=$1
        referencePath=$2
        outputName=$3
        gatkPath=/../GenomeAnalysisTK.jar

        # Write the full CombineVariants command into a temporary script
        echo "#!/bin/bash" > mergeScript
        echo " java -Xmx12g -jar $gatkPath \\" >> mergeScript
        echo " -T CombineVariants \\" >> mergeScript
        echo " -R $referencePath \\" >> mergeScript
        echo " -o $outputName \\" >> mergeScript
        while read -r dataLine
        do
            #echo $dataLine
            # Tag each input with its file name minus the .vcf extension
            name=$(basename "$dataLine" .vcf)
            echo " --variant:$name $dataLine \\" >> mergeScript
        done < "$inputFile1"
        echo " -genotypeMergeOptions UNIQUIFY" >> mergeScript

        # Run the generated script, then clean up
        chmod 755 mergeScript
        ./mergeScript
        rm mergeScript

    => Note that basename $dataLine .vcf has to be wrapped in command substitution (backticks, or $(...) as written above); otherwise name is not set to the file name.
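
The same idea can also be sketched without writing and deleting an intermediate mergeScript: read the list once and collect the tagged --variant arguments in the shell's positional parameters, so that each path stays a single argument. This is only a sketch under assumptions: the list holds one VCF path per line, and /data/A.vcf, /data/B.vcf, reference.fa and combined.vcf are placeholder names that do not come from the thread. The command is printed rather than executed.

```shell
#!/bin/bash
# Sketch only: all file names below are placeholders.
list=$(mktemp)
printf '%s\n' /data/A.vcf /data/B.vcf > "$list"   # stand-in for the real list file

set --                                  # start with an empty argument list
while IFS= read -r vcf; do
    [ -n "$vcf" ] || continue           # skip blank lines
    name=$(basename "$vcf" .vcf)        # tag = file name without the .vcf suffix
    set -- "$@" "--variant:$name" "$vcf"
done < "$list"
rm -f "$list"

# Print the command instead of running it; remove "echo" to execute it for real.
cmd=$(echo java -Xmx12g -jar GenomeAnalysisTK.jar -T CombineVariants \
    -R reference.fa -o combined.vcf "$@" -genotypeMergeOptions UNIQUIFY)
echo "$cmd"
```

Because the arguments are expanded with "$@", no temporary script is needed and the loop works in plain sh as well as bash.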

  • This worked for me.

    First, list the vcf files you want to combine:

        vcf_list=$(ls *vcf | while read l; do
          echo "-V $l"
        done)
    

    Then, call CombineVariants:

        java -jar GenomeAnalysisTK.jar -T CombineVariants \
        -R hg19.sorted.fa \
        $vcf_list \
        -minN 2 \
        --setKey "null" \
        --filteredAreUncalled \
        --filteredrecordsmergetype KEEP_IF_ANY_UNFILTERED \
        -o normal_PON.vcf
    
  • Or write the -V arguments to a file and pass that file with the --arg_file option:

    ls *vcf | while read l; do
      echo "-V "$l
    done > PON_arg.list
    
    java -jar GenomeAnalysisTK.jar -T CombineVariants \
    -R hg19.sorted.fa \
    --arg_file PON_arg.list \
    -minN 3 \
    --setKey "null" \
    --filteredAreUncalled \
    --filteredrecordsmergetype KEEP_IF_ANY_UNFILTERED \
    -o normal_PON.vcf
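
Both variants above parse the output of ls, which is fragile when file names contain unusual characters, and the pattern *vcf also matches any name that merely ends in "vcf". A possible alternative, sketched here under assumptions, builds the same argument file from a *.vcf glob with printf. The scratch directory and the empty A.vcf and B.vcf files are placeholders used only to make the example self-contained; the file format (one "-V path" line per input) is the one produced by the loop above.

```shell
#!/bin/bash
# Sketch only: runs in a scratch directory with empty placeholder VCFs.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch A.vcf B.vcf                       # stand-ins for real VCF files

: > PON_arg.list                        # create or truncate the argument file
for vcf in *.vcf; do
    [ -e "$vcf" ] || continue           # no match: skip the literal pattern
    printf '%s %s\n' -V "$vcf" >> PON_arg.list
done

cat PON_arg.list                        # show what was written
```

The resulting PON_arg.list can then be passed with --arg_file exactly as in the command above.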
    