Combine Variants on numerous vcf

mmaj Member
edited May 2013 in Ask the GATK team

Hi GATK team,

Is there a way to use CombineVariants on multiple vcf files without retyping "-V A.vcf -V B.vcf -V C.vcf etc." for each and every vcf file?
We usually have 50-100 vcfs to combine, so this is not a feasible option :)
Your DepthOfCoverage walker has the function I'm looking for, as multiple bams can be passed as a single file holding the names of the bam files to analyze, e.g. "-I bamnames.list".
However, a similar function appears to be missing in CombineVariants, which makes this walker very difficult to use if you have many vcfs to combine.

Thanks for a great toolkit!

Answers

  • EADG Kiel Member ✭✭✭

    @Geraldine_VdAuwera, @mmaj

    I could be wrong, but I think this feature is still not available. It would be nice to have it some day.

    In the meantime, you can use this little quick-and-dirty bash script:

        #!/bin/bash
        # Usage: InputList.vcf reference.fa Output.vcf
        if [ "$1" ] && [ "$2" ] && [ "$3" ]
        then
            echo ""
        else
            echo "Parameter missing"
            echo "Usage InputList.vcf reference.fa Output.vcf "
            echo "$1 $2 $3"
            echo "Exit script"
            exit
        fi

        inputFile1=$1
        referencePath=$2
        outputName=$3
        gatkPath=/../GenomeAnalysisTK.jar

        # Write the full CombineVariants command into a temporary script
        echo "#!/bin/bash" > mergeScript
        echo " java -Xmx12g -jar $gatkPath \\" >> mergeScript
        echo " -T CombineVariants \\" >> mergeScript
        echo " -R $referencePath \\" >> mergeScript
        echo " -o $outputName \\" >> mergeScript

        # One --variant:<name> argument per line of the input list
        while read dataLine
        do
            #echo $dataLine
            name=$(basename "$dataLine" .vcf)
            echo " --variant:$name $dataLine \\" >> mergeScript
        done < "$inputFile1"

        echo " -genotypeMergeOptions UNIQUIFY" >> mergeScript

        # Run the generated script, then clean up
        chmod 755 mergeScript
        ./mergeScript
        rm mergeScript

    => Note: basename $dataLine .vcf has to be wrapped in command substitution (backticks or $(...)), as in the script above.
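
    A variation on the same idea, as a minimal sketch only: instead of writing a temporary mergeScript, the arguments can be collected in a bash array and handed to java directly. The helper name build_variant_args and the file names in the commented usage are hypothetical; the GATK flags are the ones from the script above. This assumes bash, since it relies on arrays.

    ```shell
    # build_variant_args is a hypothetical helper, not part of GATK.
    # Reads one VCF path per line from the list file $1 and fills the global
    # array variant_args with "--variant:<name>" "<path>" pairs.
    build_variant_args() {
        variant_args=()
        local path name
        # "|| [ -n ... ]" also picks up a last line without a trailing newline
        while IFS= read -r path || [ -n "$path" ]; do
            [ -n "$path" ] || continue        # skip blank lines
            name=$(basename "$path" .vcf)     # tag each input with its file name
            variant_args+=("--variant:$name" "$path")
        done < "$1"
    }

    # Usage (file names are placeholders):
    # build_variant_args InputList.vcf
    # java -Xmx12g -jar "$gatkPath" -T CombineVariants -R reference.fa \
    #     -o Output.vcf "${variant_args[@]}" -genotypeMergeOptions UNIQUIFY
    ```

    Because each array element is expanded as its own word, nothing needs to be written to disk or cleaned up afterwards.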

  • mzabidi Member

    This worked for me.

    First, list the vcf files you want to combine:

            vcf_list=$(ls *vcf | while read l; do
              echo "-V "$l
            done)
    

    Then, call CombineVariants:

        java -jar GenomeAnalysisTK.jar -T CombineVariants \
        -R hg19.sorted.fa \
        $vcf_list \
        -minN 2 \
        --setKey "null" \
        --filteredAreUncalled \
        --filteredrecordsmergetype KEEP_IF_ANY_UNFILTERED \
        -o normal_PON.vcf
    
  • mzabidi Member

    Or write the -V arguments to a file and pass that file with the --arg_file option:

    ls *vcf | while read l; do
      echo "-V "$l
    done > PON_arg.list
    
    java -jar GenomeAnalysisTK.jar -T CombineVariants \
    -R hg19.sorted.fa \
    --arg_file PON_arg.list \
    -minN 3 \
    --setKey "null" \
    --filteredAreUncalled \
    --filteredrecordsmergetype KEEP_IF_ANY_UNFILTERED \
    -o normal_PON.vcf
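
    The argument file can also be written without parsing ls output. A minimal sketch, assuming bash and reusing the PON_arg.list name from above; write_arg_file is a hypothetical helper name. printf repeats its format string for every file the glob matches, so the result is one -V line per vcf.

    ```shell
    # write_arg_file is a hypothetical helper, not part of GATK.
    # Writes one "-V <file>" line per *.vcf file in directory $1 to file $2.
    write_arg_file() {
        local dir=$1 out=$2
        local files=("$dir"/*.vcf)
        # With no matches the glob stays literal; fail instead of writing it out.
        [ -e "${files[0]}" ] || { echo "no vcf files in $dir" >&2; return 1; }
        printf -- '-V %s\n' "${files[@]}" > "$out"
    }

    # Usage (run in the directory holding the vcfs):
    # write_arg_file . PON_arg.list
    ```

    The resulting file can then be passed with --arg_file exactly as in the command above.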
    