CombineVariants and memory problem

ludoduvo Member
edited February 2013 in Ask the GATK team

Dear GATK team,

I recently tried to use CombineVariants to merge 800 VCF files using the following command:

java -Xms512m -Xmx20g -jar GenomeAnalysisTK.jar -R assembly2.fasta -T CombineVariants --assumeIdenticalSamples --suppressCommandLineHeader [--variant file1 ... --variant file800] -o AllSNPs.02_FirstCall20130218.vcf

When I tested the command with only a few VCF files, it worked fine, so the command itself is not the problem. However, when I try to merge all 800 files, the run fails with the following error (the log file is attached), even though I allocate 20 GB of RAM to the JVM, which is close to the maximum available to me:

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 2.3-9-ge5ebf34):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR MESSAGE: There was a failure because you did not provide enough memory to run this program. See the -Xmx JVM argument to adjust the maximum heap size provided to Java
ERROR ------------------------------------------------------------------------------------------

Each of my 800 files is no larger than 1 MB, so I find it odd that 20 GB of RAM is not enough. Could this be linked to the fact that my reference genome, Acyrthosiphon pisum version 2, still consists of 22,000 contigs, all of which appear in the header of each VCF?
Is there a way to work around the problem without increasing the total amount of RAM allocated to the JVM (for example, using an option similar to --read_buffer_size)?

Alternatively, I thought of using the following two command lines to merge my files:
cat Job1.Targets1-4.3_UnifiedGenotyper1.vcf | grep "#" > AllSNPs.02_FirstCall.test.vcf
cat Job*.vcf | grep -v "#" >> AllSNPs.02_FirstCall.test.vcf
However, this won't produce the associated .idx file. Is that a problem?
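
For illustration, the same concatenation idea can be sketched in Python. This is only a sketch under assumptions that are not confirmed in this thread: every input file carries the same header and sample columns (which --assumeIdenticalSamples also presumes), the ##contig header lines list the contigs in reference order, and all records fit in memory (roughly 800 MB for files of this size). The function names contig_order and concat_vcfs are hypothetical. Unlike grep "#", it treats only lines that start with # as header lines, and it sorts the records by contig and position, since plain concatenation leaves them in file order.

```python
import re


def contig_order(header_lines):
    """Map contig name -> rank, taken from ##contig=<ID=...> header lines."""
    order = {}
    for line in header_lines:
        match = re.match(r"##contig=<ID=([^,>]+)", line)
        if match:
            order.setdefault(match.group(1), len(order))
    return order


def concat_vcfs(paths, out_path):
    """Write the header of the first VCF followed by the records of all VCFs.

    Assumes (not verified here) that all inputs share one header and the
    same sample columns. Returns the number of records written.
    """
    header, records = [], []
    for i, path in enumerate(paths):
        with open(path) as handle:
            for line in handle:
                line = line.rstrip("\n")
                if line.startswith("#"):
                    # Keep header lines from the first file only.
                    if i == 0:
                        header.append(line + "\n")
                elif line.strip():
                    records.append(line + "\n")

    order = contig_order(header)

    def sort_key(record):
        chrom, pos = record.split("\t", 2)[:2]
        # Contigs missing from the header sort last, by name.
        return (order.get(chrom, len(order)), chrom, int(pos))

    records.sort(key=sort_key)

    with open(out_path, "w") as out:
        out.writelines(header)
        out.writelines(records)
    return len(records)
```

A call could look like concat_vcfs(sorted(glob.glob("Job*.vcf")), "AllSNPs.02_FirstCall.test.vcf") after import glob. Note that this does not merge records that occur at the same position in several files; it only concatenates and sorts them.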

Many thanks in advance,

