CombineVariants for over 4000 PON, memory issue
I'm trying to build a panel of normals (PON) from ~4000 normal samples from TCGA. The version I'm using is
I first tried passing the .vcf files directly and letting CombineVariants create the .idx file for each of them automatically.
I'm using the following command:
cmd = ['java', '-Djava.io.tmpdir=/tmp/job_tmp', '-d64', '-jar', '-Xmx128G', gatk_path,
       '-nt', str(thread_count),
       '-T', 'CombineVariants',
       '-R', reference_fasta_path,
       '-minN', '2',
       '--setKey', 'null',
       '--filteredAreUncalled',
       '--filteredrecordsmergetype', 'KEEP_IF_ANY_UNFILTERED',
       '-o', output_vcf]
for vcf_path in vcf_files:
    cmd.extend(['-V', vcf_path])
However, it always failed while creating the .idx files after ~1.5k samples. So I manually ran the command a couple of times to generate all the .idx files, and then ran the command with
Then I got the error
ERROR MESSAGE: java.lang.reflect.InvocationTargetException.
After searching the forum, I found this post, http://gatkforums.broadinstitute.org/gatk/discussion/6094/error-message-java-lang-reflect-invocationtargetexception, which suggests that it might be caused by "bad" .idx files.
So I used tabix to re-index all the .vcf.gz files, fed the .vcf.gz and .tbi files to GATK, and ran the command again. Then I got
##### ERROR MESSAGE: An error occurred because you did not provide enough memory to run this program. You can use the -Xmx argument (before the -jar argument) to adjust the maximum heap size provided to Java. Note that this is a JVM argument, not a GATK argument.
##### ERROR ------------------------------------------------------------------------------------------
INFO 12:44:40,035 ProgressMeter - chr1:7995100 68374.0 13.9 h 8.5 d 0.3% 32.1 w 32.0 w
INFO 12:46:29,151 ProgressMeter - chr1:7995100 68374.0 13.9 h 8.5 d 0.3% 32.2 w 32.1 w
INFO 12:47:50,165 ProgressMeter - chr1:7995100 68374.0 13.9 h 8.5 d 0.3% 32.2 w 32.2 w
INFO 12:49:58,354 ProgressMeter - chr1:7995100 68374.0 14.0 h 8.5 d 0.3% 32.3 w 32.2 w
INFO 12:50:58,603 ProgressMeter - chr1:7995100 68374.0 14.0 h 8.5 d 0.3% 32.4 w 32.3 w
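For reference, the tabix re-indexing step mentioned above could be scripted roughly as follows. This is only a sketch, not the exact script that was run: the helper names tabix_index_cmd and reindex_all are made up for illustration, and it assumes the inputs are bgzip-compressed .vcf.gz files and that tabix is on the PATH. The -f (overwrite an existing index) and -p vcf (VCF preset) options are standard tabix flags.

```python
import subprocess


def tabix_index_cmd(vcf_gz_path):
    """Return the tabix command that (re)builds the .tbi index for one file.

    Assumes vcf_gz_path points to a bgzip-compressed VCF.
    """
    # -f overwrites any existing index; -p vcf selects the VCF preset.
    return ['tabix', '-f', '-p', 'vcf', vcf_gz_path]


def reindex_all(vcf_gz_paths, run=subprocess.check_call):
    """Re-index every file in turn and return the expected .tbi paths.

    `run` is injectable so the command list can be inspected without
    actually invoking tabix; by default a failing tabix call raises.
    """
    index_paths = []
    for path in vcf_gz_paths:
        run(tabix_index_cmd(path))
        # tabix writes the index next to the input as <input>.tbi
        index_paths.append(path + '.tbi')
    return index_paths
```

Calling reindex_all(vcf_files) before assembling the CombineVariants command would stop at the first file that tabix cannot index, which may help narrow down a problematic input.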
I have three main questions:
1. Did I do anything stupid?
2. How many resources would you suggest for my case? Right now it runs on a single VM; I could try it at the cluster level.
3. Are there any other ways to do this? I didn't run them in batches, because of