The current GATK version is 3.8-0

☞ Got a problem?

1. Search using the upper-right search box, e.g. using the error message.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

☞ Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets (especially content with hashes, #) with lines containing three backticks (```) each to make a code block, as demonstrated here.

GATK version 4.beta.3 (i.e. the third beta release) is out. See the GATK4 beta page for download and details.

GATK 2.6.2 Exceptions with HaplotypeCaller and -nct

Member

When trying to run the HaplotypeCaller in 2.6.2 with -nct, I'm getting a number of crashes. Is -nct currently supported for the HaplotypeCaller, or is it still experimental? Because of the multithreading, I'm not sure exactly where the error is occurring, and it's a pretty big BAM. If needed, I can try to narrow it down further and create a subset BAM...


```
../jre1.7.0_25/bin/java -jar ../GenomeAnalysisTK-2.6-2-ge03a5e9/GenomeAnalysisTK.jar -R ../refs/bosTau6.lic.fa -T HaplotypeCaller -I ../Chr15.ir.bam -bamout Chr15.bam -o Chr15.vcf.gz -L Chr15 -nct 5 -rf BadCigar
```

```
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.NullPointerException
at net.sf.samtools.SAMRecordCoordinateComparator.fileOrderCompare(SAMRecordCoordinateComparator.java:82)
at net.sf.samtools.SAMRecordCoordinateComparator.compare(SAMRecordCoordinateComparator.java:43)
at net.sf.samtools.SAMRecordCoordinateComparator.compare(SAMRecordCoordinateComparator.java:41)
at java.util.TimSort.countRunAndMakeAscending(Unknown Source)
at java.util.TimSort.sort(Unknown Source)
at java.util.Arrays.sort(Unknown Source)
at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:203)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.6-2-ge03a5e9):
##### ERROR
##### ERROR Please check the documentation guide to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR
##### ERROR MESSAGE: Code exception (see stack trace for error itself)
##### ERROR ------------------------------------------------------------------------------------------
```



```
../jre1.7.0_25/bin/java -jar ../GenomeAnalysisTK-2.6-2-ge03a5e9/GenomeAnalysisTK.jar -R ../refs/bosTau6.lic.fa -T HaplotypeCaller -I ../Chr15.ir.bam -bamout Chr15.bam -o Chr15.vcf.gz -L Chr15 -nct 16 -rf BadCigar
```

```
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.IllegalArgumentException: Comparison method violates its general contract!
at java.util.TimSort.mergeLo(Unknown Source)
at java.util.TimSort.mergeAt(Unknown Source)
at java.util.TimSort.mergeCollapse(Unknown Source)
at java.util.TimSort.sort(Unknown Source)
at java.util.Arrays.sort(Unknown Source)
at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:203)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:708)
at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:704)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$ReadMapReduceJob.run(NanoScheduler.java:471)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.6-2-ge03a5e9):
##### ERROR
##### ERROR Please check the documentation guide to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR
##### ERROR MESSAGE: Comparison method violates its general contract!
##### ERROR ------------------------------------------------------------------------------------------
```
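Both stack traces end in `java.util.Arrays.sort`, reached from Picard's `SortingCollection.spillToDisk` with `SAMRecordCoordinateComparator`. Java's TimSort can throw "Comparison method violates its general contract!" when it detects that a comparator's answers are inconsistent (not transitive or not symmetric), which can happen if the objects being sorted change while the sort runs. As an illustration only, the sketch below uses a hypothetical immutable `Read` class and a simplified coordinate comparator (contig index, then start, then read name as a tie-break). It is not the Picard comparator, which has its own tie-break rules.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified read. All fields are final, so a read's sort key
// cannot change while a sort is in progress.
final class Read {
    final int contigIndex; // position of the contig in the sequence dictionary
    final int start;       // 1-based alignment start
    final String name;     // read name, used only to break ties

    Read(int contigIndex, int start, String name) {
        this.contigIndex = contigIndex;
        this.start = start;
        this.name = name;
    }

    @Override
    public String toString() {
        return name + "@" + contigIndex + ":" + start;
    }
}

final class CoordinateOrder {
    // Contig, then start, then name. Chaining comparingInt/thenComparing over
    // immutable fields yields a consistent total order, which is what TimSort requires.
    static final Comparator<Read> COORDINATE =
            Comparator.comparingInt((Read r) -> r.contigIndex)
                      .thenComparingInt(r -> r.start)
                      .thenComparing(r -> r.name);

    // Returns a sorted copy and leaves the input list untouched.
    static List<Read> sorted(List<Read> reads) {
        List<Read> copy = new ArrayList<>(reads);
        copy.sort(COORDINATE);
        return copy;
    }
}
```

For example, sorting reads `b@1:500`, `c@0:900`, `a@1:500`, `d@0:100` gives `[d@0:100, c@0:900, a@1:500, b@1:500]`.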



• Member

Note that these may be due to a bug in the handling of -bamout together with -nct: if I remove -bamout, the job appears to keep running without exceptions.
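A plausible reading of both traces, not confirmed in this thread, is that several NanoScheduler worker threads reach the same `SortingCollection` at once, so one thread adds records while another is sorting them. The general remedy for that kind of race is to serialize access to the shared buffer. The plain-Java sketch below shows the pattern with a hypothetical `SharedSortBuffer`; it is not GATK or Picard code, and `SortingCollection` itself is not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical buffer shared by worker threads (not the Picard SortingCollection API).
final class SharedSortBuffer<T> {
    private final List<T> items = new ArrayList<>();
    private final Comparator<? super T> order;

    SharedSortBuffer(Comparator<? super T> order) {
        this.order = order;
    }

    // Both methods lock on this object, so no thread can add an element
    // while another thread is copying or clearing the list.
    synchronized void add(T item) {
        items.add(item);
    }

    // Takes a snapshot under the lock, empties the buffer, and returns the snapshot sorted.
    synchronized List<T> drainSorted() {
        List<T> snapshot = new ArrayList<>(items);
        items.clear();
        snapshot.sort(order);
        return snapshot;
    }
}

final class SharedSortDemo {
    // Adds the values n-1 down to 0 from a pool of worker threads, then drains them sorted.
    static List<Integer> run(int threads, int n) {
        SharedSortBuffer<Integer> buffer = new SharedSortBuffer<>(Comparator.<Integer>naturalOrder());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < n; i++) {
            final int value = n - 1 - i; // submitted in descending order on purpose
            pool.execute(() -> buffer.add(value));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return buffer.drainSorted();
    }
}
```

`SharedSortDemo.run(4, 1000)` adds 999 down to 0 from four threads and returns 0 through 999 in order. A single coarse lock is the simplest choice, at the cost of threads waiting on one another.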

Hmm, I'm not sure -- let me pass this on to the team.

• Member

Any chance of the two options being made compatible in the future? The bamout option was very useful for seeing exactly what was happening with the indels, while -nct makes it simple to get the HC running at a decent rate without having to deal with 10,000+ subfiles that all need to be merged.

Both together would be ideal.
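The alternative the poster describes, many separate jobs whose outputs are merged afterwards, usually means cutting each contig into fixed-size intervals and giving each one to its own HaplotypeCaller run via -L. The sketch below only generates interval strings in the `contig:start-stop` form that -L accepts; `IntervalScatter` is a hypothetical helper, the contig length and chunk size in the example are made-up values, and submitting the jobs and merging the resulting VCFs are not shown.

```java
import java.util.ArrayList;
import java.util.List;

final class IntervalScatter {
    // Splits [1, contigLength] into consecutive 1-based, inclusive chunks of at most
    // chunkSize bases, each formatted as "contig:start-stop" for use with -L.
    static List<String> scatter(String contig, long contigLength, long chunkSize) {
        if (contigLength < 1 || chunkSize < 1) {
            throw new IllegalArgumentException("length and chunk size must be positive");
        }
        List<String> intervals = new ArrayList<>();
        for (long start = 1; start <= contigLength; start += chunkSize) {
            long stop = Math.min(start + chunkSize - 1, contigLength); // last chunk may be shorter
            intervals.add(contig + ":" + start + "-" + stop);
        }
        return intervals;
    }
}
```

For example, `IntervalScatter.scatter("Chr15", 25, 10)` returns `[Chr15:1-10, Chr15:11-20, Chr15:21-25]`. Calls close to chunk boundaries may differ from those of a single whole-contig run.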

I'm curious: how are you using bamout? It's not efficiently implemented (it's really more of a debugging tool), so even without multiple threads I suspect bamout must be slowing down the caller. Is that not your experience? Or does it just not matter, given that you can more easily understand what the HC is doing? Would some other type of output work better?

It's entirely possible to make the bamout option work with multiple threads; it's just a bit complex, since the reads could come out of order. I'll put it in JIRA.
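Reads "coming out of order" is the usual problem with parallel writers: workers finish regions in whatever order the scheduler allows, but a coordinate-sorted BAM has to be written in order. One common pattern, shown here as a plain-Java sketch and not as the GATK implementation, keeps one `Future` per region in submission order and drains them in that order. `OrderedParallelWriter`, `processInOrder` and its `processRegion` callback are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class OrderedParallelWriter {
    // Runs processRegion on every region in parallel, but returns the results
    // in the original region order, no matter which thread finishes first.
    static <R, O> List<O> processInOrder(List<R> regions, Function<R, O> processRegion, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One Future per region, stored in submission order.
            List<Future<O>> pending = new ArrayList<>();
            for (R region : regions) {
                pending.add(pool.submit(() -> processRegion.apply(region)));
            }
            // get() blocks until that particular region is done, so results
            // are collected (e.g. written out) strictly in region order.
            List<O> ordered = new ArrayList<>();
            for (Future<O> f : pending) {
                ordered.add(f.get());
            }
            return ordered;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("region failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With `x -> x * 10` over regions 1 to 5, the result is `[10, 20, 30, 40, 50]` even if later regions finish first. The cost is that one slow region holds back output for everything queued behind it.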

• Member

Hi Mark,
I've been using bamout to help our more biologically oriented staff see exactly what the HC has done with the reads when making a call. The last validation step they do is simply to check the BAM for the population to make sure it makes sense given the genotype supplied by GATK.

It allows them to look at bigger indels where the input BAM is ambiguous, and at regions with multiple indels, to check for compensatory mutations (i.e. two frameshifts cancelling each other out) as well as other error types that VQSR and other filtering have problems dealing with.

Also, if you're discussing the possible impact of an indel, it's nice to be able to put up a clear image from IGV showing the indel, where it sits in the genome, and what is nearby.

• Member

Hi Geraldine,
I am trying to run the HaplotypeCaller with GenomeAnalysisTK-2.4-9, using the command below:
```
java -Xmx6g -jar /home//GenomeAnalysisTK.jar -T HaplotypeCaller -nct 5 -R equcab2.fa -I /home/cleaned.sorted.bam1 -I /home/cleaned.sorted.bam2 -I /home/cleaned.sorted.bam3 -stand_call_conf 20 -stand_emit_conf 10.0 -o output.raw.snps.indels.vcf
```

When I run the command, I receive the following error:

```
INFO 14:12:21,883 HelpFormatter - Date/Time: 2013/08/21 14:12:21
INFO 14:12:21,883 HelpFormatter - --------------------------------------------------------------------------------
INFO 14:12:21,883 HelpFormatter - --------------------------------------------------------------------------------
INFO 14:12:21,952 GenomeAnalysisEngine - Strictness is SILENT
INFO 14:12:22,057 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250
INFO 14:12:22,064 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 14:12:22,091 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.03
INFO 14:12:22,115 MicroScheduler - Running the GATK in parallel mode with 5 total threads, 5 CPU thread(s) for each of 1 data thread(s), of 8 processors available on this machine
INFO 14:12:22,739 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
```
I am sure the documentation says that the HaplotypeCaller supports parallel execution (see http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_haplotypecaller_HaplotypeCaller.html).

What do you think the problem may be?

Thank you