Release notes for GATK version 3.4

Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)
edited November 2015 in Announcements

GATK 3.4 was released on May 15, 2015. Itemized changes are listed below. For more details, see the user-friendly version highlights.


New tool

  • ASEReadCounter: A tool to count read depth in a way that is appropriate for allele specific expression (ASE) analysis. It counts the number of reads that support the REF allele and the ALT allele, filtering low qual reads and bases and keeping only properly paired reads. See Highlights for more details.

HaplotypeCaller & GenotypeGVCFs

  • Important fix for genotyping positions over spanning deletions. Previously, if a SNP occurred in sample A at a position that was in the middle of a deletion for sample B, sample B would be genotyped as homozygous reference there (but it's NOT reference - there's a deletion). Now, sample B is genotyped as having a symbolic DEL allele. See Highlights for more details.
  • Deprecated --mergeVariantsViaLD argument in HaplotypeCaller since it didn’t work. To merge complex substitutions, use ReadBackedPhasing as a post-processing step.
  • Removed MappingQualityZero, SpanningDeletions and TandemRepeatAnnotation from the list of annotations that HaplotypeCaller refuses to apply. These annotations are still not recommended for use with HaplotypeCaller, but this is no longer enforced by a hardcoded ban.
  • Clamp the HMM window starting coordinate to 1 instead of 0 (contributed by nsubtil).
  • Fixed the implementation of allowNonUniqueKmersInRef so that it applies to all kmer sizes. This resolves some assembly issues in low-complexity sequence contexts and improves calling sensitivity in those regions.
  • Initialize annotations so that --disableDithering actually works.
  • Automatic selection of indexing strategy based on .g.vcf file extension. See Highlights for more details.
  • Removed normalization of QD based on length for indels. Length-based normalization is now only applied if the annotation is calculated in UnifiedGenotyper.
  • Added the RGQ (Reference Genotype Quality) FORMAT annotation to monomorphic sites in the VCF output of GenotypeGVCFs. Now, instead of stripping out the GQs for monomorphic hom-ref sites, we transfer them to the RGQ. This is extremely useful for people who want to know how confident the hom-ref genotype calls are. See Highlights for more details.
  • Removed GenotypeSummaries from default annotations.
  • Added -uniquifySamples to GenotypeGVCFs to make it possible to genotype together two different datasets containing the same sample.
  • Disallow changing -dcov setting for HaplotypeCaller (pending a fix to the downsampling control system) to prevent buggy behavior. See Highlights for more details.
  • Raised per-sample limits on the number of reads in ART and HC. Active Region Traversal was using per sample limits on the number of reads that were too low, especially now that we are running one sample at a time. This caused issues with high confidence variants being dropped in high coverage data.
  • Removed explicit limitation (20) of the maximum ploidy of the reference-confidence model. Previously there was a fixed-size maximum ploidy indel RCM likelihood cache; this was changed to a dynamically resizable one. There are still some de facto limitations which can be worked around by lowering the max alt alleles parameter.
  • Made GQ of Hom-Ref Blocks in GVCF output be consistent with PLs.
  • Fixed a bug where HC was not realigning against the reference but against the best haplotype for the read.
  • Fixed a bug (in HTSJDK) that was causing GenotypeGVCFs to choke on sites with large numbers of alternate alleles (>140).
  • Modified the way GVCFBlock header lines are named because the new HTSJDK version disallows duplicate header keys (aside from special-cased keys such as INFO and FORMAT).
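
The automatic index selection keyed on the .g.vcf extension can be pictured roughly as follows. This is a minimal sketch, not GATK's code: the function name choose_index_settings is hypothetical, and the LINEAR / 128000 values are simply the ones GATK's own error message asks for when GVCF output is requested without them (that message is quoted in the comments below). Compressed outputs are deliberately not handled here.

```python
def choose_index_settings(output_path):
    """Return (index_type, index_parameter) for a GVCF-named output, else None.

    Hypothetical helper: only the '.g.vcf' suffix rule comes from the release note.
    """
    if output_path.endswith(".g.vcf"):
        # Values taken from the GATK error message about GVCF indexing.
        return ("LINEAR", 128000)
    # Any other name: no GVCF-specific index settings are implied.
    return None


print(choose_index_settings("sample1.g.vcf"))
print(choose_index_settings("sample1.vcf"))
```

In other words, naming the output file with a .g.vcf suffix is what triggers the automatic behaviour; any other name leaves the choice to the explicit arguments.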

CombineGVCFs

  • Added option to break blocks at every N sites. Using --breakBandsAtMultiplesOf N will ensure that no reference blocks span across genomic positions that are multiples of N. This is especially important in the case of scatter-gather where you don't want your scatter intervals to start in the middle of blocks (because of a limitation in the way -L works in the GATK for VCF records with the END tag). See Highlights for more details.
  • Fixed a bug that caused the tool to stop processing after the first contig.
  • Fixed a bug where the wrong REF allele was output to the combined gVCF.
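
To make the effect of --breakBandsAtMultiplesOf N concrete, here is a small sketch of how a single reference block could be cut. It assumes that the cut is placed so that a new block starts at every position that is a multiple of N; the function name break_band and the tuple representation of blocks are hypothetical and are not taken from the GATK source.

```python
def break_band(start, end, n):
    """Split the inclusive block [start, end] so that a new block
    starts at every position that is a multiple of n (assumed convention)."""
    if n <= 0:
        raise ValueError("n must be positive")
    blocks = []
    cur = start
    while cur <= end:
        # Smallest multiple of n that is strictly greater than cur.
        next_break = (cur // n + 1) * n
        stop = min(end, next_break - 1)
        blocks.append((cur, stop))
        cur = stop + 1
    return blocks


print(break_band(95, 210, 100))
```

With scatter intervals that also start at multiples of N, no interval would begin in the middle of a block under this convention.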

VariantRecalibrator

  • Switched VQSR tranches plot ordering rule (ordering is now based on tranche sensitivity instead of novel titv).
  • VQSR VCF header command line now contains annotations and tranche levels.

SelectVariants

  • Added -trim argument to trim (simplify) alleles to a minimal representation.
  • Added -trimAlternates argument to remove all unused alternate alleles from variants. Note that this is pretty aggressive for monomorphic sites.
  • Changed the default behavior to trim (remove) remaining alleles when samples are subset, and added the -noTrim argument to preserve original alleles.
  • Added --keepOriginalDP argument.
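
For readers unfamiliar with allele trimming, the sketch below shows the usual idea of reducing REF and ALT alleles to a minimal representation: strip bases shared at the end of every allele, then bases shared at the start, always leaving at least one base per allele. It is an illustration of the general technique only; the function name trim_alleles is hypothetical, and the exact rules applied by -trim may differ.

```python
def trim_alleles(pos, alleles):
    """Trim shared suffix, then shared prefix, from a list of alleles.

    Returns the (possibly shifted) 1-based position and the trimmed alleles.
    Every allele keeps at least one base.
    """
    alleles = list(alleles)
    # Remove bases shared at the right-hand end of all alleles.
    while min(len(a) for a in alleles) > 1 and len({a[-1] for a in alleles}) == 1:
        alleles = [a[:-1] for a in alleles]
    # Remove bases shared at the left-hand end; the start position moves right.
    while min(len(a) for a in alleles) > 1 and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles


print(trim_alleles(100, ["ACGT", "ACCT"]))
print(trim_alleles(10, ["ATT", "AT"]))
```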

VariantAnnotator

  • Improvements to the allele trimming functionalities.
  • Added functionality to support multi-allelic sites when annotating a VCF with annotations from another callset. See Highlights for more details.

CalculateGenotypePosteriors

  • Fixed user-reported bug featuring "trio" family with two children, one parent.
  • Added error handling for genotypes that are called but have no PLs.

Various tools

  • BQSR: Fixed an issue where GATK would skip the entire read if a SNP is entirely contained within a sequencing adapter (contributed by nsubtil); and improved how uncommon platforms (as encoded in RG:PL tag) are handled.
  • DepthOfCoverage: Now logs a warning if incompatible arguments are specified.
  • SplitSamFile: Fixed a bug that caused a NullPointerException.
  • SplitNCigarReads: Fixed issue to make -fixNDN flag fully functional.
  • IndelRealigner: Fixed an issue that was due to reads that have an incorrect CIGAR length.
  • CombineVariants: Minor change to an error check that was put into 3.3 so that identical samples don't need -genotypeMergeOption.
  • VariantsToBinaryPED: Corrected swap between mother and father in PED file output.
  • GenotypeConcordance: Monomorphic sites in the truth set are no longer called "Mismatching Alleles" when the comp genotype has an alternate allele.
  • ReadBackedPhasing: Fixed a couple of bugs in MNP merging.
  • CatVariants: Now allows different input / output file types, and spaces in directory names.
  • VariantsToTable: Fixed a bug that affected the output of the FORMAT record lists when -SMA is specified. Note that FORMAT fields behave the same as INFO fields - if the annotation has a count of A (one entry per Alt Allele), it is split across the multiple output lines. Otherwise, the entire list is output with each field.
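
The splitting rule described for VariantsToTable with -SMA can be sketched as below. This is only an illustration of the rule as stated in the note: the function name split_multi_allelic, the dictionary layout and the example field values are hypothetical and do not reflect the tool's internals.

```python
def split_multi_allelic(alts, fields, counts):
    """Produce one output row per ALT allele.

    alts:   list of ALT alleles
    fields: dict mapping field name -> list of values
    counts: dict mapping field name -> header count ('A' means one value per ALT)
    """
    rows = []
    for i, alt in enumerate(alts):
        row = {"ALT": alt}
        for name, values in fields.items():
            if counts.get(name) == "A":
                # One entry per ALT allele: each row gets its own value.
                row[name] = str(values[i])
            else:
                # Any other count: the whole list is repeated on every row.
                row[name] = ",".join(str(v) for v in values)
        rows.append(row)
    return rows


for row in split_multi_allelic(["C", "T"], {"AC": [3, 1], "AD": [10, 3, 1]}, {"AC": "A", "AD": "."}):
    print(row)
```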

Read Filters

  • Added erroneous CIGAR length to criteria for BadCigarFilter.
  • Corrected logical expression in MateSameStrandFilter (contributed by user seru71).
  • Handle X and = CIGAR operators appropriately.
  • Added -drf argument to disable default read filters. Limited to specific tools and specific filters (currently only DuplicateReadFilter).

Annotations

  • Calculate StrandBiasBySample using all alternate alleles as “REF vs. any ALT”.
  • Modified InbreedingCoeff so that it works when genotyping uniquified samples (see GenotypeGVCFs changes).
  • Changed GC Content value type from Integer to Float.
  • Added StrandAlleleCountsBySample annotation. This annotation outputs the number of reads supporting each allele, stratified by sample and read strand; callable from HaplotypeCaller only.
  • Made annotators emit a warning if they can't be applied.

GATK Engine & common features

  • Fixed logging of 'out' command line parameter in VCF headers; changed []-type arrays to lists so argument parsing works in VCF header commandline output.
  • Modified GATK command line header for unique keys. The GATK command line header keys were being repeated in the VCF and subsequently lost to a single key value by HTSJDK. This resolves the issue by appending the name of the walker after the text "GATKCommandLine" and a number after that if the same walker was used more than once in the form: GATKCommandLine.(walker name) for the first occurrence of the walker, and GATKCommandLine.(walker name).# where # is the number of the occurrence of the walker (e.g. GATKCommandLine.SomeWalker.2 for the second occurrence of SomeWalker).
  • Handle X and = CIGAR operators appropriately.
  • Added barebones read/write CRAM support (no interval seeking!). See Highlights for more details.
  • Cleaned up logging outputs / streams; messages (including HMM log messages) that were going to stdout now go to stderr.
  • Improved error messages; when an error is related to a specific file, the engine now includes the file name in the error message.
  • Fixed BCF writing when FORMAT annotations contain arrays.
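
The naming scheme for the unique command line header keys can be illustrated with a short sketch. It follows the rule stated above (plain key for the first occurrence of a walker, a numeric suffix from the second occurrence onward); the function name command_line_keys is hypothetical and this is not the engine's actual code.

```python
from collections import Counter


def command_line_keys(walkers):
    """Return one header key per walker, in order, following the
    GATKCommandLine.(walker name)[.#] scheme described in the note."""
    seen = Counter()
    keys = []
    for walker in walkers:
        seen[walker] += 1
        if seen[walker] == 1:
            keys.append(f"GATKCommandLine.{walker}")
        else:
            # Second and later occurrences carry their occurrence number.
            keys.append(f"GATKCommandLine.{walker}.{seen[walker]}")
    return keys


print(command_line_keys(["HaplotypeCaller", "GenotypeGVCFs", "HaplotypeCaller"]))
```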

Queue

  • Added -qsub-broad argument. When -qsub-broad is specified instead of -qsub, Queue will use the h_vmem parameter instead of h_rss to specify memory limit requests. This was done to accommodate changes to the Broad’s internal job scheduler. It also causes the GridEngine native arguments to be output by default to the logger, instead of only when in debug mode.
  • Fixed the scala wrapper for Picard MarkDuplicates (needed because MarkDuplicates was moved to a different package within Picard).
  • Added optional element "includeUnmapped" to the PartitionBy annotation. The value of this element (default true) determines whether Queue will explicitly run this walker over unmapped reads. This patch fixes a runtime error when FindCoveredIntervals was used with Queue.

Documentation

  • Plentiful enhancements and fixes to various tool docs, especially annotations and read filters.

For developers

  • Upgraded SLF4J to allow new convenient logging syntaxes.
  • Patched maven pom file for slf4j-log4j12 version (contributed by user Biocyberman).
  • Updated HTSJDK version (now pulling it in from Maven Central); various edits made to match.
  • Collected VCF IDs and header lines into one place (GATKVCFConstants).
  • Made various changes that lead to reduced build times.

GitHub issue #995 (by Geraldine_VdAuwera) · State: closed · Closed by: vdauwera

Comments

  • CardiffBioinf (Cardiff; Member)

    Hi

    Thanks for another release of GATK.

    Using HaplotypeCaller v.3.4.0 I'm unable to load the VectorLoglessPairHMM library.

    DEBUG 11:21:10,064 VectorLoglessPairHMM - libVectorLoglessPairHMM not found in JVM library path - trying to unpack from GATK jar file
    WARN 11:21:10,068 PairHMMLikelihoodCalculationEngine$1 - Failed to load native library for VectorLoglessPairHMM - using Java implementation of LOGLESS_CACHING

    My full command is
    /usr/java/jdk1.7.0_51/bin/java -Xmx4g -jar /share/apps/GATK-distros/GATK_3.4.0/GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R /data/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    --dbsnp /data/db/human/gatk/2.8/b37/dbsnp_138.b37.vcf \
    -I "$RunID"_"$Sample_ID".bam \
    -L "$BEDFilename" \
    -o "$RunID"_"$Sample_ID".g.vcf \
    --genotyping_mode DISCOVERY \
    -stand_emit_conf 10 \
    -stand_call_conf 30 \
    --emitRefConfidence GVCF \
    -dt NONE

    Running
    jar tf GATK_3.4.0/GenomeAnalysisTK.jar | grep libVector

    Gives
    org/broadinstitute/gatk/utils/pairhmm/libVectorLoglessPairHMM.so

    Any ideas?

    Thanks
    Matt

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi @CardiffBioinf ,

    Is this a new problem with 3.4? Can you confirm (with log snippet) that when running 3.3 you were able to use the native library?

  • CardiffBioinf (Cardiff; Member)

    Hi Geraldine

    Thanks for the reply. It looks like a problem with the new version.

    Here's the output using GATK v3.3.0

    DEBUG 15:18:34,311 VectorLoglessPairHMM - libVectorLoglessPairHMM not found in JVM library path - trying to unpack from GATK jar file
    Using AVX accelerated implementation of PairHMM
    INFO 15:18:34,330 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file
    INFO 15:18:34,330 VectorLoglessPairHMM - Using vectorized implementation of PairHMM

    I used this command

    /usr/java/jdk1.7.0_51/bin/java -Xmx4g -jar /share/apps/GATK-distros/GATK_3.3.0/GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -l DEBUG \
    -R /data/db/human/gatk/2.8/b37/human_g1k_v37.fasta \
    --dbsnp /data/db/human/gatk/2.8/b37/dbsnp_138.b37.vcf \
    -I "$RunID"_"$Sample_ID".bam \
    -L "$BEDFilename" \
    -o "$RunID"_"$Sample_ID".g.vcf \
    --genotyping_mode DISCOVERY \
    -stand_emit_conf 10 \
    -stand_call_conf 30 \
    --emitRefConfidence GVCF \
    --variant_index_type LINEAR \
    --variant_index_parameter 128000 \
    -dt NONE

    Best wishes
    Matt

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    OK, looks like we might have a bug in 3.4 then. We'll get that fixed asap.

    @tommycarstensen this is related to one of the warnings you reported...

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hey @CardiffBioinf and @tommycarstensen, it looks like this is not actually a bug, but a gcc library version compatibility issue. From what I understand of the devs' explanation (this is a little out of my usual domain) some updates to our build environment led to the library we ship being incompatible with older versions of gcc libs. Can you please try updating your local gcc lib version and try running again? If that works it'll confirm we just need to update our requirements docs for the native libraries. On the bright side, no code change needed.
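
A quick way to tell from a run log which PairHMM implementation was used is to look for the two messages quoted earlier in this thread. The helper below is a hypothetical sketch (the name pairhmm_status and the return labels are made up for illustration); it only matches the exact message text shown in the logs above and makes no claim about other GATK versions.

```python
def pairhmm_status(log_text):
    """Classify a GATK log by the PairHMM messages quoted in this thread."""
    if "Using vectorized implementation of PairHMM" in log_text:
        return "native"
    if "Failed to load native library for VectorLoglessPairHMM" in log_text:
        return "java-fallback"
    # Neither message present (for example, a tool that does not use PairHMM).
    return "unknown"


log_34 = ("WARN 11:21:10,068 PairHMMLikelihoodCalculationEngine$1 - Failed to load "
          "native library for VectorLoglessPairHMM - using Java implementation of LOGLESS_CACHING")
print(pairhmm_status(log_34))
```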

  • TechnicalVault (Cambridge, UK; Member)

    Alas, that's not really as reasonable a request as it might first sound when your institute is running Ubuntu Precise and is unlikely to change in the near future. I think we might be better off recompiling from source, unless there is a dependency on a specific function in the latest glibc. You might want to include some instructions on doing this, and on editing the .jar file to include the result, as a second option.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    @TechnicalVault Indeed, we now realize that it's not trivial, so we're working on a solution. Right now our best-looking options are to either issue a patch version that walks back the gcc version to the oldest viable, or to provide the library compiled with an older version, as a separate download, as there is already some functionality to load an external version of the PairHMM lib. We'll probably go for the patch option since it's the easiest for regular users and we have another change we'd like to make as a patch anyway. Any thoughts about that?

  • TechnicalVault (Cambridge, UK; Member)

    I'd say a patch is probably the easiest for us; the changed version number means we have a way of checking it's been applied.

  • Fer (Austria; Member)

    Hi GATK team,
    has this patch version been released already? Is it in a nightly build for instance?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    @Fer Not yet, sorry. There are a few other things we want to fix up in the same patch. I'm hoping we'll have it out by the end of the week.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    FYI the PairHMM fix just went into the master code, so it will be available in the nightly build tomorrow.

  • omid1854 (Helsinki; Member)

    Thanks for addressing this issue. It is a quick solution for a person who has no root access.
    I am using GATK version 3.4-0-g7e26428 and have two main issues:

    1) I assume the "gcc library version incompatibility" problem has been addressed by now, and that the patch has been incorporated into the version of GATK I am using. Should I be worried about the warning that I still get? If so, can you please guide me?
    WARN 11:03:07,305 PairHMMLikelihoodCalculationEngine$1 - Failed to load native library for VectorLoglessPairHMM - using Java implementation of LOGLESS_CACHING

    2.a) This thread (http://gatkforums.broadinstitute.org/discussion/5578/gatk-haplotypecaller3-4-warnings) says that GATK 3.4 automatically recognizes the variant index type and parameter, but if I don't provide --variant_index_type and --variant_index_parameter values it throws an error:
    ##### ERROR MESSAGE: GVCF output requires a specific indexing strategy. Please re-run including the arguments -variant_index_type LINEAR -variant_index_parameter 128000.

    2.b) I also get warnings for annotations:
    WARN 12:13:15,438 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper
    WARN 12:13:15,439 InbreedingCoeff - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.

    GitHub issue #1043 (by Sheila) · State: closed · Closed by: chandrans

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @omid1854
    Hi,

    1) No need to worry about that warning; I get the same warning too. As Geraldine said in the post you referenced, "it's just adapting to your infrastructure."

    2a) Can you post the exact command you are running? The main things that you need to have are -ERC GVCF and your -o output file needs to have a .g.vcf extension.

    2b) No need to worry about those warnings either; I get them myself. I suspect those annotations are requested by default in HaplotypeCaller when they probably should not be. I will put in a ticket to remove them.

    -Sheila

  • omid1854 (Helsinki; Member)

    Thanks for the very quick response and clarifications.
    I used the latest nightly build, and the warning about the "gcc lib incompatibility" is gone, but I now get a new error (below) that I could not find in similar posts. I tried to follow the admins' recommendations to solve it, without success so far.

    INFO 21:22:22,746 ProgressMeter - Contig0:336282 0.0 30.0 s 49.6 w 0.1% 10.8 h 10.8 h
    INFO 21:22:52,748 ProgressMeter - Contig0:718352 0.0 60.0 s 99.2 w 0.2% 10.1 h 10.1 h
    INFO 21:23:12,128 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
    at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:195)
    at org.broadinstitute.gatk.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:329)
    at org.broadinstitute.gatk.utils.activeregion.ActiveRegion.getReference(ActiveRegion.java:220)
    at org.broadinstitute.gatk.utils.activeregion.ActiveRegion.getActiveRegionReference(ActiveRegion.java:186)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.assembleReads(HaplotypeCaller.java:978)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:824)
    at org.broadinstitute.gatk.tools.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:226)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:709)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$TraverseActiveRegionMap.apply(TraverseActiveRegions.java:705)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:274)
    at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version nightly-2015-07-08-g6e00632):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ----------------------------------------------------------

    My java version is:

    java version "1.7.0_79"
    OpenJDK Runtime Environment (rhel-2.5.5.3.el6_6-x86_64 u79-b14)
    OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)

    My command line is like this:
    java -Xmx50g -jar GenomeAnalysisTK.jar \
    -R ref.fa \
    -T HaplotypeCaller \
    -I sampl1.RG.bam \
    -ERC GVCF \
    --output_mode EMIT_VARIANTS_ONLY \
    --min_base_quality_score 20 \
    --genotyping_mode DISCOVERY \
    --variant_index_type LINEAR \
    --variant_index_parameter 128000 \
    -o sampl1.RG.g.vcf

    It does not matter which input .bam I use, from before or after recalibration; I still get that "ERROR stack trace". But I can use the same .bam files with -T UnifiedGenotyper without any problems. Does this come down to the sensitivity of HaplotypeCaller, or is it something else?

    What am I missing here? I appreciate any help.

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @omid1854
    Hi,

    Can you post the bam header and fasta dict file?

    Thanks,
    Sheila

    P.S. You can leave out
    --output_mode EMIT_VARIANTS_ONLY \
    --min_base_quality_score 20 \
    --genotyping_mode DISCOVERY \
    --variant_index_type LINEAR \
    --variant_index_parameter 128000 \
    from your command.

  • omid1854 (Helsinki; Member)
    edited July 2015

    Thanks for the efforts.
    Leaving out --variant_index_type LINEAR and --variant_index_parameter 128000 throws an error, and HaplotypeCaller asks for them again.

    I have attached both the bam header and the fasta dict file.

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @omid1854
    Hi,

    When I run version 3.4 and its nightly builds, I do not need to add those arguments. Are you running GATK from your own personal computer or from somewhere else? Honestly, I am not sure what is going on, so I may need you to submit a bug report. If you can, instructions are here: http://gatkforums.broadinstitute.org/discussion/1894/how-do-i-submit-a-detailed-bug-report

    -Sheila

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    @omid1854 Note that the auto-handling depends on naming your output file *.g.vcf; and while that wasn't working for *.gz outputs in 3.4, that is fixed in 3.4-46.

  • streetcat (Member)

    Hi,

    I'm getting an exception when running SplitNCigarReads, as part of an RNASeq pipeline
    (following this guide: https://www.broadinstitute.org/gatk/guide/article?id=3891)

    GATK version:
    v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12

    My java version is:
    java version "1.7.0_80"
    Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
    Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

    The command and output
    java -jar $GATK -T SplitNCigarReads -R Pepper.v.1.5.total.chr.fasta -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -U ALLOW_N_CIGAR_READS

    INFO 18:57:58,607 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 18:57:58,609 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
    INFO 18:57:58,610 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 18:57:58,610 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 18:57:58,613 HelpFormatter - Program Args: -T SplitNCigarReads -R Pepper.v.1.5.total.chr.fasta -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -U ALLOW_N_CIGAR_READS
    INFO 18:57:58,619 HelpFormatter - Executing as [email protected] on Linux 3.19.0-21-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_80-b15.
    INFO 18:57:58,619 HelpFormatter - Date/Time: 2015/07/19 18:57:58
    INFO 18:57:58,620 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 18:57:58,620 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 18:57:59,083 GenomeAnalysisEngine - Strictness is SILENT
    INFO 18:57:59,191 GenomeAnalysisEngine - Downsampling Settings: No downsampling
    INFO 18:57:59,198 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 18:57:59,224 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.03
    INFO 18:57:59,296 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
    INFO 18:57:59,300 GenomeAnalysisEngine - Done preparing for traversal
    INFO 18:57:59,301 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 18:57:59,301 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 18:57:59,301 ProgressMeter - Location | reads | elapsed | reads | completed | runtime | runtime
    INFO 18:57:59,314 ReadShardBalancer$1 - Loading BAM index data
    INFO 18:57:59,315 ReadShardBalancer$1 - Done loading BAM index data
    INFO 18:58:29,577 ProgressMeter - Pepper.v.1.5.chr01:22219805 500053.0 30.0 s 60.0 s 0.8% 59.3 m 58.8 m
    INFO 18:58:59,579 ProgressMeter - Pepper.v.1.5.chr01:93169942 1000060.0 60.0 s 60.0 s 3.5% 28.3 m 27.3 m
    INFO 18:59:29,580 ProgressMeter - Pepper.v.1.5.chr01:162874638 1600254.0 90.0 s 56.0 s 6.2% 24.3 m 22.8 m
    INFO 19:00:00,668 ProgressMeter - Pepper.v.1.5.chr01:217759093 2200330.0 2.0 m 55.0 s 8.3% 24.4 m 22.4 m
    INFO 19:00:30,948 ProgressMeter - Pepper.v.1.5.chr02:121 2866290.0 2.5 m 52.0 s 9.9% 25.3 m 22.8 m
    INFO 19:00:31,823 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
    at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:195)
    at org.broadinstitute.gatk.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:329)
    at org.broadinstitute.gatk.tools.walkers.rnaseq.OverhangFixingManager$Splice.initialize(OverhangFixingManager.java:365)
    at org.broadinstitute.gatk.tools.walkers.rnaseq.OverhangFixingManager.addSplicePosition(OverhangFixingManager.java:171)
    at org.broadinstitute.gatk.tools.walkers.rnaseq.SplitNCigarReads.splitReadBasedOnCigar(SplitNCigarReads.java:280)
    at org.broadinstitute.gatk.tools.walkers.rnaseq.SplitNCigarReads.splitNCigarRead(SplitNCigarReads.java:233)
    at org.broadinstitute.gatk.tools.walkers.rnaseq.SplitNCigarReads.reduce(SplitNCigarReads.java:210)
    at org.broadinstitute.gatk.tools.walkers.rnaseq.SplitNCigarReads.reduce(SplitNCigarReads.java:118)
    at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano$TraverseReadsReduce.apply(TraverseReadsNano.java:251)
    at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano$TraverseReadsReduce.apply(TraverseReadsNano.java:240)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:279)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:102)
    at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:56)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:108)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------

    Any help would be much appreciated.

    Raviv.

  • Sheila (Broad Institute; Member, Broadie, Moderator)
    edited July 2015

    @streetcat
    Hi Raviv,

    Can you try deleting your reference index file and reindexing? If that doesn't work, I may need you to submit a bug report.

    Thanks,
    Sheila

  • streetcat (Member)

    Hi Sheila,

    I'll try that.
    Does it matter how I generate the index?
    I used "samtools faidx" on my fasta file.

    Thanks for the quick reply.
    Raviv.

  • streetcat (Member)

    @Sheila

    I tried re-indexing the reference but it didn't help. I'm getting this error when running SplitNCigarReads on other samples as well.
    How do I submit a bug?
    Could it be that I'm not indexing the reference correctly? The resulting file seems legit.

    Thanks,
    Raviv.

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @streetcat
    Hi Raviv,

    A few other users have submitted bug reports for this as well. I will use those for test cases. Please have a look at this thread for more information: http://gatkforums.broadinstitute.org/discussion/5868/splitncigarreads-error-message-code-exception#latest
    I will post there when I have submitted a bug report and when the bug is fixed.

    Thanks,
    Sheila

  • Frederic (Texas Biomedical Research Institute; Member)

    Hi all,

    I got the same kind of error when using RealignerTargetCreator with GATK 3.4-46-gbc02625. Everything works perfectly with 3.3-0-g37228af (and, by the way, thanks for setting up an archive repository).

    My reads are aligned using the bwa-mem module of bwa 0.7.12-r1039. I updated the indexes for my reference genome as mentioned here, but that did not change anything.

    Here are the two Java versions tested:

    java version "1.7.0_25"
    Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
    Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)

    and

    java version "1.8.0_51"
    Java(TM) SE Runtime Environment (build 1.8.0_51-b16)
    Java HotSpot(TM) 64-Bit Server VM (build 25.51-b03, mixed mode)

    Here is the command used:
    java -Xmx4g -jar $HOME/local/bin/GenomeAnalysisTK.jar -R data/sm_genome/sma_v5.0.chr.fa -I data/Sm.SN_Nd6.2.3/Sm.SN_Nd6.2.3_sorted.bam -T RealignerTargetCreator -o data/Sm.SN_Nd6.2.3/Sm.SN_Nd6.2.3.intervals

    Here are the info header and the stack trace:

    INFO 12:18:59,363 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 12:18:59,367 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
    INFO 12:18:59,367 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 12:18:59,367 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 12:18:59,373 HelpFormatter - Program Args: -R data/sm_genome/sma_v5.0.chr.fa -I data/Sm.SN_Nd6.2.3/Sm.SN_Nd6.2.3_sorted.bam -T RealignerTargetCreator -o data/Sm.SN_Nd6.2.3/Sm.SN_Nd6.2.3.intervals
    INFO 12:18:59,381 HelpFormatter - Executing as [email protected] on Linux 2.6.32-431.11.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_51-b16.
    INFO 12:18:59,381 HelpFormatter - Date/Time: 2015/07/29 12:18:59
    INFO 12:18:59,382 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 12:18:59,382 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 12:19:00,138 GenomeAnalysisEngine - Strictness is SILENT
    INFO 12:19:00,364 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 12:19:00,374 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 12:19:00,477 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.09
    INFO 12:19:01,088 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
    INFO 12:19:01,220 GenomeAnalysisEngine - Done preparing for traversal
    INFO 12:19:01,220 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 12:19:01,221 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 12:19:01,221 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
    INFO 12:19:31,228 ProgressMeter - Schisto_mansoni.Chr_1:999341 983040.0 30.0 s 30.0 s 0.3% 3.0 h 3.0 h

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
    at htsjdk.samtools.reference.IndexedFastaSequenceFile.getSubsequenceAt(IndexedFastaSequenceFile.java:195)
    at org.broadinstitute.gatk.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:329)
    at org.broadinstitute.gatk.engine.datasources.providers.LocusReferenceView.initializeReferenceSequence(LocusReferenceView.java:150)
    at org.broadinstitute.gatk.engine.datasources.providers.LocusReferenceView.<init>(LocusReferenceView.java:126)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:90)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

    ERROR ------------------------------------------------------------------------------------------

    Fred
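
For readers of the stack trace above: its top frame is `java.nio.ByteBuffer.allocate`, which the JDK documents as throwing `IllegalArgumentException` when the requested capacity is negative. The sketch below shows only that JDK behaviour in isolation. It does not reproduce the GATK or htsjdk code path, and the thread does not say how the invalid size arose inside `getSubsequenceAt`. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

class NegativeAllocateDemo {
    /** Returns true if ByteBuffer.allocate rejects the given capacity. */
    public static boolean allocateFails(int capacity) {
        try {
            ByteBuffer.allocate(capacity);
            return false;
        } catch (IllegalArgumentException e) {
            // Thrown by the JDK when capacity is a negative integer.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(allocateFails(16)); // false
        System.out.println(allocateFails(-1)); // true
    }
}
```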

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @Frederic
    Hi,

    Thanks. I will put in a bug report today. It does seem like a bug introduced in the latest version of GATK. Please follow this thread for updates: http://gatkforums.broadinstitute.org/discussion/5875/gatk-indelrealigner-error#latest

    -Sheila

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    This was a bug in the htsjdk library. It has been fixed (see here), but we're waiting on the next htsjdk release to incorporate the fix into GATK. Apologies for the inconvenience.

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @streetcat @Frederic
    Hi,

    The bug has been resolved in the latest nightly build: https://www.broadinstitute.org/gatk/download/nightly

    -Sheila
