Base Recalibrator Issue

Ashu (Member)
edited November 2012 in Ask the GATK team

Hi,
I got the following error from the BaseRecalibrator in GenomeAnalysisTK-2.2-2-gf44cc4e.

##### ERROR MESSAGE: Key 2006 is too large for dimension 2 (max is 2001)

I also ran Picard's ValidateSamFile on my BAM file, and it reports no errors.
What exactly does this error mean? Which key is it talking about? And how can I fix it?
Thanks,
Ashu


Answers

  • droazen (Cambridge, MA; Dev)

    Hi Ashu,

    Could you please post the full command line you're using to run the BaseRecalibrator, and describe any previous processing steps in your pipeline involving the BAM file you're using as input (e.g., did you run it through ReduceReads first)?

    David

  • Ashu (Member)
    edited November 2012

    Hi David,

    Here's the command I used to run BaseRecalibrator.

    java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -I aligned_reads.bam -R reference.fasta -knownSites dbsnp.vcf -o recal.grp
    

    I did not know that I might have to run it through ReduceReads; I have not used that tool before. The older versions of GATK were able to process my alignment files without ReduceReads.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    No Ashu, you don't need to run ReduceReads. That is a new tool that can be used after processing, just before calling variants. I think David was asking because if you had used ReduceReads before BaseRecalibrator, that might explain your problem.

    So, did you use any other tools before running BaseRecalibrator? Like maybe IndelRealigner? Telling David all the steps you performed on your dataset will help him find out what the problem might be...

  • Dear David/Geraldine,

    I got these alignment (BAM) files (for a bacterium) from the Pacific Biosciences Secondary Analysis Software. They come with a mapping quality of 255. With the older version of GATK, I used PrintReads to change the mapping quality to 60, ran quality recalibration on the alignment files, and then called SNPs on the recalibrated files. The whole process worked fine. Now I want to repeat this process with the new version of GATK, since it says you can call indels with the new UnifiedGenotyper.

    Ashu
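For readers reproducing the preprocessing step Ashu describes, here is a minimal sketch of how the mapping quality might be reset with PrintReads. Ashu does not say which option was actually used; the sketch assumes the GATK 2.x `ReassignMappingQuality` read filter with `-DMQ`, which sets the mapping quality of every read to the given value. The output file name `aligned_reads.mq60.bam` is hypothetical, and the other file names are the placeholders used elsewhere in this thread.

```shell
# Assumption: GATK 2.x with the ReassignMappingQuality read filter available.
# This filter overwrites the MAPQ of ALL reads (not only those at 255) with
# the -DMQ value. The output name aligned_reads.mq60.bam is hypothetical.
MAPQ_CMD="java -jar GenomeAnalysisTK.jar -T PrintReads \
  -R reference.fasta -I aligned_reads.bam -o aligned_reads.mq60.bam \
  -rf ReassignMappingQuality -DMQ 60"

# Print the assembled command for review before running it.
echo "$MAPQ_CMD"
```

After checking the printed command against your own file names and GATK version, it can be pasted into the shell to run.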

  • droazen (Cambridge, MA; Dev)
    edited November 2012

    Hi Ashu,

    I believe your problem is with the CycleCovariate, which is very platform-dependent in the way it works. To test this hypothesis, could you please try re-running the BaseRecalibrator without the CycleCovariate and report back as to whether or not you get the same error? You can do this by adding the following options to your command line:

    --no_standard_covs -cov ContextCovariate
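Put together with the command Ashu posted earlier in the thread, the suggested test run would look roughly like the sketch below. The file names are the placeholders from that earlier command. The comments reflect the usual reading of these two options in GATK 2.x (skip the standard covariate set, which includes CycleCovariate, and add back only ContextCovariate); treat that as an assumption to check against the documentation for your version.

```shell
# Ashu's original BaseRecalibrator command with droazen's options appended.
# --no_standard_covs : do not load the standard covariate set (assumed to
#                      include CycleCovariate in GATK 2.x)
# -cov ContextCovariate : add back only the context covariate
BQSR_CMD="java -jar GenomeAnalysisTK.jar -T BaseRecalibrator \
  -I aligned_reads.bam -R reference.fasta -knownSites dbsnp.vcf -o recal.grp \
  --no_standard_covs -cov ContextCovariate"

# Print the assembled command for review before running it.
echo "$BQSR_CMD"
```

If the "Key ... is too large for dimension 2" error disappears with this command, that supports the CycleCovariate hypothesis.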

  • It worked, thank you so much. I wanted to ask whether running without the CycleCovariate would change my results compared with how things worked with the old version of GATK.

  • ebanks (Broad Institute; Member, Broadie, Dev)

    Hi Ashu,

    This will be fixed in the next version of the GATK (2.3). For now you should run BQSR without the cycle covariate.

    That being said, you will have a very (very) tough time calling indels from PacBio data. The most significant error mode for PacBio is its high indel error rate, which makes accurate indel calling nearly impossible.

  • OK... does that mean no version of GATK can call indels for PacBio as of now? Could you maybe give me a timeline for when a new version that can call indels would be released?

  • ebanks (Broad Institute; Member, Broadie, Dev)

    That's not exactly what it means. To be specific:

    • The GATK can call indels in PacBio data.
    • However, no version of any tool in the world can call indels in PacBio data well.
    • We have no intention of supporting development of indel calling in PacBio data.
  • Yes, I know no other tool can call indels well from PacBio data. I was hoping this new version could, but thanks for the help.
    I really appreciate it.

    Ashu
