Run time error during variant calling

Kelly (Posts: 3; Member)
edited March 2013 in Ask the GATK team

Hi, I'm using the latest version of GATK to analyze paired-end exome sequencing data. I'd like to call SNPs, indels and also SVs. I have followed the GATK workflow from the duplicate marking step through the read reduction step. Everything went fine until I started using the HaplotypeCaller walker for variant calling.
Command line I used:

java -jar $GATK/GenomeAnalysisTK.jar -T HaplotypeCaller -R human_g1k_v37.fa -I sample_reduced.bam -o sample_variant.vcf

It ran fine at first, then failed with the error message "Reads are too small for use in assembly."
I also tried the UnifiedGenotyper walker, with this command line:

java -jar $GATK/GenomeAnalysisTK.jar -T UnifiedGenotyper -R human_g1k_v37.fa -I sample_reduced.bam -glm BOTH -o sample_variant.vcf

This failed with the error message "Read bases and read insertion quals aren't the same, size 46 vs. 49".
I have googled the error message but found no related results. Has anyone else run into the same problem? I'm eager to know how to solve this.
Thanks!
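
The second error reports a mismatch between the number of bases in a read and the number of insertion qualities attached to it. A minimal sketch for locating such reads follows. It assumes the indel qualities are stored in the BI (insertion) and BD (deletion) tags written by base recalibration, and that samtools is available; the function name check_indel_quals is hypothetical.

```shell
# Report reads whose BI or BD tag length differs from the read length.
# Reads SAM records on stdin; prints: read name, tag, read length, tag length.
check_indel_quals() {
  awk -F'\t' '
    /^@/ { next }                       # skip SAM header lines
    {
      seqlen = length($10)              # column 10 holds the read bases
      for (i = 12; i <= NF; i++) {      # optional tags start at column 12
        tag = substr($i, 1, 5)
        if (tag == "BI:Z:" || tag == "BD:Z:") {
          quallen = length($i) - 5
          if (quallen != seqlen) {
            printf "%s\t%s\t%d\t%d\n", $1, substr(tag, 1, 2), seqlen, quallen
          }
        }
      }
    }'
}

# Typical use (not executed here):
#   samtools view sample_reduced.bam | check_indel_quals | head
```

Any line printed points at a read that would plausibly trigger the size mismatch; no output means every tagged read is consistent under these assumptions.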

Answers

  • Geraldine_VdAuwera (Posts: 8,027; Administrator, GATK Dev)

    Your first error is a bug we are aware of and are working to fix. The second sounds like a bam file issue -- have you tried validating your bam?

    Geraldine Van der Auwera, PhD

  • Kelly (Posts: 3; Member)

    Thank you for the quick response.
    I tried calling SNPs and indels separately, and the SNP calling works. I'm now waiting for the result of the indel calling.
    java -jar $GATK/GenomeAnalysisTK.jar -T UnifiedGenotyper -R human_g1k_v37.fa -I sample_reduced.bam -o sample_variant.vcf
    java -jar $GATK/GenomeAnalysisTK.jar -T UnifiedGenotyper -R human_g1k_v37.fa -I sample_reduced.bam -glm INDEL -o sample_variant.vcf
    To answer your question: yes, I followed the community's suggestion and validated the file at every step using Picard ValidateSamFile. The SAM file is OK, but after cleaning it with Picard CleanSam, converting it from SAM to BAM with samtools, and fixing the mates, there is a warning: "NM tag in the file does not match the reality".
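
For reference, a minimal sketch of a validation call that summarizes problems while skipping the NM tag check. It assumes a recent Picard release that bundles its tools in picard.jar (older releases shipped a separate ValidateSamFile.jar), and PICARD is an assumed variable naming the install directory. The helper only prints the command so it can be inspected first.

```shell
# Prints a Picard ValidateSamFile command in summary mode.
# PICARD and the helper name validate_bam_cmd are assumptions, not from the thread.
validate_bam_cmd() {
  echo "java -jar ${PICARD:-.}/picard.jar ValidateSamFile" \
       "I=$1 MODE=SUMMARY IGNORE=INVALID_TAG_NM"
}

validate_bam_cmd sample_reduced.bam
# To execute rather than print: eval "$(validate_bam_cmd sample_reduced.bam)"
```

Summary mode gives a count per problem type, which makes it easier to tell a harmless NM warning apart from a structural problem in the file.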

  • Geraldine_VdAuwera (Posts: 8,027; Administrator, GATK Dev)

    OK, sounds like you're doing all the right things. The NM tag warning is probably not worth worrying about.

    Let me know if the second error persists, and if so we'll look into it.

  • Kelly (Posts: 3; Member)

    The second error persists.

  • reg (Posts: 2; Member)

    I've started getting that second error, "Read bases and read insertion quals aren't the same", on a set of my BAMs as well, using the 2.4 release.

  • reg (Posts: 2; Member)

    I should perhaps add that my BAMs were processed following the best practices (alignment, duplicate marking, indel realignment, BQSR, ReduceReads, all with 2.4). Running the UnifiedGenotyper on them then gives the bases and quals size error. I'm now trying to call the same BAMs with 2.3-4-g57ea19f, and it seems to be running without error.

  • andrei_barysenka (Posts: 6; Member)

    I too encountered the second error, i.e., "Read bases and read insertion quals aren't the same, size A vs. B". It happened only when I used the latest GATK's UnifiedGenotyper with -glm INDEL or -glm BOTH; earlier versions didn't produce this error.
    After some experimenting I found that it disappears when I use the -DIQ option with PrintReads during the base recalibration step.
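
A minimal sketch of that workaround, assuming a GATK 2.x style invocation where -BQSR supplies the recalibration table and -DIQ tells the engine not to emit indel qualities. The file names sample_realigned.bam, recal.grp and sample_recal.bam are placeholders, and the helper print_reads_cmd is hypothetical; it prints the command rather than running it.

```shell
# Prints a PrintReads command that applies recalibration without indel quals.
# Arguments: reference, input BAM, recalibration table, output BAM.
print_reads_cmd() {
  echo "java -jar ${GATK:-.}/GenomeAnalysisTK.jar -T PrintReads" \
       "-R $1 -I $2 -BQSR $3 -DIQ -o $4"
}

print_reads_cmd human_g1k_v37.fa sample_realigned.bam recal.grp sample_recal.bam
# To execute rather than print: eval "$(print_reads_cmd ...)"
```

Note that this sidesteps the error by dropping the indel qualities altogether, so it is a workaround rather than a fix.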

  • Geraldine_VdAuwera (Posts: 8,027; Administrator, GATK Dev)

    Thanks Andrei, that is helpful.

    OK folks, it looks like we have a bug here. Can one of you upload a snippet of your bam file so that we can reproduce the error locally? Please see instructions here:

    http://www.broadinstitute.org/gatk/guide/article?id=1894

  • andrei_barysenka (Posts: 6; Member)

    just uploaded the snippet etc.:

    bugreport_a1ef69e011.tar.gz

  • ebanks, Broad Institute (Posts: 687; Member, Administrator, GATK Dev, Broadie, Moderator, DSDE Dev, GP Member)

    Thanks for the bug report. I've implemented a patch that should hopefully roll out to the public later today.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • Geraldine_VdAuwera (Posts: 8,027; Administrator, GATK Dev)

    FYI this is fixed as of version 2.4-7.
