
IndelRealigner no initialization

seqseek Posts: 11 Member
edited September 2012 in Ask the GATK team

I got this when I ran the IndelRealigner. The output bam is empty.

INFO  15:19:35,568 TraversalEngine - Total runtime 0.00 secs, 0.00 min, 0.00 hours
INFO  15:19:36,910 GATKRunReport - Uploaded run statistics report to AWS S3

It didn't initialize. The sample is aligned to a specific region of the genome, and I did use the -L option. For a whole-genome alignment of the same sample, I don't have any problems. Do you know why?


Answers

  • Geraldine_VdAuwera Posts: 6,176 Administrator, GATK Developer

    Can you post the full stack trace?

    Geraldine Van der Auwera, PhD

  • seqseek Posts: 11 Member
    edited September 2012
    INFO  15:54:24,747 HelpFormatter - --------------------------------------------------------------------------------- 
    INFO  15:54:24,751 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.0-39-gd091f72, Compiled 2012/08/10 15:55:35 
    INFO  15:54:24,751 HelpFormatter - Copyright (c) 2010 The Broad Institute 
    INFO  15:54:24,752 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
    INFO  15:54:24,752 HelpFormatter - Program Args: -I test.sorted.bam -R hg19_bundle/ucsc.hg19.fasta -T IndelRealigner -targetIntervals targetcreator.interval_list -o test.out.bam -known hg19_bundle/Mills_and_1000G_gold_standard.indels.hg19.vcf -known hg19_bundle/1000G_phase1.indels.hg19.vcf --consensusDeterminationModel KNOWNS_ONLY -L my.interval_list -LOD 0.4 
    INFO  15:54:24,753 HelpFormatter - Date/Time: 2012/09/25 15:54:24 
    INFO  15:54:24,753 HelpFormatter - --------------------------------------------------------------------------------- 
    INFO  15:54:24,753 HelpFormatter - --------------------------------------------------------------------------------- 
    INFO  15:54:24,766 ArgumentTypeDescriptor - Dynamically determined type of hg19_bundle/Mills_and_1000G_gold_standard.indels.hg19.vcf to be VCF 
    INFO  15:54:24,768 ArgumentTypeDescriptor - Dynamically determined type of hg19_bundle/1000G_phase1.indels.hg19.vcf to be VCF 
    INFO  15:54:24,854 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  15:54:24,946 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
    INFO  15:54:24,968 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02 
    INFO  15:54:24,988 RMDTrackBuilder - Loading Tribble index from disk for file hg19_bundle/Mills_and_1000G_gold_standard.indels.hg19.vcf 
    WARN  15:54:25,080 VCFStandardHeaderLines$Standards - Repairing standard header line for field GQ because -- type disagree; header has Float but standard is Integer 
    INFO  15:54:25,084 RMDTrackBuilder - Loading Tribble index from disk for file hg19_bundle/1000G_phase1.indels.hg19.vcf 
    WARN  15:54:25,169 VCFHeader - Found GL format, but no PL field.  As the GATK now only manages PL fields internally automatically adding a corresponding PL field to your VCF header 
    WARN  15:54:25,171 VCFStandardHeaderLines$Standards - Repairing standard header line for field AC because -- count types disagree; header has UNBOUNDED but standard is A -- descriptions disagree; header has 'Alternate Allele Count' but standard is 'Allele count in genotypes, for each ALT allele, in the same order as listed' 
    WARN  15:54:25,171 VCFStandardHeaderLines$Standards - Repairing standard header line for field AF because -- count types disagree; header has INTEGER but standard is A -- descriptions disagree; header has 'Global Allele Frequency based on AC/AN' but standard is 'Allele Frequency, for each ALT allele, in the same order as listed' 
    INFO  15:54:29,685 TraversalEngine - Total runtime 0.00 secs, 0.00 min, 0.00 hours 
    INFO  15:54:30,914 GATKRunReport - Uploaded run statistics report to AWS S3 
    
  • seqseek Posts: 11 Member

    The attached file is better formatted.

    Attachment: stack_trace.txt (3K)
  • Geraldine_VdAuwera Posts: 6,176 Administrator, GATK Developer

    To post a well-formatted stack trace, just copy-paste it into any good text editor that can indent the entire block for you, indent it, then copy-paste that back into the forum post.


  • Geraldine_VdAuwera Posts: 6,176 Administrator, GATK Developer

    I see you are passing your intervals as a list file. Check that it is properly formatted: the most likely explanation for your problem is that the engine is not finding any valid intervals to run IndelRealigner on.


  • seqseek Posts: 11 Member

    Thanks for your quick reply! The interval file that I used is correctly formatted. I used it for all the other processing steps and didn't have any problems. And 112 indels fall within my intervals.

  • Geraldine_VdAuwera Posts: 6,176 Administrator, GATK Developer

    OK then, to rule out the file for sure, can you try rerunning the same command but passing an interval directly (like -L 20:10000-20000)? Or even a simpler interval like -L 20. Just make sure there are reads in it.
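
For anyone who wants to sanity-check a -L argument before rerunning, here is a minimal sketch in Python. It assumes interval strings of the form `contig` or `contig:start-stop`, as used in this thread, and it is not GATK's own parser. The names `parse_interval` and `interval_problem` and the example contig set are made up for illustration. The idea is simply to confirm that the contig named in the interval exists among the sequence names the engine will see.

```python
import re

# Simplified pattern for "contig" or "contig:start-stop".
# This is an illustrative assumption, not GATK's actual interval grammar.
_INTERVAL_RE = re.compile(
    r"^(?P<contig>[^:\s]+)(?::(?P<start>\d+)-(?P<stop>\d+))?$"
)


def parse_interval(text):
    """Split an interval string into (contig, start, stop).

    start and stop are None when only a contig name is given.
    """
    match = _INTERVAL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized interval string: {text!r}")
    contig = match.group("contig")
    if match.group("start") is None:
        return contig, None, None
    start = int(match.group("start"))
    stop = int(match.group("stop"))
    if start < 1 or stop < start:
        raise ValueError(f"bad coordinates in interval: {text!r}")
    return contig, start, stop


def interval_problem(text, known_contigs):
    """Return None if the interval looks usable, else a short message."""
    try:
        contig, _, _ = parse_interval(text)
    except ValueError as err:
        return str(err)
    if contig not in known_contigs:
        return f"contig {contig!r} is not among the known contigs"
    return None


if __name__ == "__main__":
    # Hypothetical contig names, e.g. taken from a BAM header's @SQ lines.
    contigs = {"chr1", "chr20"}
    for arg in ("chr1:6000000-7000000", "20:10000-20000", "20"):
        print(arg, "->", interval_problem(arg, contigs) or "ok")
```

With a UCSC-style reference such as ucsc.hg19.fasta, an interval written as `20` would be flagged by this check because the contig is named `chr20` there.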


  • seqseek Posts: 11 Member
    edited September 2012

    It's the same output.

    INFO  17:40:28,440 HelpFormatter - Program Args: -I test.sorted.bam -R hg19_bundle/ucsc.hg19.fasta -T IndelRealigner -targetIntervals targetcreator.interval_list -o test.out.bam -known hg19_bundle/Mills_and_1000G_gold_standard.indels.hg19.vcf -known hg19_bundle/1000G_phase1.indels.hg19.vcf --consensusDeterminationModel KNOWNS_ONLY -L chr1:6000000-7000000 -LOD 0.4 
    INFO  17:40:28,440 HelpFormatter - Date/Time: 2012/09/25 17:40:26 
    INFO  17:40:28,441 HelpFormatter - --------------------------------------------------------------------------------- 
    INFO  17:40:28,441 HelpFormatter - --------------------------------------------------------------------------------- 
    INFO  17:40:28,457 ArgumentTypeDescriptor - Dynamically determined type of hg19_bundle/Mills_and_1000G_gold_standard.indels.hg19.vcf to be VCF 
    INFO  17:40:28,459 ArgumentTypeDescriptor - Dynamically determined type of hg19_bundle/1000G_phase1.indels.hg19.vcf to be VCF 
    INFO  17:40:28,543 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  17:40:28,632 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
    INFO  17:40:28,653 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02 
    INFO  17:40:28,672 RMDTrackBuilder - Loading Tribble index from disk for file hg19_bundle/Mills_and_1000G_gold_standard.indels.hg19.vcf 
    WARN  17:40:28,765 VCFStandardHeaderLines$Standards - Repairing standard header line for field GQ because -- type disagree; header has Float but standard is Integer 
    INFO  17:40:28,769 RMDTrackBuilder - Loading Tribble index from disk for file hg19_bundle/1000G_phase1.indels.hg19.vcf 
    WARN  17:40:28,863 VCFHeader - Found GL format, but no PL field.  As the GATK now only manages PL fields internally automatically adding a corresponding PL field to your VCF header 
    WARN  17:40:28,865 VCFStandardHeaderLines$Standards - Repairing standard header line for field AC because -- count types disagree; header has UNBOUNDED but standard is A -- descriptions disagree; header has 'Alternate Allele Count' but standard is 'Allele count in genotypes, for each ALT allele, in the same order as listed' 
    WARN  17:40:28,865 VCFStandardHeaderLines$Standards - Repairing standard header line for field AF because -- count types disagree; header has INTEGER but standard is A -- descriptions disagree; header has 'Global Allele Frequency based on AC/AN' but standard is 'Allele Frequency, for each ALT allele, in the same order as listed' 
    INFO  17:40:30,652 TraversalEngine - Total runtime 0.00 secs, 0.00 min, 0.00 hours 
    INFO  17:40:32,634 GATKRunReport - Uploaded run statistics report to AWS S3 
    
  • seqseek Posts: 11 Member

    I figured it out! It's the BAM header. When I mapped to the restricted region, I kept a simpler version of the header, which evidently doesn't agree with the reference dict. I changed the header and it works fine. Thanks for your help.
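
A header/dict mismatch like this one can be spotted by comparing the @SQ lines of the BAM header with those of the reference's sequence dictionary, since both are SAM-style header text. The Python sketch below is only an illustration under that assumption. The function names `parse_sq_lines` and `compare_headers` are hypothetical, and it does not claim to reproduce the checks GATK performs internally or to say which differences GATK tolerates.

```python
def parse_sq_lines(header_text):
    """Return an ordered list of (name, length) from @SQ lines."""
    contigs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Fields look like "SN:chr1" or "LN:249250621"; split on the first
        # colon only, because values such as UR: may contain colons.
        fields = dict(
            field.split(":", 1)
            for field in line.split("\t")[1:]
            if ":" in field
        )
        contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


def compare_headers(bam_header_text, dict_text):
    """List the differences between BAM @SQ lines and the reference dict."""
    bam = parse_sq_lines(bam_header_text)
    ref = dict(parse_sq_lines(dict_text))
    problems = []
    for name, length in bam:
        if name not in ref:
            problems.append(f"{name}: in BAM header but not in reference dict")
        elif ref[name] != length:
            problems.append(
                f"{name}: length {length} in BAM, {ref[name]} in reference dict"
            )
    bam_names = {name for name, _ in bam}
    missing = [name for name in ref if name not in bam_names]
    if missing:
        problems.append(
            "reference contigs absent from BAM header: " + ", ".join(missing)
        )
    return problems


if __name__ == "__main__":
    # Illustrative header text; real input would come from the files.
    bam_header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:249250621\n"
    ref_dict = (
        "@HD\tVN:1.0\n"
        "@SQ\tSN:chr1\tLN:249250621\n"
        "@SQ\tSN:chr2\tLN:243199373\n"
    )
    for problem in compare_headers(bam_header, ref_dict):
        print(problem)
```

In practice the BAM header text can be obtained with `samtools view -H test.sorted.bam`, and the dictionary text is the contents of the `.dict` file that accompanies the reference FASTA.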

  • Geraldine_VdAuwera Posts: 6,176 Administrator, GATK Developer

    Glad you found a solution to your problem! File headers/dicts were going to be my next suggestion :)

