BaseRecalibration: Recalibration table is empty

Baylor Health • Posts: 3 • Member

When I run BaseRecalibrator in GATK, I get "empty" results. I couldn't find a solution to this by searching.

java -Xmx4g -jar /seqprg/GenomeAnalysisTK-2.4-3-g2a7af43/GenomeAnalysisTK.jar -l INFO -R /Users/bcantarel/projects/refdb/human_g1k_v37.fasta --knownSites /Users/bcantarel/projects/refdb/00-All.vcf -I Sample_cDNA405.bam -T BaseRecalibrator -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov ContextCovariate -o Sample_cDNA405.grp

ERROR ------------------------------------------------------------------------------------------

I have run RealignerTargetCreator and IndelRealigner without any issues and got the output I need, but for some reason I get this error at the BaseRecalibrator step. Could someone please help me troubleshoot this?

Thanks, Sinan

Geraldine Van der Auwera, PhD

• Posts: 17 • Member

@Geraldine_VdAuwera It is working properly now.

Thanks, Sinan

• Posts: 17 • Member
edited May 2013

Hello again. For some reason, when I run BaseRecalibrator I get zero processed reads. Do you have any idea why this is occurring? I do get an output file, but it contains no recalibration information.

Here is my command: java -Xmx8g -jar /home/sir2013/GATK/GenomeAnalysisTK.jar -T BaseRecalibrator -I 1024_D_realignedBam.bam -R /pbtech_mounts/fdlab_store003/fdlab/genomes/human/hg19/indexes/star/hg19.fa -knownSites /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/dbsnp_137.hg19.vcf -knownSites /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/Mills_and_1000G_gold_standard.indels.hg19.vcf -knownSites /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/1000G_phase1.indels.hg19.vcf --validation_strictness STRICT -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov ContextCovariate -o recal_data.grp

running screen:
INFO  15:10:40,075 HelpFormatter - --------------------------------------------------------------------------------
INFO  15:10:40,077 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-2-gf57256b, Compiled 2013/05/01 09:27:02
INFO  15:10:40,077 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO  15:10:40,082 HelpFormatter - Program Args: -T BaseRecalibrator -I 1024_D_realignedBam.bam -R /pbtech_mounts/fdlab_store003/fdlab/genomes/human/hg19/indexes/star/hg19.fa -knownSites /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/dbsnp_137.hg19.vcf -knownSites /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/Mills_and_1000G_gold_standard.indels.hg19.vcf -knownSites /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/1000G_phase1.indels.hg19.vcf -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov ContextCovariate -o recal_data.grp
INFO  15:10:40,082 HelpFormatter - Date/Time: 2013/05/14 15:10:40
INFO  15:10:40,082 HelpFormatter - --------------------------------------------------------------------------------
INFO  15:10:40,082 HelpFormatter - --------------------------------------------------------------------------------
INFO  15:10:40,104 ArgumentTypeDescriptor - Dynamically determined type of /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/dbsnp_137.hg19.vcf to be VCF
INFO  15:10:40,115 ArgumentTypeDescriptor - Dynamically determined type of /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/Mills_and_1000G_gold_standard.indels.hg19.vcf to be VCF
INFO  15:10:40,127 ArgumentTypeDescriptor - Dynamically determined type of /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/1000G_phase1.indels.hg19.vcf to be VCF
INFO  15:10:42,750 GenomeAnalysisEngine - Strictness is SILENT
INFO  15:10:43,025 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO  15:10:43,033 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  15:10:43,051 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO  15:10:43,093 RMDTrackBuilder - Loading Tribble index from disk for file /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/dbsnp_137.hg19.vcf
INFO  15:10:43,407 RMDTrackBuilder - Loading Tribble index from disk for file /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/Mills_and_1000G_gold_standard.indels.hg19.vcf
INFO  15:10:44,262 RMDTrackBuilder - Loading Tribble index from disk for file /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/1000G_phase1.indels.hg19.vcf
INFO  15:10:44,588 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files
INFO  15:10:44,599 GenomeAnalysisEngine - Done creating shard strategy
INFO  15:10:44,600 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  15:10:44,730 BaseRecalibrator - The covariates being used here:
INFO  15:10:44,731 BaseRecalibrator -  QualityScoreCovariate
INFO  15:10:44,732 BaseRecalibrator -  ContextCovariate
INFO  15:10:44,732 ContextCovariate -   Context sizes: base substitution model 2, indel substitution model 3
INFO  15:10:44,733 BaseRecalibrator -  CycleCovariate
INFO  15:10:44,738 ReadShardBalancer$1 - Loading BAM index data for next contig
INFO  15:10:44,741 ReadShardBalancer$1 - Done loading BAM index data for next contig
INFO  15:11:14,658 ProgressMeter -        Starting        0.00e+00   30.0 s       49.7 w    100.0%        30.0 s     0.0 s
INFO  15:11:44,662 ProgressMeter -        Starting        0.00e+00   60.0 s       99.3 w    100.0%        60.0 s     0.0 s
INFO  15:12:15,185 ProgressMeter -        Starting        0.00e+00   90.0 s      149.8 w    100.0%        90.0 s     0.0 s
INFO  15:12:45,186 ProgressMeter -        Starting        0.00e+00  120.0 s      199.4 w    100.0%       120.0 s     0.0 s
INFO  15:13:15,189 ProgressMeter -        Starting        0.00e+00    2.5 m      249.0 w    100.0%         2.5 m     0.0 s
INFO  15:13:45,191 ProgressMeter -        Starting        0.00e+00    3.0 m      298.6 w    100.0%         3.0 m     0.0 s
INFO  15:14:15,193 ProgressMeter -        Starting        0.00e+00    3.5 m      348.2 w    100.0%         3.5 m     0.0 s
INFO  15:14:44,687 ReadShardBalancer$1 - Loading BAM index data for next contig
INFO  15:14:44,692 BaseRecalibrator - Calculating quantized quality scores...
INFO  15:14:45,195 ProgressMeter -        Starting        0.00e+00    4.0 m      397.8 w    100.0%         4.0 m     0.0 s
INFO  15:14:45,576 BaseRecalibrator - Writing recalibration report...
INFO  15:14:46,197 BaseRecalibrator - ...done!
**INFO  15:14:46,200 BaseRecalibrator - Processed: 0 reads**
INFO  15:14:46,209 ProgressMeter -            done        0.00e+00    4.0 m      399.5 w    100.0%         4.0 m     0.0 s
INFO  15:14:46,216 ProgressMeter - Total runtime 241.62 secs, 4.03 min, 0.07 hours
INFO  15:14:47,683 GATKRunReport - Uploaded run statistics report to AWS S3

and output file information in it:

:GATKReport.v1.1:5
:GATKTable:2:18:%s:%s:;
:GATKTable:Arguments:Recalibration argument collection values used in this run
Argument Value
binary_tag_name null
covariate ReadGroupCovariate,QualityScoreCovariate,ContextCovariate,CycleCovariate
default_platform null
deletions_default_quality 45
force_platform null
indels_context_size 3
insertions_default_quality 45
low_quality_tail 2
maximum_cycle_value 500
mismatches_context_size 2
mismatches_default_quality -1
no_standard_covs false
plot_pdf_file null
quantizing_levels 16
recalibration_report null
run_without_dbsnp false
solid_nocall_strategy THROW_EXCEPTION
solid_recal_mode SET_Q_ZERO

:GATKTable:3:94:%s:%s:%s:;
:GATKTable:Quantized:Quality quantization map
QualityScore Count QuantizedScore
0 0 93
1 0 93
2 0 93
3 0 93
4 0 93
5 0 93
6 0 93
7 0 93
8 0 93
9 0 93
10 0 93
11 0 93
12 0 93
13 0 93
14 0 93
15 0 93
16 0 93
17 0 93
18 0 93
19 0 93
20 0 93
21 0 93
22 0 93
23 0 93
24 0 93
25 0 93
26 0 93
27 0 93
28 0 93
29 0 93
30 0 93
31 0 93
32 0 93
33 0 93
34 0 93
35 0 93
36 0 93
37 0 93
38 0 93
39 0 93
40 0 93
41 0 93
42 0 93
43 0 93
44 0 93
45 0 93
46 0 93
47 0 93
48 0 93
49 0 93
50 0 93
51 0 93
52 0 93
53 0 93
54 0 93
55 0 93
56 0 93
57 0 93
58 0 93
59 0 93
60 0 93
61 0 93
62 0 93
63 0 93
64 0 93
65 0 93
66 0 93
67 0 93
68 0 93
69 0 93
70 0 93
71 0 93
72 0 93
73 0 93
74 0 93
75 0 93
76 0 93
77 0 93
78 0 93
79 0 79
80 0 80
81 0 81
82 0 82
83 0 83
84 0 84
85 0 85
86 0 86
87 0 87
88 0 88
89 0 89
90 0 90
91 0 91
92 0 92
93 0 93

:GATKTable:6:0:%s:%s:%.4f:%.4f:%d:%.2f:;
:GATKTable:RecalTable0:
ReadGroup EventType EmpiricalQuality EstimatedQReported Observations Errors

:GATKTable:6:0:%s:%s:%s:%.4f:%d:%.2f:;
:GATKTable:RecalTable1:
ReadGroup QualityScore EventType EmpiricalQuality Observations Errors

:GATKTable:8:0:%s:%s:%s:%s:%s:%.4f:%d:%.2f:;
:GATKTable:RecalTable2:
ReadGroup QualityScore CovariateValue CovariateName EventType EmpiricalQuality Observations Errors

I do apologize for the long post in the forum. I just don't understand why no Errors are being given as well and no recalibration is being processed.

Thanks, Sinan

Post edited by Mark_DePristo on
• Posts: 274 • Administrator, GATK Developer

can you check if your BAM file has any reads? Sounds silly but it could be something as simple as that. Also you don't need to specify the -cov parameters. Those are the default covariates and if you specify them like that, I am afraid it may be confusing the tool. Can you remove those parameters and check if it works? (I'll issue a bug report if that's the case)

• Posts: 17 • Member

I know the bam file is not empty because for the IndelRealigner process I had to have the quality scores fixed by using -fixMisencodedQuals and I cross checked it with original bam file to see if the scores were actually adjusted accordingly.

Unfortunately I got the same output.

Command Line Code: java -Xmx8g -jar /home/sir2013/GATK/GenomeAnalysisTK.jar -T BaseRecalibrator -I 1024_D_realignedBam.bam -R /pbtech_mounts/fdlab_store003/fdlab/genomes/human/hg19/indexes/star/hg19.fa -knownSites /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/dbsnp_137.hg19.vcf -knownSites /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/Mills_and_1000G_gold_standard.indels.hg19.vcf -knownSites /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/1000G_phase1.indels.hg19.vcf -o recal_data.grp

Running Script Output:
INFO  11:20:21,953 HelpFormatter - --------------------------------------------------------------------------------
INFO  11:20:21,964 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-2-gf57256b, Compiled 2013/05/01 09:27:02
INFO  11:20:21,965 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  11:20:21,965 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO  11:20:21,978 HelpFormatter - Program Args: -T BaseRecalibrator -I 1024_D_realignedBam.bam -R /pbtech_mounts/fdlab_store003/fdlab/genomes/human/hg19/indexes/star/hg19.fa -knownSites /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/dbsnp_137.hg19.vcf -knownSites /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/Mills_and_1000G_gold_standard.indels.hg19.vcf -knownSites /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/1000G_phase1.indels.hg19.vcf -o recal_data.grp
INFO  11:20:21,979 HelpFormatter - Date/Time: 2013/05/16 11:20:21
INFO  11:20:21,980 HelpFormatter - --------------------------------------------------------------------------------
INFO  11:20:21,980 HelpFormatter - --------------------------------------------------------------------------------
INFO  11:20:22,072 ArgumentTypeDescriptor - Dynamically determined type of /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/dbsnp_137.hg19.vcf to be VCF
INFO  11:20:22,090 ArgumentTypeDescriptor - Dynamically determined type of /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/Mills_and_1000G_gold_standard.indels.hg19.vcf to be VCF
INFO  11:20:22,115 ArgumentTypeDescriptor - Dynamically determined type of /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/1000G_phase1.indels.hg19.vcf to be VCF
INFO  11:20:23,555 GenomeAnalysisEngine - Strictness is SILENT
INFO  11:20:23,845 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO  11:20:23,852 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  11:20:23,899 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.04
INFO  11:20:23,947 RMDTrackBuilder - Loading Tribble index from disk for file /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/dbsnp_137.hg19.vcf
INFO  11:20:24,363 RMDTrackBuilder - Loading Tribble index from disk for file /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/Mills_and_1000G_gold_standard.indels.hg19.vcf
INFO  11:20:24,653 RMDTrackBuilder - Loading Tribble index from disk for file /pbtech_mounts/homesA/asboner/asboner_scratch/hg19/prostate_samples/resources/1000G_phase1.indels.hg19.vcf
INFO  11:20:26,702 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files
INFO  11:20:26,721 GenomeAnalysisEngine - Done creating shard strategy
INFO  11:20:26,722 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  11:20:26,723 ProgressMeter -        Location processed.reads  runtime per.1M.reads completed total.runtime remaining
INFO  11:20:27,050 BaseRecalibrator - The covariates being used here:
INFO  11:20:27,051 BaseRecalibrator -  ReadGroupCovariate
INFO  11:20:27,052 BaseRecalibrator -  QualityScoreCovariate
INFO  11:20:27,052 BaseRecalibrator -  ContextCovariate
INFO  11:20:27,053 ContextCovariate -   Context sizes: base substitution model 2, indel substitution model 3
INFO  11:20:27,054 BaseRecalibrator -  CycleCovariate
INFO  11:20:27,064 ReadShardBalancer$1 - Loading BAM index data for next contig
INFO  11:20:27,068 ReadShardBalancer$1 - Done loading BAM index data for next contig
INFO  11:20:56,840 ProgressMeter -        Starting        0.00e+00   30.0 s       49.8 w    100.0%        30.0 s     0.0 s
INFO  11:21:26,842 ProgressMeter -        Starting        0.00e+00   60.0 s       99.4 w    100.0%        60.0 s     0.0 s
INFO  11:21:56,844 ProgressMeter -        Starting        0.00e+00   90.0 s      149.0 w    100.0%        90.0 s     0.0 s
INFO  11:22:26,846 ProgressMeter -        Starting        0.00e+00  120.0 s      198.6 w    100.0%       120.0 s     0.0 s
INFO  11:22:49,994 ReadShardBalancer$1 - Loading BAM index data for next contig
INFO  11:22:49,997 BaseRecalibrator - Calculating quantized quality scores...
INFO  11:22:50,101 BaseRecalibrator - Writing recalibration report...
INFO  11:22:50,151 BaseRecalibrator - ...done!
INFO  11:22:50,151 BaseRecalibrator - Processed: 0 reads
INFO  11:22:50,153 ProgressMeter -            done        0.00e+00    2.4 m      237.2 w    100.0%         2.4 m     0.0 s
INFO  11:22:50,154 ProgressMeter - Total runtime 143.43 secs, 2.39 min, 0.04 hours
INFO  11:22:51,209 GATKRunReport - Uploaded run statistics report to AWS S3

Information in output file recal_data.grp:

:GATKTable:Arguments:Recalibration argument collection values used in this run

Argument Value
binary_tag_name null
covariate ReadGroupCovariate,QualityScoreCovariate,ContextCovariate,CycleCovariate
default_platform null
deletions_default_quality 45
force_platform null
indels_context_size 3
insertions_default_quality 45
low_quality_tail 2
maximum_cycle_value 500
mismatches_context_size 2
mismatches_default_quality -1
no_standard_covs false
plot_pdf_file null
quantizing_levels 16
recalibration_report null
run_without_dbsnp false
solid_nocall_strategy THROW_EXCEPTION
solid_recal_mode SET_Q_ZERO

:GATKTable:Quantized:Quality quantization map

QualityScore Count QuantizedScore
0 0 93
1 0 93
2 0 93
3 0 93
4 0 93
5 0 93
6 0 93
7 0 93
8 0 93
9 0 93
10 0 93
11 0 93
12 0 93
13 0 93
14 0 93
15 0 93
16 0 93
17 0 93
18 0 93
19 0 93
20 0 93
21 0 93
22 0 93
23 0 93
24 0 93
25 0 93
26 0 93
27 0 93
28 0 93
29 0 93
30 0 93
31 0 93
32 0 93
33 0 93
34 0 93
35 0 93
36 0 93
37 0 93
38 0 93
39 0 93
40 0 93
41 0 93
42 0 93
43 0 93
44 0 93
45 0 93
46 0 93
47 0 93
48 0 93
49 0 93
50 0 93
51 0 93
52 0 93
53 0 93
54 0 93
55 0 93
56 0 93
57 0 93
58 0 93
59 0 93
60 0 93
61 0 93
62 0 93
63 0 93
64 0 93
65 0 93
66 0 93
67 0 93
68 0 93
69 0 93
70 0 93
71 0 93
72 0 93
73 0 93
74 0 93
75 0 93
76 0 93
77 0 93
78 0 93
79 0 79
80 0 80
81 0 81
82 0 82
83 0 83
84 0 84
85 0 85
86 0 86
87 0 87
88 0 88
89 0 89
90 0 90
91 0 91
92 0 92
93 0 93

:GATKTable:RecalTable0:

ReadGroup EventType EmpiricalQuality EstimatedQReported Observations Errors

:GATKTable:RecalTable1:

ReadGroup QualityScore EventType EmpiricalQuality Observations Errors

:GATKTable:RecalTable2:

ReadGroup QualityScore CovariateValue CovariateName EventType EmpiricalQuality Observations Errors

Thanks, Sinan

This is very strange. How big is your BAM file? Can you share it so that we can debug this?

• Posts: 17 • Member

Sure, I can share the BAM file. The question is, how would I do that? I have used FileZilla to download your bundle pack. Is there a specific folder I should put the file in, and how would you like it named so you can identify it?

Thanks, Sinan

edited May 2013

You can upload it to our FTP server. Instructions are here. Just let me know when you have done so and we will start debugging it internally.

Thank you very much.

Post edited by Carneiro on
• Posts: 17 • Member

Sorry, the BAM file is about 2.1G.

If you can reproduce the error with a tiny version of your BAM file (which you can create with PrintReads using -L), then you can just attach that file to this thread, which is the best option.

• Posts: 17 • Member

I am sorry, I have not gotten to the PrintReads step yet. When you say to use -L, is there an input for that argument? If you could give me an example, I can attach the file.

Thanks, Sinan
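
For readers with the same question: in GATK 2.x, -L takes a genomic interval such as chr1:1-1000000 (or a file listing intervals), and PrintReads then writes only the reads that overlap it. The sketch below simply assembles such a command line so the pieces are visible. The jar path, file names, interval, and the helper function name are hypothetical placeholders, not values from this thread.

```python
import shlex

def print_reads_subset_cmd(gatk_jar, reference, bam_in, interval, bam_out):
    """Assemble a GATK 2.x PrintReads command that keeps only reads in `interval`."""
    return [
        "java", "-jar", gatk_jar,
        "-T", "PrintReads",
        "-R", reference,
        "-I", bam_in,
        "-L", interval,   # an interval like "chr1:1-1000000", or an interval file
        "-o", bam_out,
    ]

# Hypothetical placeholder values; substitute your own paths and interval.
cmd = print_reads_subset_cmd(
    "GenomeAnalysisTK.jar", "hg19.fa", "input.bam", "chr1:1-1000000", "small.bam"
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed line can be pasted into a shell. Pick an interval small enough that the resulting BAM is easy to attach, and confirm that the problem still reproduces on the subset before sharing it.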

Never mind, just upload the whole file. 2.1G is fairly small.

• Posts: 17 • Member

OK, the upload seems to be taking forever; it has been saying "uploading" for the past 4 hours. Is there another way to get this to you?

Thanks, Sinan

If you have any place to put it, we can download it from our end. But the FTP is the preferred method.

• Posts: 17 • Member

I created a folder under my name, "Sinan", on the FTP upload server and uploaded the file there. You will see 1024_D_realigned.bam; this BAM file has already gone through RealignerTargetCreator and IndelRealigner successfully. I do hope to hear some good news, because I tried running other BAM files and they were unsuccessful too, again with 0 reads processed.

Thanks, Sinan

Thanks, we will take a look.

• Posts: 17 • Member

Hello, I was wondering if there was any update or if a solution has been found to my problem.

Thanks, Sinan

It seems your BAM file has reads with MQ 255 (in the SAM format, a mapping quality of 255 means "unavailable"), which is why they are all being filtered out.
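
One way to spot this situation before running BaseRecalibrator is to tally the mapping quality column of the alignments. The sketch below is only an illustration, assuming SAM-formatted text lines as input (for example, the output of `samtools view`); the function name and the demo records are made up for this example. In the SAM format, MAPQ is the fifth tab-separated field of each record.

```python
from collections import Counter

def tally_mapq(sam_lines):
    """Count alignments per MAPQ value (field 5 of each SAM record)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        counts[int(fields[4])] += 1
    return counts

# Made-up records for illustration only.
demo = [
    "@HD\tVN:1.4\tSO:coordinate",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t256\tchr2\t300\t3\t4M\t*\t0\t0\tACGT\tIIII",
]
counts = tally_mapq(demo)
for mapq, n in sorted(counts.items()):
    note = "  <- 'unavailable' in the SAM format" if mapq == 255 else ""
    print(f"MAPQ {mapq}: {n} read(s){note}")
```

If nearly every read lands in the 255 bin, as it would for the aligner output discussed in this thread, a run that processes 0 reads is the expected symptom.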

Yes, the newest GATK will print a more informative message about this problem. It will also be possible to fix it by adding -rf ReassignMappingQuality to the command line. Note that this currently works only in the nightly build and will be released with GATK 2.6.

-- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

• Posts: 17 • Member

Shouldn't this have been fixed when I specified -fixMisencodedQuals while running IndelRealigner? I compared the new BAM with the old BAM and could see that the adjustments had been made.

Thanks, Sinan

Hi Sinan,

-fixMisencodedQuals is meant to fix a different issue which concerns base qualities, not mapping qualities (see release highlights for 2.3 for more details). Are you still having problems?

Geraldine Van der Auwera, PhD

• Posts: 17 • Member

Hello,

I thought I would bring this to your attention regarding MQ 255. I use STAR to run my alignments, and STAR uses the same MQ annotation as TopHat:

255 = uniquely mapped
3 = maps to 2 locations
2 = maps to 3 locations
1 = maps to 4-9 locations
0 = maps to 10 or more locations

So, as you can see, these values are categories rather than actual mapping quality scores, whereas BWA does assign an actual score. I was wondering whether there is a conversion for all 5 values, beyond 255 being converted to 60, so that I can properly process my data with the GATK tools.

Thanks, Sinan

Hi Sinan,

The GATK will only consider uniquely mapped reads, so converting the MQ 255 values is the only step necessary. The other reads will be ignored.

Geraldine Van der Auwera, PhD
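
To make the conversion concrete, here is a small sketch of the rule described in this thread: records with MQ 255 (uniquely mapped, in the convention listed above) get the value 60, and every other record is left unchanged. The table and function names are illustrative only, and the demo records are made up; within GATK the reassignment is handled by the read filter mentioned earlier in the thread, not by code like this.

```python
# MAPQ categories as listed in this thread for STAR/TopHat-style output.
MAPQ_MEANING = {
    255: "uniquely mapped",
    3: "maps to 2 locations",
    2: "maps to 3 locations",
    1: "maps to 4-9 locations",
    0: "maps to 10 or more locations",
}

def reassign_unique_mapq(sam_line, old=255, new=60):
    """Return the SAM record with MAPQ `old` replaced by `new`; others unchanged."""
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line  # header lines carry no MAPQ field
    fields = line.split("\t")
    if int(fields[4]) == old:
        fields[4] = str(new)
    return "\t".join(fields)

# Made-up records for illustration only.
unique = "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII"
multi = "r2\t256\tchr2\t300\t3\t4M\t*\t0\t0\tACGT\tIIII"
for rec in (unique, multi):
    before = int(rec.split("\t")[4])
    after = int(reassign_unique_mapq(rec).split("\t")[4])
    print(f"{MAPQ_MEANING[before]}: MAPQ {before} -> {after}")
```

Only the 255 records change; the multi-mapped categories keep their original values, consistent with the reply above that the other reads are simply ignored.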

• Posts: 17 • Member

OK, thank you very much for all the help. I have finally got the BaseRecalibration step to work. Cheers!

Sinan