Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

BaseRecalibrator was able to recalibrate 0 reads

I'm trying to track down why `BaseRecalibrator` isn't able to recalibrate my reads, but am having trouble finding the problem's source. I'm running GATK v 3.5.

Command:
```
strings=(
S1233686
)


for i in "${strings[@]}"; do
echo "${i}"

java -jar /ast/emb/anaconda3/opt/gatk-3.5/GenomeAnalysisTK.jar \
-T BaseRecalibrator \
-R /ast/emb/prtj3/refs/Homo_sapiens_assembly38.fasta \
-I /ast/emb/prtj3/indel_realigned/${i}.bam \
-knownSites /ast/emb/prtj3/refs/Mills_and_1000G_gold_standard.indels.hg38.vcf \
-knownSites /ast/emb/prtj3/refs/1000G_phase1.snps.high_confidence.hg38.vcf \
-o /ast/emb/prtj3/BQSR/${i}.recal.data.table.txt

done
```



Output:
```INFO 14:42:03,137 HelpFormatter - --------------------------------------------------------------------------------
INFO 14:42:03,139 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56
INFO 14:42:03,139 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 14:42:03,139 HelpFormatter - For support and documentation go to removed link
INFO 14:42:03,142 HelpFormatter - Program Args: -T BaseRecalibrator -R /ast/emb/well/prjt3/Homo_sapiens_assembly38.fasta -I /ast/emb/prjt3/indel_realigned/S1233686.bam
-knownSites /ast/emb/prjt3/refs/Mills_and_1000G_gold_standard.indels.hg38.vcf -knownSites /ast/emb/prjt3/refs/1000G_phase1.snps.high_confidence.hg38.vcf -o /ast/emb/prjt3/BQSR/S1233686.recal.data.table.txt
INFO 14:42:03,147 HelpFormatter - Executing as [email protected] on Linux 2.6.32-696.el6.centos.plus.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14.
INFO 14:42:03,147 HelpFormatter - Date/Time: 2019/04/10 14:42:03
INFO 14:42:03,147 HelpFormatter - --------------------------------------------------------------------------------
INFO 14:42:03,147 HelpFormatter - --------------------------------------------------------------------------------
INFO 14:42:03,466 GenomeAnalysisEngine - Strictness is SILENT
INFO 14:42:03,650 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 14:42:03,654 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 14:42:03,773 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.12
INFO 14:42:04,552 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 14:42:04,556 GenomeAnalysisEngine - Done preparing for traversal
INFO 14:42:04,557 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 14:42:04,557 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 14:42:04,557 ProgressMeter - Location | reads | elapsed | reads | completed | runtime | runtime
INFO 14:42:04,574 BaseRecalibrator - The covariates being used here:
INFO 14:42:04,575 BaseRecalibrator - ReadGroupCovariate
INFO 14:42:04,575 BaseRecalibrator - QualityScoreCovariate
INFO 14:42:04,575 BaseRecalibrator - ContextCovariate
INFO 14:42:04,575 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
INFO 14:42:04,575 BaseRecalibrator - CycleCovariate
INFO 14:42:04,650 ReadShardBalancer$1 - Loading BAM index data
INFO 14:42:04,653 ReadShardBalancer$1 - Done loading BAM index data
INFO 14:42:04,663 BaseRecalibrator - Calculating quantized quality scores...
INFO 14:42:04,674 BaseRecalibrator - Writing recalibration report...
INFO 14:42:04,692 BaseRecalibrator - ...done!
INFO 14:42:04,692 BaseRecalibrator - BaseRecalibrator was able to recalibrate 0 reads
INFO 14:42:04,694 ProgressMeter - done 0.0 0.0 s 37.9 h 100.0% 0.0 s 0.0 s
INFO 14:42:04,694 ProgressMeter - Total runtime 0.14 secs, 0.00 min, 0.00 hours
INFO 14:42:05,344 GATKRunReport - Uploaded run statistics report to AWS S3
```


From the Resource Bundle, I retrieved
`Homo_sapiens_assembly38.fasta`,
`Mills_and_1000G_gold_standard.indels.hg38.vcf`, and
`1000G_phase1.snps.high_confidence.hg38.vcf`.



Here I tried validating the input .bam:

Command:
```
java -jar /ast/emb/anaconda3/share/picard-2.18.16-0/picard.jar ValidateSamFile \
I=/ast/emb/prjt3/indel_realigned/S1233686.bam \
MODE=SUMMARY
```

Output:
```
INFO 2019-04-10 15:12:23 ValidateSamFile

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** removed link
**********
********** The command line looks like this in the new syntax:
**********
********** ValidateSamFile -I /ast/emb/prjt3/indel_realigned/S1233686.bam -MODE SUMMARY
**********


15:12:24.009 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/ast/emb/anaconda3/share/picard-2.18.16-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Apr 10 15:12:24 EDT 2019] ValidateSamFile INPUT=/ast/emb/prjt3/indel_realigned/S1233686.bam MODE=SUMMARY MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 SKIP_MATE_VALIDATION=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Apr 10 15:12:24 EDT 2019] Executing as [email protected] on Linux 2.6.32-696.el6.centos.plus.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.16-SNAPSHOT
WARNING 2019-04-10 15:12:24 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
No errors found
[Wed Apr 10 15:12:24 EDT 2019] picard.sam.ValidateSamFile done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=506462208
```


Prior to running `BaseRecalibrator`, here's what I did with aligned reads:
1. `MarkDuplicates`
2. `SortSam`, `SORT_ORDER=coordinate`
3. `AddOrReplaceReadGroups`
4. `index`
5. GATK
6. `RealignerTargetCreator`
7. `IndelRealigner`
8. `index`

Sorry, the back ticks don't seem to produce code even after using return.

Comments

Sign In or Register to comment.