beware of using binned quality scores with some GATK procedures
Hi all -
Noticed a GATK problem with the new binned quality score option in Illumina's HiSeq Control Software that is worth sharing with the forum. To reduce output file size (by up to 37%), a new “Bin QScores” option was added and set as the default in the latest upgrade of the Illumina software for version 4 fluidics. The binning does appear to work as expected: I verified that the FASTQ files, and the initial BAM files produced by alignment alone, contained only the binned Q scores 6, 15, 22, 27, 33, 37, and 40, plus (unexpectedly, I think) 2 and 14. In Phred+33 encoding these correspond to the characters ' 0 7 < B F I, plus # and /.
However, BaseRecalibrator crashes on the output of GATK IndelRealigner for these binned QScores in about 30% of our exome runs. The error is that it finds quality scores above the expected range; because it assumes this means the wrong encoding was used, it stops on the spot with the message: "we encountered an extremely high quality score of 63". I am only guessing, but this might be related to the variance of the QScores of surrounding bases being too small, possibly zero.
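A minimal Python sketch of the kind of check described above: decode the Phred+33 quality strings in a FASTQ file, tally which distinct Q scores occur, and flag any above a chosen ceiling. The function names and the default ceiling of 41 are hypothetical illustrations, not GATK's own check or threshold, and the sketch assumes plain, unwrapped 4-line FASTQ records.

```python
from collections import Counter
from typing import Iterable, List

PHRED_OFFSET = 33  # Sanger / Illumina 1.8+ encoding: character = chr(Q + 33)


def quality_histogram(fastq_lines: Iterable[str]) -> Counter:
    """Count decoded Phred scores across all quality lines of a FASTQ stream.

    Assumes unwrapped 4-line records: header, sequence, '+', qualities.
    """
    counts: Counter = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:  # the 4th line of each record holds the qualities
            counts.update(ord(c) - PHRED_OFFSET for c in line.rstrip("\r\n"))
    return counts


def suspicious_scores(counts: Counter, max_expected: int = 41) -> List[int]:
    """Return decoded scores above max_expected (a hypothetical ceiling), sorted."""
    return sorted(q for q in counts if q > max_expected)


if __name__ == "__main__":
    # Two toy records whose qualities use the nine values reported above.
    reads = [
        "@read1\n", "ACGTA\n", "+\n", "#/7IF\n",
        "@read2\n", "ACGT\n", "+\n", "'0<B\n",
    ]
    hist = quality_histogram(reads)
    print(sorted(hist))             # [2, 6, 14, 15, 22, 27, 33, 37, 40]
    print(suspicious_scores(hist))  # []
```

On a real file, `with open("sample.fastq") as fh: hist = quality_histogram(fh)` streams it line by line (the file name is a placeholder). A Q63 base would appear as the character `` ` `` (ASCII 96) and be reported by `suspicious_scores`.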
I am currently re-running these samples through GATK 3.1-1 (the crashes occurred with 2.3-0) and hope they will be fine. But for anyone wishing to use earlier GATK versions for comparable sample runs, this is something to watch out for.
In Illumina's HiSeq Control Software, binning of QScores is currently turned on by default. It can easily be deselected before a run, in the substep “Run Configuration” > “Storage” > “Bin QScores”.
Since binning can’t be undone, but unbinned scores can always be binned later to reduce size, users with any uncertainty might do well to turn this compression off.