QC plot examples

Will_Gilks, University of Sussex, UK (Member ✭✭)


I've attached six plots from some QC analysis of my NGS data, done predominantly with GATK. Each colour should correspond to the same sample in every plot. The bottom-left plot shows 'error rates' per PCR cycle. Is it worth trying to correct for this in variant recalibration? (Sometimes it feels like trying to correct one black box with another black box.) In the CallableLoci tool (see plot C), how are the different states determined? Specifically, for 'Low coverage', is it e.g. <10X or ..



Best Answer


  • Sheila, Broad Institute (Member, Broadie admin)

    Hi Will,

    Sorry for the late response. I am going to have Geraldine have a look and get back to you.


  • Will_Gilks, University of Sussex, UK (Member ✭✭)

    About the two big spikes in the error-rate-by-cycle plot, lower-left (E): variants in reads generated during these cycles are more likely to be spurious, right? So the genotypes should be filtered out, or the sample discarded. There are a few other samples which show shorter but more frequent spikes in quality-by-cycle.

    In the top-right plot (B), what about the slightly wavy distribution of read lengths for the top line?

    I found the answer to my other query simply on

    Thanks for the feedback.
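
As a rough illustration of how the spikes described above could be located numerically rather than by eye, the sketch below flags cycles whose error rate is far above the rest. It is not part of GATK. The function name `flag_spike_cycles`, the median-plus-MAD rule, the factor `k` and the example rates are all assumptions made up for this example.

```python
from statistics import median


def flag_spike_cycles(error_rates, k=5.0):
    """Return 1-based cycle numbers whose error rate looks like a spike.

    A cycle is flagged when its rate exceeds median + k * MAD, where MAD is
    the median absolute deviation of all rates. The factor k is an arbitrary
    illustrative choice, not a recommended setting.
    """
    med = median(error_rates)
    mad = median(abs(rate - med) for rate in error_rates)
    threshold = med + k * mad
    return [i + 1 for i, rate in enumerate(error_rates) if rate > threshold]


# Made-up per-cycle mismatch rates with two obvious spikes.
rates = [0.010, 0.011, 0.009, 0.060, 0.010, 0.012, 0.011, 0.080, 0.010, 0.009]
spikes = flag_spike_cycles(rates)
print(spikes)
```

Flagging a cycle this way only identifies where the problem sits. Whether to filter genotypes or drop the sample is a separate decision.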

  • Sheila, Broad Institute (Member, Broadie admin)


    I am assuming you have followed the Best Practices recommendations for pre-processing. For your first question regarding the PCR errors, those should be taken care of by BQSR. BQSR specifically corrects for cycle errors.

    For your second question of read lengths, did you trim your reads? The only issue with trimming your reads is that BQSR will not be able to see the trends in base quality scores in those trimmed sections, so the model it builds will not be as accurate.
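
To make the idea of correcting for cycle errors more concrete, here is a minimal sketch of estimating an empirical quality per cycle from observed mismatches. It is a toy model, not GATK's implementation: real BQSR also uses covariates such as read group, reported quality and sequence context, and it skips known variant sites. The function name, the input format and the +1/+2 pseudocount are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def empirical_quality_by_cycle(observations):
    """Estimate a Phred-scaled quality for each cycle.

    observations: iterable of (cycle, is_mismatch) pairs, one per base
    compared against the reference (hypothetical input format).
    """
    counts = defaultdict(lambda: [0, 0])  # cycle -> [mismatches, total]
    for cycle, is_mismatch in observations:
        counts[cycle][0] += int(is_mismatch)
        counts[cycle][1] += 1

    qualities = {}
    for cycle, (mismatches, total) in sorted(counts.items()):
        # Pseudocounts keep the estimate finite when no mismatch is seen.
        p_error = (mismatches + 1) / (total + 2)
        qualities[cycle] = -10 * math.log10(p_error)
    return qualities


# Toy data: cycle 1 is clean, cycle 2 has many mismatches.
obs = [(1, False)] * 998 + [(2, True)] * 9 + [(2, False)] * 89
quals = empirical_quality_by_cycle(obs)
print(quals)
```

In this toy data the bases from cycle 2 get a much lower empirical quality than those from cycle 1, which is the kind of adjustment that lets downstream callers down-weight bases from a bad cycle.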


  • Will_Gilks, University of Sussex, UK (Member ✭✭)

    Hi @Sheila

    Thanks for your response. In fact I haven't performed BQSR yet. It seems like a good thing to do.

    I trimmed reads (prior to mapping with BWA-mem and then Stampy) with:

    fastq-mcf \
    ~/adaptor_sequences/adaptors_truseq_130214.fasta \
    RGnb_fwd.fastq RGnb_rev.fastq -o RGnb_fwd_cleaned.fastq -o RGnb_rev_cleaned.fastq

    Which typically returns:

    Scale used: 2.2
    Phred: 33
    Threshold used: 751 out of 300000
    Files: 2
    Total reads: 45722841
    Too short after clip: 1011114
    Trimmed 11803470 reads (RGnb_fwd.fastq) by an average of 11.32 bases on quality < 7
    Trimmed 9084003 reads (RGnb_rev.fastq) by an average of 33.79 bases on quality < 7

    Could you advise?


    William Gilks
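
To put the trimming report quoted above in proportion, the sketch below parses that output and expresses each count as a fraction of the reported total. The regular expressions are written against the lines exactly as quoted in this post. Whether other fastq-mcf versions print the same wording is an assumption, and `summarize_trimming` is a made-up helper name.

```python
import re

LOG = """\
Scale used: 2.2
Phred: 33
Threshold used: 751 out of 300000
Files: 2
Total reads: 45722841
Too short after clip: 1011114
Trimmed 11803470 reads (RGnb_fwd.fastq) by an average of 11.32 bases on quality < 7
Trimmed 9084003 reads (RGnb_rev.fastq) by an average of 33.79 bases on quality < 7
"""


def summarize_trimming(log_text):
    """Extract counts from the quoted report and convert them to fractions."""
    total = int(re.search(r"Total reads:\s*(\d+)", log_text).group(1))
    too_short = int(re.search(r"Too short after clip:\s*(\d+)", log_text).group(1))
    pattern = r"Trimmed (\d+) reads \((.+?)\) by an average of ([\d.]+) bases"
    trimmed = {}
    for count, name, mean_bases in re.findall(pattern, log_text):
        trimmed[name] = {
            "reads": int(count),
            "fraction": int(count) / total,
            "mean_bases": float(mean_bases),
        }
    return {
        "total": total,
        "too_short_fraction": too_short / total,
        "trimmed": trimmed,
    }


summary = summarize_trimming(LOG)
for name, stats in summary["trimmed"].items():
    print(f"{name}: {stats['fraction']:.1%} trimmed, mean {stats['mean_bases']} bases")
print(f"too short after clip: {summary['too_short_fraction']:.1%}")
```

For the numbers in this post, about 26% of the forward reads and about 20% of the reverse reads were quality-trimmed, and about 2% were dropped as too short, all relative to the reported total.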

  • Sheila, Broad Institute (Member, Broadie admin)

    Hi William Gilks,

    You are fine to proceed with the rest of your pipeline.

    If you are concerned about the distribution of base qualities, you can always plot them to see how they look. FastQC is also a good tool for analyzing your base qualities.

    Good luck.
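
For readers who want to inspect base qualities without a dedicated tool, here is a minimal sketch that computes the mean quality at each read position from FASTQ records, ready to be passed to any plotting library. It assumes four-line FASTQ records and the Phred+33 encoding reported in the trimming output above. The function name and the two toy records are made up for illustration.

```python
def mean_quality_by_cycle(fastq_lines, offset=33):
    """Return the mean Phred quality at each read position.

    fastq_lines: iterable of lines from a FASTQ file with exactly four lines
    per record (no wrapped sequences). Reads may differ in length, as they do
    after trimming, so each position is averaged over the reads that reach it.
    """
    sums, counts = [], []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # the quality string is the fourth line of a record
            continue
        for pos, char in enumerate(line.rstrip("\n")):
            if pos == len(sums):
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


# Two toy records of different lengths ('I' = Q40, '5' = Q20, '#' = Q2).
records = ["@r1", "ACGT", "+", "IIII", "@r2", "ACG", "+", "5I#"]
means = mean_quality_by_cycle(records)
print(means)
```

With a real file, the lines could come from `open("RGnb_fwd_cleaned.fastq")`, and the returned list could be plotted against position to look for the dips discussed in this thread.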

