
Meaning of --min_base_quality_score

tytolin Member
edited October 2018 in Ask the GATK team

What does --min_base_quality_score mean?
Is it based on the mapping quality in the SAM/BAM file or on the sequencing base quality?

I'm a little confused by the description in the tool documentation for UnifiedGenotyper and HaplotypeCaller.

Comments

  • shlee Cambridge Member, Broadie ✭✭✭✭✭

    Hi @tytolin,

    --min-base-quality-score is defined as:

    Minimum base quality required to consider a base for calling. Default value: 10.

    Each sequenced base in a read comes with an associated base quality score, a Phred-scaled estimate of the probability that the base was called incorrectly.
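    To make the threshold concrete, here is a minimal sketch, assuming the Phred+33 ASCII encoding used for quality strings in SAM and FASTQ files and assuming the threshold is inclusive (bases at or above it are kept). The helper names are hypothetical and this is not GATK code; it only shows how a quality of 10 corresponds to a 1-in-10 error probability and which bases a cutoff of 10 would drop.

```python
def phred_to_error_prob(q: int) -> float:
    """Convert a Phred-scaled quality Q into the probability the base call is wrong."""
    return 10 ** (-q / 10)


def decode_qual(qual: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into integer qualities (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def bases_passing(seq: str, qual: str, min_bq: int = 10) -> list[tuple[int, str, int]]:
    """Return (index, base, quality) for bases whose quality is >= min_bq (hypothetical helper)."""
    return [
        (i, base, q)
        for i, (base, q) in enumerate(zip(seq, decode_qual(qual)))
        if q >= min_bq
    ]


if __name__ == "__main__":
    seq = "ACGTA"
    qual = "I5+&#"  # decodes to Q40, Q20, Q10, Q5, Q2 under Phred+33
    print(decode_qual(qual))
    print(phred_to_error_prob(10))  # a Q10 base has roughly a 1 in 10 chance of being wrong
    print(bases_passing(seq, qual, min_bq=10))  # the Q5 and Q2 bases are dropped
```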

  • tytolin Member

    Hello, @shlee

    So GATK derives the base quality score from the mapping quality score in the BAM file?

  • shlee Cambridge Member, Broadie ✭✭✭✭✭

    Hi @tytolin,

    So GATK derives the base quality score from the mapping quality score in the BAM file?

    Base quality scores are distinct from the alignment mapping quality score. The latter is assigned by the aligner.
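    The distinction is visible directly in a SAM alignment line: column 5 (MAPQ) holds a single mapping quality for the whole read, while column 11 (QUAL) holds one base quality per base in column 10 (SEQ). The sketch below parses one made-up record to show this; the function name is hypothetical, and Phred+33 encoding is assumed for QUAL.

```python
def parse_sam_line(line: str) -> dict:
    """Pull the read-level MAPQ and the per-base qualities out of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("SAM alignment lines have at least 11 tab-separated fields")
    qual = fields[10]
    return {
        "qname": fields[0],
        "mapq": int(fields[4]),  # column 5: one mapping quality for the whole read
        "seq": fields[9],        # column 10: the read bases
        # column 11: one quality character per base; "*" means qualities are absent
        "base_quals": [] if qual == "*" else [ord(c) - 33 for c in qual],
    }


if __name__ == "__main__":
    # QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL (made-up record)
    record = "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tI5+&#"
    parsed = parse_sam_line(record)
    print(parsed["mapq"])        # a single value for the read
    print(parsed["base_quals"])  # one value per base
```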

  • tytolin Member
    edited October 2018

    Hello, @shlee

    Oh, "the latter is assigned by the aligner" means the QUAL column in the SAM file rather than the MAPQ column, right?
    And --mbq filters out bases whose base quality is lower than the threshold, right?

    I want to set a mapping quality threshold, as I can in bcftools mpileup, and then use SelectVariants (SV) concordance to find the short variants called consistently by UG and bcftools mpileup. How can I do this?
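    As a conceptual aside, a caller that applies both kinds of threshold discards whole reads by mapping quality and individual bases by base quality before counting alleles at a position. bcftools mpileup exposes these as -q/--min-MQ and -Q/--min-BQ. The sketch below is a simplified illustration with a hypothetical PileupEntry structure and inclusive cutoffs; it does not reproduce either tool's exact behavior (for example, BAQ adjustment in bcftools or GATK's own read filters).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PileupEntry:
    """One read's observation at a single reference position (hypothetical structure)."""
    read_name: str
    mapq: int   # mapping quality of the whole read
    base: str   # base this read shows at the position
    bq: int     # base quality of that base


def filter_pileup(entries, min_mq: int, min_bq: int):
    """Keep observations whose read passes min_mq and whose base passes min_bq."""
    return [e for e in entries if e.mapq >= min_mq and e.bq >= min_bq]


def allele_counts(entries) -> Counter:
    """Count how many retained observations support each base."""
    return Counter(e.base for e in entries)


if __name__ == "__main__":
    pileup = [
        PileupEntry("r1", 60, "A", 30),
        PileupEntry("r2", 60, "G", 35),
        PileupEntry("r3", 5, "G", 38),   # poorly mapped read: dropped by the MAPQ cutoff
        PileupEntry("r4", 60, "G", 8),   # low-quality base: dropped by the base quality cutoff
    ]
    kept = filter_pileup(pileup, min_mq=20, min_bq=10)
    print(allele_counts(kept))
```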

  • shlee Cambridge Member, Broadie ✭✭✭✭✭

    Hi @tytolin,

    Now you've lost me. Can you rephrase your question please?

  • tytolin Member

    @shlee
    OK, thanks for your help.

    Mapping quality is a score for a whole read, whereas base quality scores each individual base within a read.

    I want to use SelectVariants to extract concordant short variants from the VCF files generated by HaplotypeCaller and bcftools mpileup, and then use the concordant VCF as a reference for recalibration. To get a dependable reference VCF, should I set the same criteria in both programs before using SelectVariants?

  • shlee Cambridge Member, Broadie ✭✭✭✭✭

    Hi @tytolin,

    I want to use SelectVariants to extract concordant short variants from the VCF files generated by HaplotypeCaller and bcftools mpileup, and then use the concordant VCF as a reference for recalibration. To get a dependable reference VCF, should I set the same criteria in both programs before using SelectVariants?

    By recalibration, are you referring to base quality score recalibration (BQSR) or variant quality score recalibration (VQSR)?

    As far as I know, what you propose, looking for concordant calls in results from a reassembly caller (HaplotypeCaller) and a pileup caller (bcftools mpileup), is not feasible with SelectVariants alone. This is because these callers can produce different representations of what is essentially the same variant when pinned back to the reads. For this type of concordance matching, we recommend an external tool, RTG-Tools, and its vcfeval module. This module outputs the records it classifies as true positives (TP), false positives (FP), etc., with each record in its original state. These could then be taken as "truth" and applied back to the reads/variants you wish to recalibrate; here you would match the "truth" to the same state your data-to-be-recalibrated is in. Hopefully I've understood your question correctly and this is helpful.
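    To illustrate the representation problem, the sketch below uses a made-up example: one caller reports a two-base change as a single multi-nucleotide record, the other as two adjacent SNPs. Exact matching on (CHROM, POS, REF, ALT), which is roughly what record-level comparison amounts to, finds no overlap, yet applying each set to the same reference window yields an identical haplotype. This only demonstrates the idea of haplotype-level equivalence; it is not vcfeval's algorithm, and the helper functions and the single-chromosome, non-overlapping-variant assumptions are hypothetical simplifications.

```python
def naive_concordance(calls_a, calls_b):
    """Exact record matching on (chrom, pos, ref, alt) tuples."""
    return set(calls_a) & set(calls_b)


def apply_variants(ref_seq: str, ref_start: int, variants) -> str:
    """Apply non-overlapping variants (1-based POS, one chromosome) to a reference window."""
    out = []
    cursor = 0  # index in ref_seq up to which bases have been emitted
    for _chrom, pos, ref, alt in sorted(variants, key=lambda v: v[1]):
        i = pos - ref_start
        if i < cursor:
            raise ValueError(f"overlapping variant at {pos}")
        if ref_seq[i:i + len(ref)] != ref:
            raise ValueError(f"REF allele does not match the reference at {pos}")
        out.append(ref_seq[cursor:i])  # unchanged reference bases before the variant
        out.append(alt)                # the alternate allele replaces the REF bases
        cursor = i + len(ref)
    out.append(ref_seq[cursor:])
    return "".join(out)


if __name__ == "__main__":
    window, start = "TTACGG", 98  # made-up reference covering chr1:98-103
    caller_a = [("chr1", 100, "AC", "GT")]                          # one MNP record
    caller_b = [("chr1", 100, "A", "G"), ("chr1", 101, "C", "T")]  # two SNP records
    print(naive_concordance(caller_a, caller_b))  # empty: no record matches exactly
    print(apply_variants(window, start, caller_a) == apply_variants(window, start, caller_b))
```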

  • tytolin Member

    @shlee

    Thanks for your help.
    I do want to do BQSR to correct the base quality scores in my BAM files.
    I'll try RTG-Tools out.

    Tyto
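    For context on why a reliable set of known variants matters for BQSR: recalibration compares each base's reported quality with how often bases at that reported quality actually mismatch the reference, and positions listed as known variants are skipped so that real variants are not counted as sequencing errors. The sketch below is a deliberately simplified, single-covariate version of that idea with add-one smoothing chosen here for illustration. GATK's BaseRecalibrator uses additional covariates (such as read group, machine cycle and sequence context) and its own statistical model, and the function names here are hypothetical.

```python
import math
from collections import defaultdict


def empirical_quality(mismatches: int, observations: int) -> float:
    """Phred-scaled empirical error rate, with add-one smoothing (an assumption here)."""
    p_err = (mismatches + 1) / (observations + 2)
    return -10 * math.log10(p_err)


def recalibration_table(observations, known_sites) -> dict:
    """Map each reported quality to an empirical quality.

    observations: iterable of (position, reported_q, is_mismatch) tuples.
    known_sites: positions of known variants, excluded from the error counts.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [observations, mismatches]
    for pos, reported_q, is_mismatch in observations:
        if pos in known_sites:
            continue  # a mismatch here may be a real variant, not a sequencing error
        counts[reported_q][0] += 1
        counts[reported_q][1] += int(is_mismatch)
    return {q: round(empirical_quality(m, n), 1) for q, (n, m) in sorted(counts.items())}


if __name__ == "__main__":
    # 98 bases reported as Q30, 8 of which mismatch, plus one mismatch at a known site
    obs = [(i, 30, i < 8) for i in range(98)] + [(1000, 30, True)]
    print(recalibration_table(obs, known_sites={1000}))  # Q30 bases behave more like ~Q10
```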
