UnifiedGenotyper read filters

There are several read filters used by UnifiedGenotyper before variant calling. In the UnifiedGenotyper, it says the followings:

UnmappedReadFilter
NotPrimaryAlignmentFilter
FailsVendorQualityCheckFilter
BadMateFilter
MalformedReadFilter
DuplicateReadFilter
MappingQualityUnavailableFilter

Is it possible to tell which exactly parameters in the BAM the UnifiedGenotyper looks for for these filters?

thanks

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin
    edited August 2014

    They follow the SAM specification, so basically they read the bit flags in the reads as explained here: http://picard.sourceforge.net/explain-flags.html

  • ying_sheng_1ying_sheng_1 Member ✭✭
    edited August 2014

    Thanks for quick answer!

    I have three questions:

    • what is the differences between:
      NotPrimaryAlignmentFilter
      DuplicateReadFilter

    Both are explained as "filter out duplicate reads", however, I think the "NotPrimaryAlignmentFilter" means: if asking the aligner to output both optimal alignment and sub-optimal alignment, UG will only use the optimal alignment. Do I understand right?

    • Which one is used for filter out reads which are mapped to multiple locations equally (mapping ambiguous reads)? I guess is MappingQualityUnavailableFilter. The explanation of this is "filter out mapping quality zero reads". If an aligner doesn't use this rule to assign mapping quality, what will happen?

    • Is there a tool which can print out reads only used by UG for variant calling? e.g. apply the same filters like UG?

    Ying

    Post edited by ying_sheng_1 on
  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Yes, you are correct about the NotPrimaryAlignment filter, there is an error in the documentation -- I will make sure it gets fixed.

    If an aligner doesn't follow the standard reporting rules it would be a problem, but all the major ones do as far as I know.

    You can use PrintReads with the same filters that UG applies added to your command line.

  • ying_sheng_1ying_sheng_1 Member ✭✭

    Thanks a lot! It is really early now in the States...

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Indeed, it's 5 am here! My cat woke me up by stepping on my face... So I answer forum questions while trying to fall back to sleep.

  • ying_sheng_1ying_sheng_1 Member ✭✭

    hope you got a good sleep afterwards yesterday ;-)

    Just a following on questions:

    • We use novoalign as aligner here. I talked to the author of novoalign, they don't use mapping quality 0 for ambiguous mapping reads. Instead, they give true score. So the highest mapping quality for reads mapping to multiple locations is 3 (a read mapped to two locations equally). Do you think this will give some problem?

    • If we want to mimic UG filters, e.g. set a minimum mapping quality as 4 when doing variant calling with UG, do I use right to just add the following options to the UG command:

    --read_filter MappingQuality --min_mapping_quality_score 4

    thanks,

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Yes, I did, thank you :)

    You can adjust the min_mapping_quality_score to anything that makes sense for your data, but FYI the default is already 10, so that will take care of the multimappers from Novoalign.

    http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_engine_filters_MappingQualityFilter.html

  • ying_sheng_1ying_sheng_1 Member ✭✭
    edited September 2014

    Hi Geraldine,

    I didn't see the "as default, min_mapping_quality_score is 10".

    My command without manually setting min_mapping_quality_score is:
    java -Xmx4g -jar GenomeAnalysisTK-2.8-1-g932cd3a/GenomeAnalysisTK.jar
    -R gatkBundle_2.5/human_g1k_v37_decoy.fasta
    -T UnifiedGenotyper
    -I all.realigned.markDup.baseQreCali.bam
    --dbsnp gatkBundle_2.5/dbsnp_137.b37.vcf
    -o gatk/all.raw.vcf
    -stand_call_conf 50
    -stand_emit_conf 10
    -L wex_Agilent_SureSelect_v05_b37/wex_Agilent_SureSelect_v05_b37.baits.slop50.merged.list
    -nt 3
    -glm BOTH
    -dcov 200

    See the attached file with the header in vcf file.

    I tried to set "the min_mapping_quality_score as 4" and the depth (DP) of a site which is covered by many low mapping quality reads has been changed (from 38 to 3):

    Before manually setting min_mapping_quality_score:
    6 32009651 rs199953230 C T 74.28 VQSRTrancheSNP99.00to99.90 AC=2;AF=1.00;AN=2;BaseQRankSum=-5.131;DB;DP=38;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=8.91;MQ0=0;MQRankSum=0.862;QD=1.95;ReadPosRankSum=-0.512;VQSLOD=-1.290e-02;culprit=QD;set=FilteredInAll GT:AD:DP:GQ:PL 1/1:20,18:38:9:102,9,0

    After:
    6 32009651 rs199953230 C T 74.28 PASS AC=2;AF=1.00;AN=2;DB;DP=3;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=30.00;MQ0=0;QD=24.76;VQSLOD=4.07;culprit=FS;set=variant GT:AD:DP:GQ:PL 1/1:0,3:3:9:102,9,0

    So I wonder whether there is a min_mapping_quality_score cutoff in the default settings of UnifiedGenotyper? Or am I missing something?

    thanks,

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    @ying_sheng_1 UG does not have an internal cutoff for mapping qual score, it uses the default engine value which is 10 as indicated in the document I linked to above. Are you sure you did't mix up the files or the commands?

    By the way, you should upgrade to using the HaplotypeCaller in version 3.3, it will give you better results than the old UG.

Sign In or Register to comment.