Fail retreiving variant calling, problem related with MappingQuality ?

mandormandor Posts: 8Member
edited January 2013 in Ask the GATK team

Dear all,

I fail retrieving variant calling (.vcf ) using the GATK2.0, although with a similar example it works well. I compared both and I find a difference in Mapping Quality (the mapping quality of the example that works has 60 whereas the other has 255 -this last one is performed using bfast and gives this quality-)

Googling, I already find that this could be caused because of GATK doesn't take into account qualities of 255. Is it true?
(Note than this solution affects to GATK1.4 and I am using the GATK2.0)

I also checked the reference genome was ok (and also the .bed file with the exom position).

I repeat the process changing manually the "255" to "60" with the same result.

Any ideas of what could be the problem?

The executed command :

java -jar /home/public/biotools/GATK_2.0/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /home/public/mnt/cubix/public/biodata/hg19_norm/hg19.fa -L /home/public/mnt/cubix/public/biodata/hg19_norm/bed/all_captured_exomes_hg19.bed -I /home/public/test/outer_test/intermediate/AUT143_chr15_extract_cpy_sorted.bam -glm SNP -nt 4 -o /home/public/test/outer_test/intermediate/AUT143_chr15_extract_cpy_sorted.bam.vcf

$ samtools view AUT143_chr15_extract_cpy_sorted.bam | head
1629_189_1658_F3 0 15 20002423 255 2I48M * 0 0 TATCCAAATATCCACTTGCAGATTCCACGAAAATAGTGTTTCAAAACTGC 5:=59B98@>!!4<8C?72!!!!92::!!1373!!00!!-00!!.!!!!% XA:i:2 MD:Z:48 XE:Z:-----------3--------300-----1-----2---2----1--0-0- PG:Z:bfast RG:Z:sample IH:i:1 NH:i:1 HI:i:1 CM:i:10 NM:i:2 CQ:Z:!7B90A;/:A@<%>.>?5'.:?7>+/=)9'&22%%%&%(%%1&/%(/%%% OQ:Z:ARZ`SR!!LUU

]E>!!!!RCUO!!6AM@!!44!!3?@!!6!!!!% AS:i:925    CS:Z:T03320100333301120131300020111200032211200211000203
1396_1160_190_F3    16  15  20006225    255 50M *   0   0   CTCAATCTAAAGATAGGTTCAACTCTCTGAGATGAGTGCACACATCACAA  !!!!!!6/206B1!!!!38!!!!!2535:D41426372@A?86!!!!!!!  XA:i:2  MD:Z:26G4T18    XE:Z:0-310--------2-3---2-00--------------------13-2-2- PG:Z:bfast  RG:Z:sample IH:i:1  NH:i:1  HI:i:1  CM:i:13 NM:i:2  CQ:Z:!'''1+5-1:<A5%4.+''%8@+%1))'?%,?%1&%8%<<-.018:-%)- OQ:Z:!!!!!!RJGDR
J!!!!?M!!!!!;C?9TF57;BKBC__TG!!!!!!! AS:i:475 CS:Z:T02121311111131122132221222200022013223220032201320
(I also try to change manually the 255 value in .sam file (and I added 60) with no values ...)


Post edited by Geraldine_VdAuwera on


  • ebanksebanks Broad InstitutePosts: 691Member, Administrator, Broadie, Moderator, Dev admin

    Hi there,

    Yes, as mentioned in the documentation we filter out reads with MQ=255. It looks like you are using an older version of Bfast, so I'd suggest you update to the latest version of Bfast where this bug is fixed.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • mandormandor Posts: 8Member

    Thanks for your answer. But well, I have not the original fasta files in this case, only the .BAMs.

    My first attempt to change this manually already failed.

  • mandormandor Posts: 8Member

    Maybe a better question could be like that : "Supposing that the coordinates of the reads in the .bam file are correct, and also I am referring to good genome reference, what could be the other things than make that GATK avoid to use this reads ? (for Variant Calling)"

  • mandormandor Posts: 8Member
    edited August 2012

    Sorry, It is my fault. In one of my tests, I played with a extraction of a .bam file (I catched a wrong interval of manually detected variant callings ... ).

    The problem was related to MapQ, and I fixed the original .bam file using GATK command : "-T PrintReads -rf ReassignMappingQuality -DMQ 60".

    As pointed @ebanks, the reads are filtered with MQ=255. Thanks for the help! :)

    Post edited by mandor on
  • mandormandor Posts: 8Member

    Can myself change this discussion from state "Question" to state "Answered" ? [sorry for the inconvenience]

Sign In or Register to comment.