Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

How to call SNP only using proper paired reads?

Dear Team,

I want to use GATK to call SNP only using proper paired reads. I found a related tag "MateSameStrandFilter", but I don't know which procedure should I use it? Can you please show me some example command line? Thanks a lot.

Best regards,

Xiaobin

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi Xiaobin, you just add --read_filter MateSameStrand to your command line.

  • xxb0316xxb0316 Member

    Dear Geraldine,

    I have tried GATK PrintReads and --read_filter MateSameStrand for a bam file with more than 95% reads properly paired and mapped according to samtools flagstat, but the output bam (less than 10% of original bam size) seems to be quite the opposite, that is, the procedure filtered out properly paired reads and only left some orphan mapped reads or unmapped reads. Have you met this before? Can you please help me with it? Thank you very much.

    Xiaobin

  • grosscogrossco Member

    Check the flag bits of a few of the paired SAM records. They are running afoul of one of the boolean checks in the read filter.

    return (! read.getReadPairedFlag() ) || 
                read.getMateUnmappedFlag() || 
                read.getDuplicateReadFlag() ||
                read.getReadFailsVendorQualityCheckFlag() || 
                read.getMateNegativeStrandFlag() != read.getReadNegativeStrandFlag()
    
    • 0x0010 (16) is the negative strand flag.
    • 0x0200 (512) is fails vendor quality check flag.
    • 0x0400 (1024) is the duplicate read flag.
  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    @ Xiaobin, FYI the GATK callers (both UG and HC) use several read filters to filter out reads that are not appropriate for calling variants, so you shouldn't need to add this filter anyway -- it is probably too strict for your purposes.

  • seruseru BergenMember ✭✭

    Hi,

    I have just tried the MateSameStrand filter in HaplotypeCaller and had the same issue as @Xiaobin - 99% of read-pairs in a small region of interest were discarded. I tried samtools using exactly the same flags and the effect was opposite. These flags are tricky so I am pasting the samtools commands I used for your reference:

    samtools view -h -f1 -F524 -L $BED $BAM | samtools view -Sc -f32 -F16 - > READfwdMATErev.cnt
    samtools view -h -f1 -F524 -L $BED $BAM | samtools view -Sc -f16 -F32 - > READrevMATEfwd.cnt

    And then I carefully (complex conditional expressions are evil!) analyzed the expression pasted by @grossco. Checking the code in the repo shows that if the expression is evaluated as true the read is filtered out. The last part of it (read.getMateNegativeStrandFlag() != read.getReadNegativeStrandFlag()), if I read it correctly, would be true if the read&mate were on different strands. And this is exactly the case in which we would like to keep the pair. Shouldn't the condition rather be read.getMateNegativeStrandFlag() == read.getReadNegativeStrandFlag()?

    Thanks,
    Paweł

  • seruseru BergenMember ✭✭

    I just cloned 3.3 version and corrected the expression. Here is the command I used

    gatk3 -T HaplotypeCaller -I $BAM -rf MateSameStrand -L 11:111964721-111966721 1>/dev/null

    output before modification:
    INFO 13:17:17,793 MicroScheduler - 577 reads were filtered out during the traversal out of approximately 578 total reads (99.83%)
    INFO 13:17:17,794 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
    INFO 13:17:17,794 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
    INFO 13:17:17,794 MicroScheduler - -> 5 reads (0.87% of total) failing HCMappingQualityFilter
    INFO 13:17:17,795 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
    INFO 13:17:17,795 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
    INFO 13:17:17,796 MicroScheduler - -> 572 reads (98.96% of total) failing MateSameStrandFilter
    INFO 13:17:17,796 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
    INFO 13:17:17,796 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter

    and after:
    INFO 13:20:20,827 MicroScheduler - 105 reads were filtered out during the traversal out of approximately 578 total reads (18.17%)
    INFO 13:20:20,828 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
    INFO 13:20:20,828 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
    INFO 13:20:20,828 MicroScheduler - -> 96 reads (16.61% of total) failing HCMappingQualityFilter
    INFO 13:20:20,829 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
    INFO 13:20:20,829 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
    INFO 13:20:20,829 MicroScheduler - -> 9 reads (1.56% of total) failing MateSameStrandFilter
    INFO 13:20:20,830 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
    INFO 13:20:20,830 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter

    Only 9 reads compared to 572 (out of 578) are filtered after the change, and this corresponds to the counts reported by samtools:
    $samtools view -h -f1 -F524 -L $bed $bam | samtools view -Sc -f32 -F16 -
    285
    $samtools view -h -f1 -F524 -L $bed $bam | samtools view -Sc -f16 -F32 -
    284

    578 - (285+284) = 9

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    @seru Would you mind making a pull request on github for the devs to look at?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Thanks! We'll try to get this processed in the near future, though right now the unrelenting snowstorms are really messing up our work schedule, so it might take a week or two before we can get to it.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Thanks @seru, your patch was just approved. It should be available in the next nightly build and will be integrated in the next official version. Thanks for your contribution!

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    @seru, Your patch was just merged in now -- sorry for the delay. It will be in tomorrow's nightly build.

Sign In or Register to comment.