How to reveal why reads are filtered out when using PrintReads?

Hi,

when exploring gatk v4.beta.5-36 with PrintReads -I foo.bam -O bar.bam, all my alignments are filtered away:

15:57:37.096 INFO PrintReads - 15201761 read(s) filtered by: WellformedReadFilter

Is there a way to get a more informative summary here, indicating why my alignments are not well formed according to org.broadinstitute.hellbender.engine.filters.WellformedReadFilter#wellFormedFilter?

I've already dug into the sources of org.broadinstitute.hellbender.engine.filters.CountingReadFilter (which seems to be the filter backend of PrintReads), but it does not seem to keep track of per-filter statistics.

Next I'd like to disable the filter that's causing all my alignments to be filtered away, and I've noticed that I could disable individual filters with --disableReadFilter but the docs don't seem to state what values are possible here. Where could I find a list of options without digging through the source code?

I'm rather new to gatk so thank you for your patience.

Best,
Holger

Answers

  • EADG (Kiel, Member)

    Hi @holgerbrandl,

    You could disable the tool's default filters with --disableToolDefaultReadFilters, but you should ask yourself why none of your reads are well formed, i.e. why they are all filtered. With --disableReadFilter you can disable a specific filter, but PrintReads in GATK 4.beta.5 only uses the WellformedReadFilter...

    Greetings EADG

  • cnorman (United States; Member, Broadie, Dev)

    Hi @holgerbrandl,

    The WellformedReadFilter is an aggregate filter that includes:

    ValidAlignmentStartReadFilter
    ValidAlignmentEndReadFilter
    AlignmentAgreesWithHeaderReadFilter
    HasReadGroupReadFilter
    MatchingBasesAndQualsReadFilter
    ReadLengthEqualsCigarLengthReadFilter
    SeqIsStoredReadFilter
    CigarContainsNoNOperator

    Strangely, using the built-in WellformedReadFilter won't give you counts for individual filters, but if you disable it and then manually enable each of the constituent filters on the command line, you should get counts by filter and be able to see where the filtering is coming from, i.e.:

    ./gatk-launch PrintReads -I in.bam -O out.bam --disableToolDefaultReadFilters --readFilter ValidAlignmentStartReadFilter --readFilter ValidAlignmentEndReadFilter ....and so on

    The individual filters that are available are listed in the ReadFilters section of the beta doc.
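
The approach in the last answer can be sketched as a small shell script that expands the "and so on" into one --readFilter argument per constituent filter named in the list above. This is only a sketch under assumptions: the flag names and ./gatk-launch are taken from this thread (a beta release) and may differ in other versions, it is assumed that each listed name is accepted by --readFilter as written, and in.bam / out.bam are placeholder file names. The script prints the command instead of running it, so it can be inspected first.

```shell
#!/bin/sh
# Constituent filters of WellformedReadFilter, copied from the list in the answer above.
FILTERS="ValidAlignmentStartReadFilter
ValidAlignmentEndReadFilter
AlignmentAgreesWithHeaderReadFilter
HasReadGroupReadFilter
MatchingBasesAndQualsReadFilter
ReadLengthEqualsCigarLengthReadFilter
SeqIsStoredReadFilter
CigarContainsNoNOperator"

# Build one --readFilter argument per constituent filter.
FILTER_ARGS=""
for f in $FILTERS; do
    FILTER_ARGS="$FILTER_ARGS --readFilter $f"
done

# Assemble the full command line. in.bam and out.bam are placeholders;
# whether every filter name is accepted as-is is an assumption.
CMD="./gatk-launch PrintReads -I in.bam -O out.bam --disableToolDefaultReadFilters$FILTER_ARGS"

# Print the command rather than executing it.
echo "$CMD"
```

If the printed command looks right, it can be pasted into the terminal; the per-filter counts in the log should then show which constituent filter is rejecting the reads.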