GATK4.alpha emitDroppedReads option

Does GATK4.alpha support the -emitDroppedReads option that works with -bamout?


Answers

  • Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie admin

    No, it looks like we haven't ported that functionality yet, sorry.

  • Thanks. Is there any plan to bring it back?
    We actually need IndelRealignment for our pipeline, but it was dropped from GATK4.alpha. To benchmark other indel realigners, I am trying to obtain all the realigned reads that HaplotypeCaller uses via the -bamout option, and it seems this requires --emitDroppedReads.

  • rajitz (Toronto), Member

    Hi, it would be really helpful to have the --emitDroppedReads flag in GATK4 HaplotypeCaller. Is this expected to be added any time soon? Thanks.

  • bhanuGandham, Member, Administrator, Broadie, Moderator admin

    Hi @rajitz

    This is on our radar, but it has not been completed yet. You can keep track of its progress here.
    Posting in that GitHub thread might also help move this along.

    Regards
    Bhanu

  • rajitz (Toronto), Member

    Thanks for the prompt response. In the meantime, is there any other way to see why certain reads are getting dropped? The documentation for v3.8 at https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php#--emitDroppedReads says that it could be due to "filtering, trimming, realignment failure". Could you please elucidate the cutoffs used for these three criteria?

    On another note, I'm trying to call the variants again with --bam-writer-type set to ALL_POSSIBLE_HAPLOTYPES because for the default CALLED_HAPLOTYPES, the documentation says that this "Writes out the reads aligned only to the called haplotypes". Could this perhaps explain why reads are getting dropped? If so, would the three criteria above still be in play?

    Thanks again.

  • rajitz (Toronto), Member

    Actually, with --emitDroppedReads we could also be looking at all down-sampled and uninformative reads, in addition to filtered, trimmed, and realignment-failure reads. It would be great to know more about all of these criteria in this context.

    Thanks & Regards.
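    While --emitDroppedReads is unavailable, one rough way to at least see which reads are missing is to compare read names in the input BAM against the -bamout BAM over the same region. The minimal Python sketch below works on SAM text lines (for example, the output of `samtools view` on each file). The function names are hypothetical, and the read-group name used to skip the synthetic haplotype records is an assumption to check against the @RG header of your own bamout. It cannot tell you why a read is absent: reads outside active regions, reads not aligned to a called haplotype (with CALLED_HAPLOTYPES), and down-sampled or uninformative reads will all show up alongside filtered, trimmed, or realignment-failure reads.

```python
# Sketch: list read names present in the input but absent from the -bamout
# output, working on SAM text lines (e.g. `samtools view` over one region).
from typing import Iterable, Set

# ASSUMPTION: synthetic haplotype records in the bamout carry this read group.
# Check the @RG lines of your own bamout and adjust if needed.
ARTIFICIAL_RG = "ArtificialHaplotypeRG"


def read_names(sam_lines: Iterable[str], skip_rg: str = ARTIFICIAL_RG) -> Set[str]:
    """Collect QNAMEs from SAM records, skipping headers and skip_rg records."""
    names: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # header line or blank line
        fields = line.rstrip("\n").split("\t")
        # Optional tags (e.g. RG:Z:...) start at column 12, i.e. index 11.
        if f"RG:Z:{skip_rg}" in fields[11:]:
            continue
        names.add(fields[0])  # QNAME; mates share a name, so pairs collapse
    return names


def missing_from_bamout(input_lines: Iterable[str],
                        bamout_lines: Iterable[str]) -> Set[str]:
    """Read names seen in the input that never appear in the bamout."""
    return read_names(input_lines) - read_names(bamout_lines)
```

    Because the comparison is by name only, it is most informative when both inputs are restricted to the same active region; everything it reports still needs to be checked against the categories listed above.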

  • bhanuGandham, Member, Administrator, Broadie, Moderator admin
    edited December 4

    Hi @rajitz

    > "filtering, trimming, realignment failure". Could you please elucidate the cutoffs used for these three criteria?

    The cutoffs depend on the quality of the data; there is no one-size-fits-all number.

    > Could this perhaps explain why reads are getting dropped?

    ALL_POSSIBLE_HAPLOTYPES is intended for developers who want to see all the possible haplotypes created by the algorithm, along with the reads aligned to them. This is different from reads dropped due to "filtering, trimming, realignment failure". For users we recommend

    > Would the three criteria above still be in play?

    No. --bam-writer-type determines which haplotypes are written to the BAM, while --emitDroppedReads determines whether dropped reads are tracked and emitted when -bamout is specified. The two options control different outputs.
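    To make the distinction concrete, here is a minimal Python sketch that only assembles a GATK4 HaplotypeCaller command line; it does not run GATK. The helper name `haplotype_caller_cmd` and all file names are hypothetical placeholders. The command combines -bamout with --bam-writer-type and contains no dropped-reads option, since GATK4 has not ported --emitDroppedReads.

```python
# Hypothetical helper: assembles (but does not run) a GATK4 HaplotypeCaller
# command that writes a -bamout BAM. File names are placeholders.
import shlex
from typing import List

# Writer types discussed in this thread; other GATK releases may accept more.
KNOWN_WRITER_TYPES = ("CALLED_HAPLOTYPES", "ALL_POSSIBLE_HAPLOTYPES")


def haplotype_caller_cmd(reference: str, input_bam: str, output_vcf: str,
                         bamout: str,
                         writer_type: str = "CALLED_HAPLOTYPES") -> List[str]:
    """Return an argv list for HaplotypeCaller with -bamout enabled.

    --bam-writer-type only selects which haplotypes (and the reads aligned
    to them) are written to the bamout; it does not emit dropped reads.
    """
    if writer_type not in KNOWN_WRITER_TYPES:
        raise ValueError(f"unexpected --bam-writer-type: {writer_type}")
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", input_bam,
        "-O", output_vcf,
        "-bamout", bamout,
        "--bam-writer-type", writer_type,
    ]


if __name__ == "__main__":
    cmd = haplotype_caller_cmd("ref.fasta", "sample.bam", "sample.vcf.gz",
                               "sample.bamout.bam", "ALL_POSSIBLE_HAPLOTYPES")
    # Print the command for inspection; where GATK is installed it could be
    # executed with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

    Returning an argv list rather than a single string avoids shell-quoting problems if the paths contain spaces.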

    > Actually with --emitDroppedReads we could be looking at all down-sampled and uninformative reads also in addition to filtered, trimmed and those having realignment failure. Would be great to know more about all these criteria in this context.

    This is probably a good suggestion. As I mentioned for feature requests, posting in the GitHub thread will be helpful.

    Hope this helps.

    Regards
    Bhanu

  • rajitz (Toronto), Member

    Hi Bhanu,

    Thanks for the helpful clarifications. For the criteria, even though we don't have hard cutoffs, could you please provide a one-line summary for each one of them in this context? For example, I've heard about "trimming" in terms of adapter trimming, but am not sure how it applies here.

    • Filtered reads
    • Trimmed reads
    • Reads with realignment failure
    • Down-sampled reads
    • Uninformative reads

    Cheers :)

    P.S. GitHub is blocked on my end, so I couldn't post there.

  • bhanuGandham, Member, Administrator, Broadie, Moderator admin
    edited December 5

    Hi @rajitz

    Our GitHub is public, so you should be able to access it.
    For more context, please refer to the READ FILTERS section here.

    Read Filters: https://software.broadinstitute.org/gatk/documentation/article?id=11007
    Downsampling: https://gatkforums.broadinstitute.org/gatk/discussion/1323/downsampling
    Uninformative reads: https://software.broadinstitute.org/gatk/documentation/article.php?id=6005

    Regards
    Bhanu
