
Do I need to remove multi-mapping reads from BAM files before germline CNV discovery?

Hello, I need to perform germline CNV calling on a cohort of 30 WES samples (10 trios) with GATK 4.1.2.0. Is it recommended to remove multi-mapping reads from the BAM files used for this analysis? I wonder whether they could increase the false-positive rate.
For now, I used the same BAM files as input for the CNV analysis that I used for GATK HaplotypeCaller (aligned with BWA, sorted, and duplicate-marked). Would that be OK?

Thanks in advance,

Giovanni

Best Answers

  • SkyWarrior (Turkey)
    Accepted Answer (edited August 7)

    As far as I remember, read counting applies a mapping-quality filter, and the overall workflow also includes an additional filtering step that removes intervals whose depth varies massively across samples. I use the GATK germline CNV workflow regularly and have yet to see any issues from leaving MAPQ0 reads in the BAM files, since they are already filtered out.

    There is also an optional interval-annotation step in which you can score regions of high and low mappability using single k-mer mappability scores from an external resource (described in the tutorial). This also helps the workflow distinguish error-prone regions from confident ones.

  • asmirnov (Broad)
    Accepted Answer

    @gspirito What @SkyWarrior said is completely correct. By default, the CollectReadCounts step of the pipeline filters out reads with mapping quality 0, so every read with an ambiguous mapping is excluded.

    The resources for additional annotation-based filtering are located in this Google Cloud bucket: gs://gatk-best-practices/germline-copy-number/
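To make the mapping-quality point concrete, here is a minimal Python sketch of per-interval read counting that skips MAPQ 0 reads. It assumes reads are simple (contig, start, mapq) records and that each read is assigned to the interval containing its start position. The `Read` and `Interval` types, the `count_reads` function and the `min_mapq` parameter are hypothetical illustrations, not GATK's CollectReadCounts implementation.

```python
from collections import namedtuple

# Hypothetical minimal records; real input would be aligned reads from a BAM file.
Read = namedtuple("Read", ["contig", "start", "mapq"])
Interval = namedtuple("Interval", ["contig", "start", "end"])  # 1-based, inclusive


def count_reads(reads, intervals, min_mapq=1):
    """Count reads per interval, skipping reads with MAPQ below min_mapq.

    With the default min_mapq=1, only MAPQ 0 (ambiguously placed) reads are
    dropped. Each read is assigned to the first interval containing its start
    position (a simplifying assumption for this sketch).
    """
    counts = [0] * len(intervals)
    for read in reads:
        if read.mapq < min_mapq:
            continue  # multi-mapping read: its placement is arbitrary, so ignore it
        for i, iv in enumerate(intervals):
            if iv.contig == read.contig and iv.start <= read.start <= iv.end:
                counts[i] += 1
                break
    return counts


intervals = [Interval("chr1", 100, 199), Interval("chr1", 200, 299)]
reads = [
    Read("chr1", 120, 60),
    Read("chr1", 150, 0),   # MAPQ 0: excluded
    Read("chr1", 210, 37),
    Read("chr1", 250, 0),   # MAPQ 0: excluded
    Read("chr1", 260, 60),
]
print(count_reads(reads, intervals))  # [1, 2]
```

Because the filter is applied while counting, removing the MAPQ 0 reads from the BAM beforehand would leave these counts unchanged.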
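The cohort-level step that removes intervals with massively variable depth across samples can be illustrated with a simplified Python sketch. It uses the coefficient of variation (standard deviation divided by mean) with an arbitrary cutoff as a stand-in criterion; `filter_variable_intervals` and `max_cv` are hypothetical names, and GATK's own interval filtering applies its own rules rather than this exact one.

```python
import statistics


def filter_variable_intervals(counts_by_sample, max_cv=0.5):
    """Return indices of intervals whose counts are consistent across samples.

    counts_by_sample: one list of per-interval read counts per sample.
    An interval is kept if its coefficient of variation (population standard
    deviation / mean) across samples is at most max_cv. Intervals with zero
    mean coverage are dropped. max_cv is an arbitrary, hypothetical cutoff.
    """
    n_intervals = len(counts_by_sample[0])
    kept = []
    for i in range(n_intervals):
        values = [sample[i] for sample in counts_by_sample]
        mean = statistics.mean(values)
        if mean == 0:
            continue  # no coverage in any sample: nothing to model
        cv = statistics.pstdev(values) / mean
        if cv <= max_cv:
            kept.append(i)
    return kept


# Three samples, four intervals; interval 2 has wildly inconsistent depth.
counts = [
    [100, 80, 5, 0],
    [110, 75, 300, 0],
    [95, 85, 20, 0],
]
print(filter_variable_intervals(counts))  # [0, 1]
```

A scale-free measure such as the coefficient of variation is used here so that high- and low-coverage targets are judged on the same footing.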
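The optional mappability annotation mentioned in the first answer can be sketched in the same spirit: each target interval gets the overlap-weighted mean of a mappability track, and low-mappability intervals are flagged. This Python sketch assumes the track is a list of non-overlapping, half-open (start, end, score) segments on one contig with scores between 0 and 1, treats uncovered bases as score 0, and uses a hypothetical cutoff of 0.9. `mean_mappability`, `annotate` and `min_mappability` are illustrative names, not GATK's AnnotateIntervals API.

```python
def mean_mappability(interval, track):
    """Overlap-weighted mean mappability score over a half-open interval.

    interval: (start, end) with start < end.
    track: list of non-overlapping half-open (start, end, score) segments.
    Bases not covered by any segment contribute a score of 0.
    """
    start, end = interval
    weighted = 0.0
    for seg_start, seg_end, score in track:
        overlap = min(end, seg_end) - max(start, seg_start)
        if overlap > 0:
            weighted += overlap * score
    return weighted / (end - start)


def annotate(intervals, track, min_mappability=0.9):
    """Return (interval, score, passes) tuples; the cutoff is hypothetical."""
    result = []
    for iv in intervals:
        score = mean_mappability(iv, track)
        result.append((iv, score, score >= min_mappability))
    return result


track = [(0, 150, 1.0), (150, 200, 0.2), (200, 400, 1.0)]
for iv, score, passes in annotate([(100, 200), (200, 300)], track):
    print(iv, round(score, 2), "keep" if passes else "low mappability")
# (100, 200) 0.6 low mappability
# (200, 300) 1.0 keep
```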
