Do I need to remove multi-mapping reads from bam files upon performing germline CNV discovery?

Hello, I need to perform a germline CNV calling analysis on a cohort of 30 WES samples (10 trios) with GATK 4.1.2.0. Is it recommended to remove multi-mapping reads from the BAM files I use for the analysis? I wonder whether they could increase the rate of false positives.
For now, I have used as input for the CNV analysis the same BAM files I used for GATK HaplotypeCaller (obtained by aligning with BWA, sorting, and marking duplicates). Would that be OK?

Thanks in advance,

Giovanni

Best Answers

  • SkyWarrior (Turkey) ✭✭✭
    edited August 2019 · Accepted Answer

    As far as I remember, the read-counting step applies a mapping-quality filter, and the overall workflow has an additional filtering step that removes intervals whose depth varies massively across samples. I use the GATK germline CNV workflow regularly and have yet to see any issues from leaving MAPQ 0 reads in the BAM file, since they are already filtered out.

    There is also an optional interval-annotation step in which you can score regions of high and low mappability using single-read (k-mer) mappability scores from external sources (as indicated in the tutorial). This also helps the workflow distinguish error-prone regions from confident ones.

  • asmirnov (Broad) ✭✭
    Accepted Answer

    @gspirito what @SkyWarrior said is completely correct. By default, the CollectReadCounts step of the pipeline filters out reads with mapping quality 0, so every ambiguously mapped read is removed before counting.

    The resources for additional annotation-based filtering are in this Google Cloud bucket: gs://gatk-best-practices/germline-copy-number/
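
The filtering described in both answers can be illustrated with a minimal pure-Python sketch. It is not GATK code: the `Read` and `Interval` classes, `count_reads`, `filter_intervals`, the per-interval `mappability` score, and both thresholds are hypothetical stand-ins. `count_reads` mimics the idea of a MAPQ filter in the read-counting step (CollectReadCounts in GATK), and `filter_intervals` drops intervals with low mappability (the kind of annotation AnnotateIntervals can add) or with high depth variability across samples. The variability criterion used here, a coefficient of variation on library-size-normalized counts, is a simple substitute chosen for illustration, not the rule FilterIntervals actually applies.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


# Hypothetical, simplified data model for illustration only; GATK itself
# reads BAM files and interval lists rather than these Python objects.
@dataclass(frozen=True)
class Read:
    contig: str
    start: int  # 1-based alignment start
    mapq: int   # mapping quality; 0 means no unique placement


@dataclass(frozen=True)
class Interval:
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    mappability: float = 1.0  # assumed per-interval score in [0, 1]


def count_reads(reads, intervals, min_mapq=1):
    """Count reads per interval, skipping reads with MAPQ below min_mapq.

    With the default min_mapq=1 only MAPQ 0 reads are dropped. In this
    sketch each read is assigned to the first interval containing its start.
    """
    counts = [0] * len(intervals)
    for read in reads:
        if read.mapq < min_mapq:
            continue  # ambiguous (e.g. MAPQ 0) reads never reach the counts
        for i, iv in enumerate(intervals):
            if read.contig == iv.contig and iv.start <= read.start <= iv.end:
                counts[i] += 1
                break
    return counts


def filter_intervals(intervals, counts_by_sample, min_mappability=0.9, max_cv=0.5):
    """Return indices of intervals passing a mappability and a variability filter.

    counts_by_sample holds one list of per-interval counts per sample.
    """
    # Scale each sample by its total so overall sequencing depth does not
    # look like per-interval variability.
    normalized = []
    for counts in counts_by_sample:
        total = sum(counts)
        normalized.append([c / total if total else 0.0 for c in counts])

    kept = []
    for i, iv in enumerate(intervals):
        if iv.mappability < min_mappability:
            continue  # low-mappability region: counts are unreliable
        values = [sample[i] for sample in normalized]
        mu = mean(values)
        if mu == 0:
            continue  # no coverage in any sample
        if pstdev(values) / mu > max_cv:
            continue  # depth varies too much across the cohort
        kept.append(i)
    return kept


if __name__ == "__main__":
    intervals = [Interval("chr1", 100, 199), Interval("chr1", 200, 299)]
    reads = [Read("chr1", 120, 60), Read("chr1", 150, 0), Read("chr1", 250, 30)]
    print(count_reads(reads, intervals))  # [1, 1]: the MAPQ 0 read is skipped
```

Running the file prints `[1, 1]`: the MAPQ 0 read at position 150 never contributes to a count, which is why removing such reads from the BAM beforehand changes nothing in this sketch. Thresholds such as `min_mappability=0.9` and `max_cv=0.5` are arbitrary here; the real workflow exposes its own filter parameters.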
