somatic CNV PoN

Dear all,
When I run the paired somatic CNV workflow, should I exclude the paired normal sample from the PoN?
Previously @shlee said I need to do the germline CNV extraction manually after I get the segment results from both tumor and normal. https://gatkforums.broadinstitute.org/gatk/discussion/24229/call-paired-somatic-cnv#latest
The problem here is when I include my normal sample in the PoN, the final CNV calls for the normal sample seem to have nothing. So I don't have anything to remove from the tumor. I think the program treated most of the points in my normal sample as noise so it removed them, but not all points. I can still see some sparse points on the normal sample plot. It seems to help me remove most of the germline CNVs.
1. Do you recommend removing the paired normal sample when creating the PoN?
2. If I include the paired sample in my PoN, do you think it is good enough to remove germline CNVs?
Segment plot when I include this normal sample in PoN:
The paired tumor sample:
Answers
Hi @shlee,
Could you give me some recommendations?
Thank you!
Hi @lzhan140,
I think you meant to tag me, instead of @shlee (who no longer works at the Broad).
You can use a PoN to denoise a normal sample that was included in it, but you need to be careful not to use too many eigensamples; otherwise you will get an overdenoised result as you show in your plot. This happens when the eigensamples/principal components used for denoising are heavily influenced by that particular sample.
How many samples are included in your PoN and how many eigensamples are you using to denoise? In general, for clean WGS data, I'd expect that you'd need to use no more than a few eigensamples to achieve a good denoising result. For a reasonably sized PoN, I'd expect that you should be able to identify larger germline events that may also appear in your tumor sample, at the very least.
So to answer your 2 questions: if you have enough independent samples to build a PoN (tens or more), I'd probably not include the matched normal so we can more easily avoid overdenoising. Whether or not this will allow you to identify germline CNVs depends heavily on your data quality and the quality of your PoN.
Again, see my previous posts and consider also adjusting segmentation parameters to reduce the number of small germline events that may appear in your tumor sample.
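The overdenoising effect described above can be reproduced on synthetic data. The following is only a minimal sketch under stated assumptions: `denoise` is a hypothetical helper that subtracts per-bin medians and then projects out the top principal components, which simplifies what the GATK tools actually do. The panel is pure random noise, and the deletion in sample 0 is invented for illustration.

```python
import numpy as np


def denoise(panel, sample, num_eigensamples):
    """Toy PCA denoising; a simplification, not GATK's actual implementation.

    panel:  (n_samples, n_bins) array of log2 coverage for the PoN samples
    sample: (n_bins,) array of log2 coverage for the sample to denoise
    """
    bin_medians = np.median(panel, axis=0)
    centered_panel = panel - bin_medians      # per-bin-median normalization
    centered_sample = sample - bin_medians
    if num_eigensamples == 0:
        return centered_sample                # medians only, no PCs subtracted
    # Rows of vt are the principal directions ("eigensamples") in bin space.
    _, _, vt = np.linalg.svd(centered_panel, full_matrices=False)
    basis = vt[:num_eigensamples]
    # Remove the part of the sample explained by the retained eigensamples.
    return centered_sample - basis.T @ (basis @ centered_sample)


rng = np.random.default_rng(0)
n_samples, n_bins = 13, 2000
panel = rng.normal(0.0, 0.05, size=(n_samples, n_bins))  # synthetic noise only
deletion = slice(500, 600)
panel[0, deletion] -= 1.0          # invented germline deletion in sample 0
matched_normal = panel[0].copy()


def deletion_signal(denoised):
    """Mean denoised log2 copy ratio inside the invented deletion."""
    return float(np.mean(denoised[deletion]))


# Matched normal is in the PoN, medians only.
medians_only = deletion_signal(denoise(panel, matched_normal, 0))
# Matched normal is in the PoN, as many eigensamples as PoN samples.
all_eigensamples = deletion_signal(denoise(panel, matched_normal, n_samples))
# Matched normal is left out of the PoN, all remaining eigensamples used.
held_out = deletion_signal(denoise(panel[1:], matched_normal, n_samples - 1))

print(f"in PoN, 0 eigensamples:   {medians_only:+.3f}")
print(f"in PoN, all eigensamples: {all_eigensamples:+.3f}")
print(f"held out, all remaining:  {held_out:+.3f}")
```

In this toy setting, the deletion survives when only medians are used or when the sample is held out of the panel. It vanishes when the sample is in the panel and every eigensample is subtracted, because the sample then lies entirely within the span of the eigensamples. Real coverage data has systematic noise that this sketch does not model, so the numbers are not predictive of any actual run.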
Hi @slee,
Yeah I meant you, sorry.
I used 13 samples for my PoN, 3 of which are replicates, so 10 different individuals. I was running on Terra and didn't change the eigensamples setting (by default, all eigensamples in the panel are used). The PoN was also created with the defaults (20 eigensamples).
Would you recommend using a smaller number?
Thanks.
Hi @slee,
I extracted all the singular values for my PoN:
252.6883192546395
241.98341466462037
240.27440331099712
239.10785645601126
236.95645436422564
233.37090291352203
228.02019462415257
222.19674853753116
218.28387774821473
215.90399032366327
212.1380520021574
206.93633068946113
165.7532712844405
It seems it's decreasing all the way. I don't see an "elbow" like you would expect in this:

So how many eigensamples should I use?
Hi @lzhan140,
When you don't observe an elbow in the scree plot, it typically means that your data is relatively isotropic or spherical in data space (i.e., standardized-coverage space). This means you should be able to get a good result with only per-bin-median normalization, perhaps subtracting just a few PCs if desired. Subtracting a number of PCs equal to the number of samples in your PoN is almost certainly overkill in this case.
I would suggest first trying this with your original PoN, and then perhaps next trying with a PoN that does not include the matched normal if the result doesn't look comparable to the tumor result.
Again, I would also recommend adjusting the segmentation parameters if appropriate.
If known/common germline CNVs are a concern, it may be a good strategy to just blacklist these from the outset (you can use -L/-XL when defining bins with PreprocessIntervals, for example).
It's difficult for me to advise further without knowing the goals of your analysis and the particulars of your data, but hopefully this will point you in the right direction!
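The "no elbow" observation can be checked numerically from the 13 singular values posted earlier in the thread. This is a rough sketch that assumes, as in standard PCA, that squared singular values are proportional to the variance carried by each component; the variable names are illustrative.

```python
import numpy as np

# Singular values as posted in the thread, largest first.
singular_values = np.array([
    252.6883192546395, 241.98341466462037, 240.27440331099712,
    239.10785645601126, 236.95645436422564, 233.37090291352203,
    228.02019462415257, 222.19674853753116, 218.28387774821473,
    215.90399032366327, 212.1380520021574, 206.93633068946113,
    165.7532712844405,
])

# Share of total variance per component (assumes variance ~ s**2).
explained = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained)

# Ratio of each singular value to the previous one; an elbow near the top
# would appear as one ratio much smaller than the rest.
ratios = singular_values[1:] / singular_values[:-1]

for k, (frac, cum) in enumerate(zip(explained, cumulative), start=1):
    print(f"component {k:2d}: {frac:6.2%} of variance, cumulative {cum:7.2%}")
print("consecutive ratios:", np.round(ratios, 3))
```

The leading component carries a share not far above the 1/13 that a perfectly flat spectrum would give, and the only sizeable drop is at the very last value, which is consistent with the description of roughly isotropic data above.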
Hi @slee,
Thanks for the recommendations. We just want to make some confident somatic CNV calls. By "confident", I mean that we want to be able to tell which calls are somatic and which are germline. The current PoN, as you said, over-denoised the normal sample, so we cannot tell which CNVs are germline and which are somatic.
I will first try with different PCs.
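Once the normal sample retains its germline events, one possible way to separate the two classes is to flag tumor calls that overlap a same-direction call in the matched normal. The sketch below is only one such approach and is an assumption, not a method described in this thread. The function names, the 50% overlap threshold, the tuple layout and the coordinates are all made up for illustration.

```python
def overlap_fraction(seg, other):
    """Fraction of seg covered by other; segments are (contig, start, end, call)
    with 1-based inclusive coordinates. Returns 0.0 if contigs differ."""
    if seg[0] != other[0]:
        return 0.0
    overlap = min(seg[2], other[2]) - max(seg[1], other[1]) + 1
    return max(overlap, 0) / (seg[2] - seg[1] + 1)


def split_somatic_germline(tumor_calls, normal_calls, min_overlap=0.5):
    """Label a tumor call germline if a normal call with the same direction
    covers at least min_overlap of it; otherwise label it somatic."""
    somatic, germline = [], []
    for t in tumor_calls:
        matched = any(
            n[3] == t[3] and overlap_fraction(t, n) >= min_overlap
            for n in normal_calls
        )
        (germline if matched else somatic).append(t)
    return somatic, germline


# Toy data, not taken from the thread: "+" is a gain, "-" is a loss.
tumor_calls = [
    ("chr1", 1000, 5000, "-"),
    ("chr2", 200, 900, "+"),
    ("chr1", 10000, 20000, "-"),
]
normal_calls = [("chr1", 900, 5200, "-")]

somatic, germline = split_somatic_germline(tumor_calls, normal_calls)
print("somatic: ", somatic)
print("germline:", germline)
```

A reciprocal-overlap rule or a comparison of segment means could be substituted; the threshold would need tuning against the segmentation parameters in use.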
Hi @slee,
Here are some results. I denoised one of my normal samples using all, 10, 7, 4, and 1 eigensamples. Here is what I found: only 1 eigensample seems to give a level of sensitivity similar to that of the paired tumor. Notice the deletion on chr7. It consistently appears in my paired normal, but it is detected only when the number of eigensamples is 1.
Do you think I should just use 1 eigensample for normal samples and all eigensamples for tumor samples?
Another question: a deletion was called on chr19 in the normal sample whether I used 1, 4, or 10 eigensamples, but not in the paired tumor sample. However, I do see a small peak at that position in the paired tumors. Is that because the tumor is over-denoised (13 eigensamples for the tumor)? If so, I guess I'd better use 1 eigensample for all samples?
Two of the paired tumors:

Thanks!
@lzhan140 I would use the same number of eigensamples to denoise all samples. Using more eigensamples to denoise the tumor might indeed explain the discrepancy in chr19. Your data looks relatively clean, so you might even want to try using zero eigensamples (i.e., only using per-bin medians to normalize).
Hi @slee,
Thanks for your answers. They are super helpful.
I tried my noisiest tumor sample with 13 and 0 eigensamples. I found only a minimal difference, but to increase sensitivity and match the normal samples, I will still use 0 for all.
Top: 0; Bottom: 13
