Call Paired Somatic CNV

Dear All,
I'm planning to use Terra to run the paired somatic CNV workflow, but I'm not sure how this workflow calls somatic CNVs. As far as I can tell:
1. The WDL script calls CNVs for the tumor, denoises the tumor with the PON, and plots the segments;
2. It then calls CNVs for the normal, denoises the normal with the PON, and plots the segments.
Then the workflow ends. How does the workflow get rid of the normal CNVs? Do I need to manually remove CNVs that also exist in the normal from the somatic result?
With Mutect2 it is clear: when running the command, I can also supply -I normal so that the program can exclude mutations found in the normal. Could you help me identify the step where GATK removes the normal CNVs?
In Mutect2:
gatk --java-options "-Xmx${command_mem}m" Mutect2 \
    -R ${ref_fasta} \
    $tumor_command_line \
    $normal_command_line \
    ${"--germline-resource " + gnomad} \
    ${"-pon " + pon} ...
In DenoiseReadCounts:
gatk --java-options "-Xmx${command_mem_mb}m" DenoiseReadCounts \
    --input ${read_counts} \
    --count-panel-of-normals ${read_count_pon} \
    ${"--number-of-eigensamples " + number_of_eigensamples} \
    --standardized-copy-ratios ${entity_id}.standardizedCR.tsv \
    --denoised-copy-ratios ${entity_id}.denoisedCR.tsv
Thank you!
Answers
Hi @bhanuGandham
Thanks for your answer, we have already shared the space before. Our workspace is pd-wgs-project/pd-wgs-workspace/.
There is no actual workflow for somatic CNV in our workspace now. We are just checking the demo workflow scripts at help-gatk/Somatic-CNVs-GATK4.
Hi @lzhan140
The tool creates a panel of normals, which forms the baseline against which the workflow compares case samples. Take a look at this notebook tutorial on Terra, which walks you through how CNV calling works in a very user-friendly way:
https://app.terra.bio/#workspaces/help-gatk/GATKTutorials-Somatic-July2019/notebooks/launch/2-somatic-cna-tutorial.ipynb
Hi @lzhan140,
You are right that the workflow does not yet have a step to remove germline CNVs detected in the matched normal. You might be interested in the unsupported WDLs at https://github.com/broadinstitute/gatk/tree/master/scripts/unsupported/combine_tracks_postprocessing_cnv (which use a rough heuristic to filter germline CNVs), or you might want to implement your own filtering step.
To give a bit of context, the current somatic CNV workflow was originally based on workflows developed by Broad CGA. These older workflows used the CBS segmentation algorithm, which was often not sensitive enough to pick out shorter germline CNVs in somatic samples, and so no filtering step was typically performed. Unfortunately, this algorithm was also overly sensitive to uncorrected noise and bias at longer length scales.
The segmentation algorithm used by the current workflow is more sensitive to real events at shorter length scales and is less susceptible to noise at longer length scales. Thus, you could alternatively experiment with simply tuning the ModelSegments segmentation parameters to be less sensitive to shorter events or to more aggressively smooth them away, if this will fulfill your purposes.
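For anyone who wants to try the "implement your own filtering step" route, here is a minimal sketch of one possible approach. It is not the heuristic used by the unsupported WDLs linked above, and it is not part of GATK. It assumes you have already parsed the called segments for the tumor and the matched normal into simple records, that each record carries a call of "+", "-", or "0", and that a tumor event counts as germline-like when a single normal segment with the same call covers most of it. The `Segment` type, the function names, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    contig: str
    start: int  # 1-based, inclusive
    end: int    # inclusive
    call: str   # "+" (gain), "-" (loss), or "0" (neutral); assumed encoding


def overlap_fraction(a: Segment, b: Segment) -> float:
    """Fraction of segment a that is covered by segment b."""
    if a.contig != b.contig:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    return max(overlap, 0) / (a.end - a.start + 1)


def drop_germline_like(
    tumor: List[Segment],
    normal: List[Segment],
    min_fraction: float = 0.8,  # hypothetical threshold, tune for your data
) -> List[Segment]:
    """Keep tumor segments that are not explained by a matching normal event."""
    kept = []
    for t in tumor:
        if t.call == "0":
            kept.append(t)  # neutral segments are never filtered
            continue
        germline_like = any(
            n.call == t.call and overlap_fraction(t, n) >= min_fraction
            for n in normal
        )
        if not germline_like:
            kept.append(t)
    return kept


tumor = [
    Segment("chr1", 1000, 2000, "-"),
    Segment("chr1", 5000, 9000, "+"),
    Segment("chr2", 100, 900, "0"),
]
normal = [
    Segment("chr1", 900, 2100, "-"),
    Segment("chr1", 5000, 5500, "+"),
]
somatic = drop_germline_like(tumor, normal)
```

In this toy input, the chr1 deletion is fully covered by a deletion in the normal and is dropped, while the chr1 gain is kept because the normal gain covers only a small part of it. Note that the sketch compares against one normal segment at a time, so a tumor event that is covered by several adjacent normal segments would not be caught without merging them first.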
Hi @slee,
Thanks a lot for the answer. It resolved my question. I have two additional questions regarding the PON:
1. If I use different kinds of samples to create my PON, for example a mix of 30x and 60x depth, or of WES and WGS, will this confuse the program? Should I use just one type, like only 30x WGS or only 60x WES?
2. If my PON is created from samples with a different depth than my case samples, will the program account for it? For example, my PON is created from 60x WGS samples but all my case samples are 30x WGS. Will the program recognize the difference and give me the expected CNV calls, or will it treat my case samples as deletions everywhere, since they have only half the coverage of the "normals"?
Thanks!
@lzhan140 you should not mix WES and WGS samples, as this is not likely to yield good PCA denoising results. Samples used for the PoN should ideally be representative of the same sequencing protocol, as should your case samples; i.e., all samples should exhibit similar systematic sequencing biases.
On the other hand, it might be OK to use 60x samples to denoise 30x samples, as long as otherwise similar sequencing protocols were used to generate them all. This is because the overall depth is normalized out during PoN creation and denoising. (Your 30x samples will have a higher level of statistical noise relative to your 60x samples, but I think you should still be able to get a good result as long as your bin-length is not too small. However, you should be cognizant of the minimum-total-allele-count parameter that controls hard filtering of the allelic counts at common SNP sites in ModelSegments; this is set to 30 by default, so you might need to lower it for your 30x samples.)
As always, the proof is in the pudding, and it's impossible to say whether your samples satisfy all of the assumptions implicit in PCA denoising without simply running the analysis. You should check the scree plot of the eigenvalues found in your PoN and inspect the plots and metrics from PlotDenoisedCopyRatios to make sure your denoising results look reasonable. You might want to refer to the somatic CNV tutorials or search for related posts in the forum if you need more pointers.
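To make the two depth-related points above concrete, here is a small toy illustration. It is a simplified sketch and not GATK's actual implementation: the first function only shows why dividing per-bin counts by a per-sample median removes the overall depth, and the second shows what a hard filter on the total allelic count does to SNP sites with little coverage. The function names, the toy counts, and the assumption that a site passes when its total is at least the threshold are all illustrative.

```python
from math import log2
from statistics import median
from typing import List


def log2_copy_ratios(counts: List[int]) -> List[float]:
    """Divide each bin count by the sample median, then take log2.

    A simplified stand-in for depth normalization; real denoising also
    filters intervals and removes systematic noise using the PoN.
    """
    sample_median = median(counts)
    return [log2(c / sample_median) for c in counts]


def passes_total_count(ref_count: int, alt_count: int, minimum_total: int = 30) -> bool:
    """Hard filter on ref + alt reads at a SNP site (assumed to be inclusive)."""
    return ref_count + alt_count >= minimum_total


# Same sample profile at two depths: the fourth bin is a one-copy loss.
counts_30x = [300, 310, 290, 150, 305]
counts_60x = [2 * c for c in counts_30x]

ratios_30x = log2_copy_ratios(counts_30x)
ratios_60x = log2_copy_ratios(counts_60x)

# A het site with 14 ref + 14 alt reads is typical of roughly 30x coverage.
kept_at_default = passes_total_count(14, 14)
kept_when_lowered = passes_total_count(14, 14, minimum_total=20)
```

The two lists of log2 ratios come out the same, so halving the depth does not by itself turn every bin into a deletion; what changes in real data is the statistical noise around those values. The last two lines show why a threshold of 30 can discard many het sites in a 30x sample, and why lowering it keeps them.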
@slee, thank you so much for the answer.