CNV standardized and denoised plots disagree with each other

Hi GATK Team,
I'm following the CNV workflow: Sensitively detect copy ratio alterations and allelic segments.

I ran steps 1-4 with my own WGS 30x samples and a 7 sample PON to test against my tumor samples. All BAM files were preprocessed by a FireCloud workflow. I'm using hg38. There were no error messages at the end. My GATK version is 4.0.8.1.

At the end of step 4, I have a few questions:

  • For one patient, the standardized and denoised plots disagree with each other, not just for one cell clone but for different ones. This happens only for this patient.

I can't post markdown links to my images; please see the attachment.

1. The standardized plot shows an increase at X but the denoised plot shows a decrease. How did this happen? Is this related to my sample itself, the PoN composition (7 samples), or the calling algorithm?

2. As in the two sample plots above, some samples give me a smooth copy-ratio line and other samples are noisier. What causes the difference and how can I reduce the noise?

Thank you!
Le

Comments

  • shlee Cambridge Member, Broadie, Moderator admin

    Hi @lzhan140,

    We should expect the MAD score to decrease after denoising. Any chance you accidentally swapped the tsvs for --standardized-copy-ratios and --denoised-copy-ratios?

    Also, for your data at X, are you using a PoN and sample that match in sex, or are the samples mixed? ModelSegments CNV does not account for allosomal chromosomes and you must work around this in your experimental design.
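
A quick way to test the "swapped files" hypothesis is to compare the spread of the two sets of log2 copy ratios directly. The sketch below uses a plain median absolute deviation around the median; the statistic printed on the GATK plots may be computed differently, so treat this only as a sanity check. The function names and the numbers are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation around the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


def looks_swapped(standardized, denoised):
    """Return True when the 'denoised' values are noisier than the 'standardized' ones."""
    return mad(denoised) > mad(standardized)


# Hypothetical log2 copy ratios for the same six intervals.
standardized = [0.30, -0.25, 0.10, -0.40, 0.35, -0.05]
denoised = [0.05, -0.04, 0.02, -0.06, 0.03, -0.01]

print(mad(standardized), mad(denoised))
print(looks_swapped(standardized, denoised))
```

If `looks_swapped` returns True for real data, that does not prove the files were swapped; as this thread shows, a PoN that does not match the sample can also increase the spread.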

  • lzhan140 Member

    Hi @shlee,

    Thanks for your response.

    I wrote the two steps in one script, so I didn't enter the file names manually; they shouldn't be swapped.
    These are the output arguments for DenoiseReadCounts:

    --standardized-copy-ratios ${sample_base}_standardized.tsv \
    --denoised-copy-ratios ${sample_base}_denoised.tsv
    

    These are the inputs for PlotDenoisedCopyRatios:

    --standardized-copy-ratios ${sample_base}_standardized.tsv \
    --denoised-copy-ratios ${sample_base}_denoised.tsv
    

    Yes, my PoN mixes male and female samples. I don't have many control samples, so I mixed them. Should I generate the PoNs separately? This PoN is patient-specific. How would it perform if I used a public database to generate a non-patient-specific PoN?

    So do you mean I need to find a different tool to call CNVs on the allosomal chromosomes?

    Le

  • shlee Cambridge Member, Broadie, Moderator admin

    Hi @lzhan140,

    What do you mean when you say the PoN is patient-specific? Do you mean from the same batch of sample prep and sequencing runs?

    You must remove the allosomal chromosomes from your ModelSegments analysis of autosomes. If you have only a few control samples, then the suggested strategy from our developers is to generate a PoN for the autosomal contigs from all the samples and then separate PoNs for the allosomal chrX and chrY from the appropriate samples. You can confirm samples are the expected sex using DetermineGermlineContigPloidy. Note there are aneuploid states for allosomal contigs that still make for viable humans without any phenotypes.

    Thanks for sharing your plots. From them, assuming the two case samples are normal for allosomal copy numbers, it appears both your samples are female and your PoN has more male samples than female samples. What is interesting to note in your plots is that denoising expands the bandwidth of chr1-chr22 (increases MAD) but compacts that of chrX (decreases chrX-MAD). The copy ratio for the standardized plot seems reasonable at ~1.8, but the PCA denoising has done something peculiar for chrX copy-ratios--the ratios are halved! This is because the algorithm assumes all of the data presented by the PoN across the contigs is normal.

    See how the algorithm performs when you remove the allosomal contigs.
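
The suggested design (one autosomal PoN from all samples, separate chrX and chrY PoNs from the appropriate samples) can be sketched as two small helpers. This is an illustration only: the function names are hypothetical, the contig names assume hg38-style `chr` prefixes, and "appropriate samples" is read here as "samples of the same sex as the case", which is an assumption.

```python
AUTOSOMES = {f"chr{i}" for i in range(1, 23)}


def split_intervals(intervals):
    """Partition (contig, start, end) tuples into autosomal, chrX, and chrY groups."""
    groups = {"autosomes": [], "chrX": [], "chrY": []}
    for contig, start, end in intervals:
        if contig in AUTOSOMES:
            groups["autosomes"].append((contig, start, end))
        elif contig in ("chrX", "chrY"):
            groups[contig].append((contig, start, end))
        # Other contigs (chrM, alts, decoys) are dropped in this sketch.
    return groups


def panel_members(sex_by_sample, contig_group, case_sex):
    """Pick the normals for one PoN: everyone for autosomes, sex-matched otherwise."""
    if contig_group == "autosomes":
        return sorted(sex_by_sample)
    return sorted(s for s, sex in sex_by_sample.items() if sex == case_sex)


# Hypothetical intervals and normals.
intervals = [("chr1", 1, 1000), ("chrX", 1, 1000), ("chrY", 1, 1000), ("chrM", 1, 100)]
normals = {"n1": "M", "n2": "F", "n3": "M"}

groups = split_intervals(intervals)
print(groups["autosomes"], panel_members(normals, "autosomes", "F"))
print(groups["chrX"], panel_members(normals, "chrX", "F"))
```

Each interval group would then be used for its own read-count collection, PoN creation, and denoising run.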

  • lzhan140 Member

    Hi @shlee,

    Yes, they are from the same sequencing batch and they are tumor/normal pairs.

    I will post an update when I have new results.

  • lzhan140 Member

    Hi @shlee,

    I'm back with new results.

    So I created one PoN for the autosomes from all samples, one allosome PoN from the male samples, and one allosome PoN from the female samples. After I got the autosome TSVs and allosome TSVs, I combined them for plotting.

    Here are the results:
    1. For the female sample, the matching (female) PoN gave the right ratio, and the wrong-sex PoN distorted X.
    2. For the male sample, the matching (male) PoN still gave a ratio around 1 on X and Y, and the wrong-sex PoN gave no values on Y but still 1 on X.

    Shouldn't I have a value around 0 for X and Y for male?
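
The combine-before-plotting step described in this post can be sketched as follows. It assumes each copy-ratio TSV consists of header lines starting with `@`, then one column-header line, then data rows; that layout, the column names, and the function name are assumptions for illustration. It also assumes the header of the first table is valid for all merged rows.

```python
def merge_copy_ratio_tsvs(tables):
    """Concatenate copy-ratio tables, keeping header lines from the first table only.

    Each table is a list of text lines. Lines starting with '@' and the first
    non-'@' line (the column header) are treated as header lines.
    """
    merged = []
    for index, lines in enumerate(tables):
        column_header_seen = False
        for line in lines:
            if line.startswith("@"):
                is_header = True
            elif not column_header_seen:
                is_header = True
                column_header_seen = True
            else:
                is_header = False
            if is_header and index > 0:
                continue  # headers of later tables are skipped
            merged.append(line)
    return merged


# Hypothetical file contents.
autosome_lines = ["@HD\tVN:1.6", "CONTIG\tSTART\tEND\tLOG2_COPY_RATIO", "chr1\t1\t1000\t0.01"]
allosome_lines = ["@HD\tVN:1.6", "CONTIG\tSTART\tEND\tLOG2_COPY_RATIO", "chrX\t1\t1000\t-0.02"]

merged = merge_copy_ratio_tsvs([autosome_lines, allosome_lines])
print("\n".join(merged))
```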

  • lzhan140 Member

    I just realized that, as you said, the program assumes that my PoN inputs are normal. If my input PoN has only 1 copy on the allosomes, then any test sample that is normal on the allosomes should also return 1. This makes sense.

    However, if this is the case, there is another question: why didn't it give me a decrease if I tested male sample with a female PoN? The normal copy for female PoN on X is 2 but there is only 1 here.

  • AdelaideR Unconfirmed, Member, Broadie, Moderator admin

    Hi there @lzhan140, we are looking into an answer for your question.

    @shlee, would you please follow up on this line of questioning when you get a chance?

  • shlee Cambridge Member, Broadie, Moderator admin

    Hi @lzhan140,

    I'm glad to see your new approach gives expected allosomal results. I see that your MADs are still rather similar for the standardized and denoised data. Given you say:

    own WGS 30x samples and a 7 sample PON

    I'm guessing your denoising will improve (i) by increasing the number of samples in your PoN and/or (ii) by performing multidimensional segmentation of the copy ratios alongside allelic ratios as outlined in https://software.broadinstitute.org/gatk/documentation/article?id=11683.

    why didn't it give me a decrease if I tested male sample with a female PoN? The normal copy for female PoN on X is 2 but there is only 1 here.

    It seems there is an improvement in results for chrX for the male sample with the updated approach. I think this will become more apparent after segmentation. There appear to be smears at the edges of chrX. The start of chrX especially has a pronounced smear that disappears for your corrected approach. However, this is not what you are asking about. You are asking about why your male's chrX CRs appear at CR = 1 with the mixed-sex PoN. My guess is that it relates to my earlier observation:

    it appears both your samples are female and your PoN has more male samples than female samples

    Say, for example, only one of your PoN samples is female and the rest are male. During PoN creation, filtering steps across the bins and across the samples remove outlier data. In this case, the single female sample's data points for X would probably be removed. Does this sound like a scenario that might apply to your case?

  • lzhan140 Member

    Hi @shlee,

    Thanks for the answers.

    You are asking about why your male's chrX CRs appear at CR = 1 with the mixed-sex PoN.

    I didn't use a mixed-sex PoN. I have two PoNs for the allosomes: a 5-sample male allosome PoN and a 2-sample female allosome PoN. No male samples took part in the female PoN creation, so the female data shouldn't have been removed.

    it appears both your samples are female and your PoN has more male samples than female samples.

    In my first post with figures I selected two females, but in the second post I deliberately selected one male and one female. The tricky point is that when I test the male case sample with the female PoN (the wrong sex, which I shouldn't use), it still gives me a CR centered at 1, whereas with the wrong PoN I expect it to decrease. Do you know why this happens?

  • shlee Cambridge Member, Broadie, Moderator admin

    Thanks for the clarification @lzhan140. GATK4 Somatic CNV workflows are tuned to work with U.S. TCGA data (The Cancer Genome Atlas), where normal sample depths are by design at much lower coverage than tumor sample depths. The workflow is designed to analyze a tumor sample against a PoN made of normals of different total average depth. Is your male case x female PoN analysis across the whole genome or did you perform the analysis against each chromosome in separate instances?

  • lzhan140 Member

    Hi @shlee,

    The male case x female PoN analysis was performed only on read counts from the allosomes (X, Y).

    normal sample depths are by design at much lower coverage than tumor sample depths

    I think this is the reason.

    The copy number just goes from 2 to 1 for male case x female PoN. In terms of read depth, my 30x case would in theory go from 30 to 15, but I already know my average coverage is about 26x, not even 30x. So in the end, I think there is not a lot of difference in read depth. Add to that the known alignment issue in the X/Y pseudoautosomal regions. If the program is designed to detect high tumor depths against low normal depths, that would explain why I still had a CR of 1 in my test. I guess it just not designed for X,Y deletion detection, but it worked perfectly on duplication detection. My microarray data from the same DNA samples confirmed all duplications detected by this workflow.

    I will update you with segment results.

  • shlee Cambridge Member, Broadie, Moderator admin

    Hi @lzhan140,

    Based on what you say:

    I guess it just not designed for X,Y deletion detection, but it worked perfectly on duplication detection.

    I just remembered there is an imputation step in PoN creation that I think will interest you. Check out the --do-impute-zeros:Boolean parameter of CreateReadCountPanelOfNormals, which the tool doc describes as:

    If true, impute zero-coverage values as the median of the non-zero values in the
    corresponding interval.  (This is applied after all filters.)  Default value: true. Possible values: {true, false} 
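
Read literally, the tool-doc description above amounts to something like the following sketch. It is not the tool's code: the function name and the list-of-lists layout (one list of per-interval coverage values per panel sample) are hypothetical, and the real tool applies this after its own filters.

```python
from statistics import median


def impute_zeros(panel):
    """Replace zeros with the median of the non-zero values in the same interval.

    `panel` is a list of samples, each a list of per-interval coverage values.
    """
    n_intervals = len(panel[0])
    imputed = [list(sample) for sample in panel]
    for j in range(n_intervals):
        nonzero = [sample[j] for sample in panel if sample[j] != 0]
        if not nonzero:
            continue  # nothing to impute from; leave the zeros in place
        fill = median(nonzero)
        for sample in imputed:
            if sample[j] == 0:
                sample[j] = fill
    return imputed


# Hypothetical coverage for three panel samples over three intervals.
panel = [[10, 0, 5], [12, 4, 0], [0, 6, 7]]
print(impute_zeros(panel))
```

In this toy, a sample's zero in an interval is filled from the other samples, which is one way a zero-coverage region could stop standing out in the panel.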
    

    Also, I believe the workflow uses relative coverage across the given data to ascertain large aneuploid events against the PoN. So the more data you can give the denoising step at once, the more empowered the analysis will be.
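
To see why relative coverage matters, here is a toy model. It assumes, for illustration only, that standardization amounts to converting counts to fractional coverage, dividing by the per-interval panel median, and centering on the sample median. This is not GATK's implementation, and the thread does not confirm that it explains the observed chrX result; all names and counts are hypothetical.

```python
from statistics import median


def fractional_coverage(counts):
    """Scale counts so they sum to 1 across the intervals provided."""
    total = sum(counts)
    return [c / total for c in counts]


def standardize(case_counts, panel_counts):
    """Toy copy ratios: fractional coverage / panel interval median, centered on the sample median."""
    case = fractional_coverage(case_counts)
    panel = [fractional_coverage(sample) for sample in panel_counts]
    panel_medians = [median(column) for column in zip(*panel)]
    ratios = [c / m for c, m in zip(case, panel_medians)]
    center = median(ratios)
    return [r / center for r in ratios]


# Hypothetical counts: 8 autosomal intervals followed by 2 chrX intervals.
female_panel = [[30] * 10, [30] * 10]
male_case = [30] * 8 + [15] * 2

whole_genome = standardize(male_case, female_panel)
chrx_only = standardize(male_case[8:], [sample[8:] for sample in female_panel])

print(whole_genome)
print(chrx_only)
```

Under these assumptions, the male chrX intervals come out near 0.5 when the autosomes are included, but near 1 when only chrX is supplied, because every step is relative to the data given in that run.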

    Once you provide an update, I'll ping our developer for further insight.

  • bhanuGandham Member, Administrator, Broadie, Moderator admin
    edited December 13

    Hi @lzhan140,

    We haven't heard from the user in more than two business days. The user has been notified and this ticket is now closed. Please get back to us if you have more questions.

    Regards
    Bhanu
