Can the CNV workflow be used for WES data?

Hi Team,
May I ask if the CNV workflow can be used on WES data?

Best Answers


  • Thank you very much for your kind help!
    I want to do CNV analysis on mouse WES data. Is it necessary to provide an interval list for the PreprocessIntervals step, or can I omit the -L argument and set --bin-length to 1000? If an interval list is required, how can I get one for mouse WES data? @shlee
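    For reference, the exome example in the GATK4 PreprocessIntervals documentation supplies the capture targets with -L, pads them (250 bp in that example) and passes --bin-length 0 so the targets are kept whole rather than binned; binning the whole reference at 1000 bp without -L is the setting given for whole-genome data. The target file for an exome kit is typically provided by the capture-kit vendor as a BED file, which GATK accepts directly with -L. The sketch below assumes hypothetical file names (mm10.fasta for the mouse reference, targets.bed for the kit targets) and only prints the command unless DRY_RUN=0 is set.

```shell
set -euo pipefail

# Echo the command (dry run) unless DRY_RUN=0, in which case execute it.
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    echo "$*"
  fi
}

# Exome mode: keep the capture targets (-L), pad them, and disable binning.
# File names passed in below are hypothetical placeholders.
preprocess_exome_targets() {
  local ref=$1 targets=$2 out=$3
  run gatk PreprocessIntervals \
    -R "$ref" \
    -L "$targets" \
    --bin-length 0 \
    --padding 250 \
    --interval-merging-rule OVERLAPPING_ONLY \
    -O "$out"
}

preprocess_exome_targets mm10.fasta targets.bed targets.preprocessed.interval_list
```

    The resulting interval list is what CollectReadCounts would then take with -L for every sample, so that all count files share the same intervals.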
  • claire1011 Member
    edited December 2018
    Thank you so much, that is really helpful. I also have a question about creating a panel of normals (PoN) with CreateReadCountPanelOfNormals. For example, I have 5 tumor samples and 2 control samples: should I use both control samples to build the PoN? If so, how can I then compare my two groups, as in Figures 4B and 4D for HCC1143_normal in the linked tutorial? @shlee
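    A minimal sketch of building a PoN from the two controls with CreateReadCountPanelOfNormals (one -I per normal), assuming the per-sample read counts were already produced by CollectReadCounts and are named RMS1.hdf5 and RMS2.hdf5; the output name matches the normal.pon.hdf5 used later in this thread. Filtering parameters are left at their defaults here. It prints the command unless DRY_RUN=0.

```shell
set -euo pipefail

# Echo the command (dry run) unless DRY_RUN=0, in which case execute it.
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    echo "$*"
  fi
}

# Build one PoN from any number of normal count files (one -I per sample).
create_pon() {
  local out=$1
  shift
  if (( $# == 0 )); then
    echo "create_pon: need at least one normal count file" >&2
    return 1
  fi
  local args=() counts
  for counts in "$@"; do
    args+=(-I "$counts")
  done
  run gatk --java-options "-Xmx12g" CreateReadCountPanelOfNormals \
    "${args[@]}" \
    -O "$out"
}

create_pon normal.pon.hdf5 RMS1.hdf5 RMS2.hdf5
```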
  • Hi Shlee (@shlee), thank you so much for your kind help! I don't have much biological background, so I apologize if my questions are basic. Let me describe my case more clearly: I have 5 tumor samples from mouse (MS1-MS5) and 2 control samples from their two parents (RMS1, RMS2).
    So far, I have used RMS1 and RMS2 to create the PoN, and I generate the plots as follows:

    1. TUMOR:
    gatk --java-options "-Xmx12g" DenoiseReadCounts \
    -I MS1.hdf5 \
    --count-panel-of-normals normal.pon.hdf5 \
    --standardized-copy-ratios MS1.standardizedCR.tsv \
    --denoised-copy-ratios MS1.denoisedCR.tsv

    2. NORMAL:
    gatk --java-options "-Xmx12g" DenoiseReadCounts \
    -I RMS1.hdf5 \
    --count-panel-of-normals normal.pon.hdf5 \
    --standardized-copy-ratios RMS1.standardizedCR.tsv \
    --denoised-copy-ratios RMS1.denoisedCR.tsv
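    The plots themselves come from PlotDenoisedCopyRatios, which reads the two TSV files written by DenoiseReadCounts. A sketch that denoises and plots the five tumors against the two-normal PoN, assuming a hypothetical sequence dictionary for the mouse reference (mm10.dict) and an output directory named plots; it prints the commands unless DRY_RUN=0.

```shell
set -euo pipefail

# Echo the command (dry run) unless DRY_RUN=0, in which case execute it.
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    echo "$*"
  fi
}

# Denoise one sample against a PoN, then plot its copy ratios.
denoise_and_plot() {
  local sample=$1 pon=$2 dict=$3 outdir=$4
  run gatk --java-options "-Xmx12g" DenoiseReadCounts \
    -I "${sample}.hdf5" \
    --count-panel-of-normals "$pon" \
    --standardized-copy-ratios "${sample}.standardizedCR.tsv" \
    --denoised-copy-ratios "${sample}.denoisedCR.tsv"
  run gatk PlotDenoisedCopyRatios \
    --standardized-copy-ratios "${sample}.standardizedCR.tsv" \
    --denoised-copy-ratios "${sample}.denoisedCR.tsv" \
    --sequence-dictionary "$dict" \
    --output "$outdir" \
    --output-prefix "$sample"
}

run mkdir -p plots
for tumor in MS1 MS2 MS3 MS4 MS5; do
  denoise_and_plot "$tumor" normal.pon.hdf5 mm10.dict plots
done
```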

    I am a little confused: if I use both RMS1 and RMS2 to create the PoN, can I then use this PoN to denoise RMS1 or RMS2? Could you tell me whether I am doing this correctly? If not, what is the best way to create the PoN and generate copy-ratio plots for both the tumor and normal samples?

    Thank you!!!
  • shlee Cambridge Member, Broadie
    edited December 2018

    Hi @claire1011,

    @bhanuGandham asked that I follow up on your question.

    If, as you said earlier, you need to confirm that your normal samples are indeed normal, then, as I've stated previously, the best approach is to create a multi-sample PoN. For experimental mouse models, because strains are fairly distinct from each other and pretty much identical within a strain, and because the PoN is meant to capture systematic noise, I think you can use either biological or technical replicates of RMS-strain samples for the PoN.

    If you lack access to additional RMS normals, then the next best thing is to use each normal sample as the PoN for the other, e.g. RMS1 against an RMS2-PoN and RMS2 against an RMS1-PoN. If each normal confirms as normal, then you can pool them into a 2-sample PoN to use with the tumor samples. Please do not include the sample of interest (the case sample) in the PoN used to analyze it. Hopefully, each normal sample confirms as normal as expected, and any differences you observe are either parental germline differences (unlikely, I suppose, for cloned mice) or part of the noise that we wish to capture. It is definitely worth sussing out the normal samples to rule out any sample swaps between normal and tumor samples. Sample swaps happen more often than we like to think.
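    A minimal sketch of this cross-check, assuming the count files are named <sample>.hdf5 as elsewhere in this thread and that CreateReadCountPanelOfNormals accepts a single input sample, as the 1-sample PoN described here implies. With two normals, each PoN is simply the other normal; the loop also works for more normals. The pon_without_<sample>.hdf5 naming is hypothetical. It prints the commands unless DRY_RUN=0.

```shell
set -euo pipefail

# Echo the command (dry run) unless DRY_RUN=0, in which case execute it.
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    echo "$*"
  fi
}

# For each normal, build a PoN from all *other* normals and denoise the
# normal against it, so no sample ever contributes to its own PoN.
leave_one_out() {
  if (( $# < 2 )); then
    echo "leave_one_out: need at least two normals" >&2
    return 1
  fi
  local case_sample s
  for case_sample in "$@"; do
    local args=()
    for s in "$@"; do
      if [[ "$s" != "$case_sample" ]]; then
        args+=(-I "${s}.hdf5")
      fi
    done
    run gatk --java-options "-Xmx12g" CreateReadCountPanelOfNormals \
      "${args[@]}" \
      -O "pon_without_${case_sample}.hdf5"
    run gatk --java-options "-Xmx12g" DenoiseReadCounts \
      -I "${case_sample}.hdf5" \
      --count-panel-of-normals "pon_without_${case_sample}.hdf5" \
      --standardized-copy-ratios "${case_sample}.standardizedCR.tsv" \
      --denoised-copy-ratios "${case_sample}.denoisedCR.tsv"
  done
}

leave_one_out RMS1 RMS2
```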

    If each tumor sample presents CNVs that are distinct from those of the other tumor samples, then there is a third option: use the two normals and four of the five tumor samples (six samples in total) in a PoN to analyze the remaining fifth tumor sample. PoN creation includes a number of filtering steps that remove outlier data, so the PoN would effectively omit such rare events. You will have to double-check the PoN creation filtering parameters to make sure they are appropriate for six samples. If your sample set is amenable to such a round-robin approach, this would strengthen your denoising. You should carefully consider whether each tumor sample is appropriate for this approach. For sussing out coverage extremes in samples, I can recommend FilterIntervals, a tool I happened to study carefully yesterday while writing the gCNV tutorial.
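    The round-robin idea can be sketched the same way: for each tumor, the PoN is built from the two normals plus the other four tumors, so a tumor never sits in its own PoN. This assumes the same <sample>.hdf5 naming and hypothetical pon_without_<tumor>.hdf5 output names, and leaves the CreateReadCountPanelOfNormals filtering parameters at their defaults, which, as noted above, should be checked for a six-sample PoN. It prints the commands unless DRY_RUN=0.

```shell
set -euo pipefail

# Echo the command (dry run) unless DRY_RUN=0, in which case execute it.
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    echo "$*"
  fi
}

# Sample names from this thread; count files are assumed to be <name>.hdf5.
NORMALS=(RMS1 RMS2)
TUMORS=(MS1 MS2 MS3 MS4 MS5)

# Print the PoN members for one tumor: both normals plus every other tumor.
pon_members() {
  local case_tumor=$1 s
  for s in "${NORMALS[@]}" "${TUMORS[@]}"; do
    if [[ "$s" != "$case_tumor" ]]; then
      echo "$s"
    fi
  done
}

# Round robin: each tumor is denoised against a PoN that excludes it.
round_robin() {
  local tumor s
  for tumor in "${TUMORS[@]}"; do
    local args=()
    while IFS= read -r s; do
      args+=(-I "${s}.hdf5")
    done < <(pon_members "$tumor")
    run gatk --java-options "-Xmx12g" CreateReadCountPanelOfNormals \
      "${args[@]}" \
      -O "pon_without_${tumor}.hdf5"
    run gatk --java-options "-Xmx12g" DenoiseReadCounts \
      -I "${tumor}.hdf5" \
      --count-panel-of-normals "pon_without_${tumor}.hdf5" \
      --standardized-copy-ratios "${tumor}.standardizedCR.tsv" \
      --denoised-copy-ratios "${tumor}.denoisedCR.tsv"
  done
}

round_robin
```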

    I hope you get interesting results with your tumor samples!

    Soo Hee
