Somatic copy number variant discovery (CNVs)

Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

Purpose

Identify somatic copy number variants (CNVs) in a case sample. Requires an appropriate Panel of Normals (PON).



Reference Implementations

Pipeline                 | Summary            | Notes     | Github | FireCloud
Somatic CNV case sample  | Case BAM to CNV    | universal | yes    | b37
Somatic CNV PON creation | Normal BAMs to PON | universal | yes    | b37

Documentation for these workflows is in development.

Comments

  • dayzcool Member

    I would like to try GATK's Somatic CNV on exome and whole genome samples.
    I noticed that there are workflows in the 'placeholder', and the gatk source repo also includes CNV workflows that were updated at a later date.
    Could you advise whether those are ready to be used and which one is the better choice? Thank you!

  • shlee Cambridge Member, Broadie, Moderator admin

    Hi @dayzcool,

    It's my understanding the gatk-workflows repository (the linked URL for 'placeholder') is meant to illustrate the use of fixed versions of the gatk source repo scripts with filled-out example inputs JSON files. The gatk-workflows/gatk4-somatic-cnvs repo is being worked on as we speak--as you can see it is missing example inputs JSON files.

    Differences between these repo scripts are meant to be minimal. The differences you see now are because this particular script (not to mention the tools) is fairly new and still undergoing further tweaks based on recent tests.

    Currently, for an advanced user such as yourself, perhaps the best source of information on the Somatic CNV workflow is the set of WDL scripts in the GitHub repository at https://github.com/broadinstitute/gatk/tree/master/scripts. The developers update these concurrently with tool version updates. The repository also offers additional supporting WDL scripts, as well as unsupported WDL scripts that may be of interest and that are not in a gatk-workflows repo, e.g. a script for Mutect2 panel of normals creation.

    The same somatic CNV WDL script applies to either exome or whole genome data. You need only set the --bin-length value to suit the type of data, e.g. the default of 1000 for genomes or 0 for exomes. I have just started preparing the Somatic CNV workflow tutorial, so this and other supporting documents should become available on the forum in about a month or two.
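
As a minimal sketch of the --bin-length advice above, the Python helper below maps a data type to the value quoted in the post and returns the matching command-line arguments. The function name, the dictionary, and the "genome"/"exome" labels are hypothetical names chosen for illustration; only the flag name and the two values come from the post, and the sketch does not assume which tool or WDL input consumes the flag.

```python
from typing import List

# Values quoted in the post: 1000 (the default) for genomes, 0 for exomes.
# The keys are hypothetical labels, not GATK terminology.
BIN_LENGTH_BY_DATA_TYPE = {"genome": 1000, "exome": 0}


def bin_length_args(data_type: str) -> List[str]:
    """Return the --bin-length arguments for the given data type."""
    try:
        bin_length = BIN_LENGTH_BY_DATA_TYPE[data_type]
    except KeyError:
        raise ValueError(
            "data_type must be one of: " + ", ".join(sorted(BIN_LENGTH_BY_DATA_TYPE))
        ) from None
    return ["--bin-length", str(bin_length)]


if __name__ == "__main__":
    for data_type in ("genome", "exome"):
        print(data_type, " ".join(bin_length_args(data_type)))
```

The returned list can be appended to whatever argument list you already build for the interval-preparation step of your own workflow.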

  • dayzcool Member

    @shlee, I really appreciate your kind explanation and your letting me know about the --bin-length argument.
    Too bad I can't wait for the documentation, but I am looking forward to reading it in a couple of months!

  • @shlee, I tried GATK CNV in beta mode before GATK4 launched.

    Now I want to redo my analysis. Should I still follow the instructions from these links?

    1. Your post, (How to) Call somatic copy number variants using GATK4 CNV:

    https://gatkforums.broadinstitute.org/gatk/discussion/9143/how-to-call-somatic-copy-number-variants-using-gatk4-cnv

    2. PDF file (Call somatic copy number variants using GATK CNV):

    http://genomicinfo.broadinstitute.org/acton/attachment/13431/f-012e/1/-/-/-/-/Somatic_CNV_handon_worksheet.pdf?sid=TV2:QKwcswckd

    Are there any new commands or steps I should add if I use GATK4?

    My workflow:

    1. Pad the target list
    2. Prepare proportional coverage
    3. Prepare the PON
    4. Normalize with the PON
    5. Perform segmentation
    6. Plot segments
    7. Call segments

    By the way, in the final tumor.called file from the call segment step, does the Segment_Mean column contain the raw coverage ratio (tumor coverage / normal coverage) or the log2-transformed ratio, as VarScan2 CopyNumber reports? I am evaluating several tools, so I need to figure this out. Thank you very much for providing such powerful tools for CNV calling!

  • shlee Cambridge Member, Broadie, Moderator admin

    Hi @eric_wu,

    The tools have changed in major ways since GATK4.beta.6, starting with tool names in GATK4.0.0.0. These changes were merged into master for GATK4.0.0.0, released in January. If you are using workflows for any beta release, the tutorials you list apply. If you are using GATK4.0.0.0, then the tutorials are currently in the works. The previous tutorials do still apply conceptually and in the major data transformation steps. It's just that details (tool features, tool names, parameter names, underlying algorithms) have changed, e.g. incorporating matched normal information. You can refer to the WDL scripts and tool documents for now for the GATK4.0.0.0 workflow.

  • Hi @shlee,

    Thanks for the reply! I will use the beta5 CNV workflow first, then.

    I have another question: in the final tumor.called file from the call segment step, does the Segment_Mean column contain the raw coverage ratio (tumor coverage / normal coverage) or the log2-transformed ratio, as VarScan2 CopyNumber reports?

    I checked the values, and they look more like raw relative ratios than log2-transformed values because none of them is negative. Do you interpret them the same way? Thank you!

  • shlee Cambridge Member, Broadie, Moderator admin

    Hi @eric_wu,

    Let me have @LeeTL1220 jump in here.

  • LeeTL1220 Arlington, MA Member, Broadie, Dev ✭✭✭

    @eric_wu I believe it is not the log2 transform.
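
If the Segment_Mean values are indeed linear ratios, comparing them with a tool that reports log2 ratios only requires a log2 transform. The Python sketch below shows the conversion in both directions; the function names are hypothetical, and it assumes the input is a positive tumor-to-normal ratio, which is worth confirming against your own files.

```python
import math


def to_log2_ratio(linear_ratio: float) -> float:
    """Convert a linear copy ratio (e.g. tumor / normal) to a log2 ratio."""
    if linear_ratio <= 0:
        raise ValueError("a linear copy ratio must be positive")
    return math.log2(linear_ratio)


def to_linear_ratio(log2_ratio: float) -> float:
    """Convert a log2 copy ratio back to a linear ratio."""
    return 2.0 ** log2_ratio


if __name__ == "__main__":
    # A ratio of 1.0 means no change; values below 1.0 become negative in log2 space.
    for ratio in (0.5, 1.0, 2.0):
        print(ratio, to_log2_ratio(ratio))
```

Note that a column without negative values is only a hint of a linear scale: log2 ratios are negative whenever the linear ratio is below 1, so a sample with losses would normally show some negative entries if the values were log2-transformed.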

  • Hi @shlee & @LeeTL1220,

    Thanks for the reply! Another issue came up when I checked the final called file and the .ptn and .tn files.

    I found that the column that stores the coverage, which is usually the last column, uses the name of the PON used for normalization as the sample name in the final tumor called file.

    For example, my PON is named foo and my five tumor samples are named fooA, fooB, fooC, fooD, and fooE. In each tumor.called file, the tumor sample is named foo rather than fooA through fooE.

    I am not sure whether it is OK to report this bug here, or whether I should report it in the GATK4 git repository.

    Many thanks!

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @eric_wu
    Hi,

    Sorry for the delay. I asked someone from the team to get back to you soon.

    -Sheila

  • LeeTL1220 Arlington, MA Member, Broadie, Dev ✭✭✭

    @eric_wu If you can, please report in the GATK4 repo. Thanks.

  • Hi @shlee,
    Can I run GATK4 CNV on a somatic sample WITHOUT a matched normal sample (I have a PoN created from some normal samples)?
    Thank you!

  • shlee Cambridge Member, Broadie, Moderator admin

    Absolutely @bulitsky, you can run a tumor sample through somatic CNV without a matched normal.

  • afzm Member

    Is it possible to run those .wdl scripts without Docker? I cannot use it on my server. Is there a workaround? Thank you very much.

  • shlee Cambridge Member, Broadie, Moderator admin

    Hi @afzm,

    Yes, you can run WDL scripts without Docker. Just remove the runtime section from each task. See the latest WDL & Cromwell basics hands-on worksheet for example WDLs that do not use Docker and also for a brief explanation of the runtime section (section 7).
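
For scripts with many tasks, removing each runtime section by hand gets tedious. The Python sketch below is one possible way to automate it: it deletes every `runtime { ... }` block from WDL text by matching braces. The function name and the sample task are hypothetical, and the sketch assumes that braces inside each runtime block are balanced, so review the result before running it.

```python
import re

# Matches the start of a runtime block at the beginning of a line.
RUNTIME_START = re.compile(r"^[ \t]*runtime\s*\{", re.MULTILINE)


def strip_runtime_sections(wdl_text: str) -> str:
    """Return wdl_text with every `runtime { ... }` block removed."""
    pieces = []
    pos = 0
    while True:
        match = RUNTIME_START.search(wdl_text, pos)
        if match is None:
            pieces.append(wdl_text[pos:])
            break
        pieces.append(wdl_text[pos:match.start()])
        # Walk forward until the brace that opened the block is closed.
        depth = 1
        i = match.end()
        while i < len(wdl_text) and depth > 0:
            if wdl_text[i] == "{":
                depth += 1
            elif wdl_text[i] == "}":
                depth -= 1
            i += 1
        # Drop the newline that followed the closing brace, if any.
        if i < len(wdl_text) and wdl_text[i] == "\n":
            i += 1
        pos = i
    return "".join(pieces)


# Hypothetical task used only to demonstrate the helper.
EXAMPLE_WDL = '''task hello {
  command {
    echo "hi"
  }
  runtime {
    docker: "ubuntu:16.04"
    memory: "${mem} GB"
  }
  output {
    String out = read_string(stdout())
  }
}
'''

if __name__ == "__main__":
    print(strip_runtime_sections(EXAMPLE_WDL))
```

Removing the whole block also drops settings such as memory requests, which is usually harmless on a local backend that ignores them; if your backend does use them, deleting only the docker line is a gentler alternative.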
