CalculateTargetCoverage for WGS

ekofman Member, Broadie

Hi, is it appropriate to use CalculateTargetCoverage for WGS? I'm confused by how the description "Calculates read-counts across targets for the exome copy number variant (CNV) calling workflow" would apply when WGS doesn't involve target capture regions. Thanks for the advice.

Answers

  • Sheila Broad Institute Member, Broadie admin

    @ekofman
    Hi,

    I think that tool is only for use with exome data; it is not applicable to genome data.

    -Sheila

  • slee Member, Broadie, Dev ✭✭✭

    @ekofman,

    You should not use CNV workflows from 4.beta.6 or previous releases if you wish to run on WGS; those workflows are more suited for WES. The CNV workflow has been significantly reworked in the current release to be scalable and performant on WGS data, and many of the tools and methods have been completely replaced. For example, the tool CalculateTargetCoverage has been replaced by CollectReadCounts.
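A minimal sketch of the reworked coverage step for WGS, written in Python that only assembles the command lines (it does not run GATK). The flag names follow the GATK 4.0 PreprocessIntervals and CollectReadCounts documentation as we recall them; the 1000 bp bin length and zero padding are illustrative WGS values, and all file names are hypothetical. Check everything against `gatk <Tool> --help` for your release.

```python
import shlex


def preprocess_intervals_cmd(reference, output, bin_length=1000, padding=0, intervals=None):
    """Assemble a PreprocessIntervals command (argv list); nothing is executed."""
    cmd = [
        "gatk", "PreprocessIntervals",
        "-R", reference,
        "--bin-length", str(bin_length),
        "--padding", str(padding),
        "--interval-merging-rule", "OVERLAPPING_ONLY",
        "-O", output,
    ]
    if intervals is not None:
        # WES: bin only the capture targets. For WGS, leave intervals unset.
        cmd += ["-L", intervals]
    return cmd


def collect_read_counts_cmd(bam, bins, output):
    """Assemble a CollectReadCounts command over the preprocessed bins."""
    return [
        "gatk", "CollectReadCounts",
        "-I", bam,
        "-L", bins,
        "--interval-merging-rule", "OVERLAPPING_ONLY",
        "-O", output,
    ]


if __name__ == "__main__":
    bins = "wgs.preprocessed.interval_list"  # hypothetical file names throughout
    for cmd in (
        preprocess_intervals_cmd("reference.fasta", bins),
        collect_read_counts_cmd("sample.bam", bins, "sample.counts.hdf5"),
    ):
        print(shlex.join(cmd))
```

For WGS no `-L` is passed to PreprocessIntervals, so (assuming the behavior described in the tool docs) the whole reference is binned. For WES you would instead pass the capture targets as `intervals`, and, as we understand the WES tutorial, binning is usually disabled with `bin_length=0`; check the tutorial for the recommended padding.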

    Thanks,
    Samuel

  • sutturka Member

    I am confused by the above comment. I am currently using GATK 4.0.4.0, and I am interested in canine CNV calling with WGS data. Should I use the CNV workflows described in Tutorial #11682 and Tutorial #11683 for WGS data? Should I upgrade to the current version, 4.0.5.1, to run these workflows with WGS data?

  • Sheila Broad Institute Member, Broadie admin

    @sutturka
    Hi,

    Yes, you should use those tutorials. Ideally, you will use the latest version, but if you have already started with a different version of GATK4, you should continue with that.

    -Sheila

  • slee Member, Broadie, Dev ✭✭✭

    Apologies for the confusion, @sutturka. To clarify, the tool (CalculateTargetCoverage) that @ekofman was originally asking about was used to collect coverage for WES CNV analyses in versions of the pipeline prior to and including 4.beta.6. These older versions of the pipeline had separate coverage-collection tools for WES (CalculateTargetCoverage) and WGS (SparkGenomeReadCounts).

    If you are using 4.0.4.0, these tools have already been replaced with a single tool (CollectReadCounts), which is suitable for both WES and WGS data, and you should be able to follow the tutorials you linked.
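The version rule described in this reply can be summarized in a few lines of Python. This is a hypothetical helper, not part of GATK; it assumes every numbered GATK4 release (4.0.0.0 onward) ships CollectReadCounts, which the thread only confirms for 4.0.4.0 and later.

```python
def coverage_tool(gatk_version: str, data_type: str) -> str:
    """Hypothetical helper: which coverage-collection tool a GATK4 version uses."""
    kind = data_type.upper()
    if kind not in ("WES", "WGS"):
        raise ValueError(f"expected 'WES' or 'WGS', got {data_type!r}")
    # Alpha/beta pre-release builds such as 4.beta.6 had separate tools.
    if gatk_version.startswith(("4.alpha", "4.beta")):
        return "CalculateTargetCoverage" if kind == "WES" else "SparkGenomeReadCounts"
    # Assumption: numbered releases (4.0.0.0 onward) use one tool for both.
    return "CollectReadCounts"


if __name__ == "__main__":
    print(coverage_tool("4.beta.6", "WGS"))  # SparkGenomeReadCounts
    print(coverage_tool("4.0.4.0", "WGS"))   # CollectReadCounts
```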

  • sutturka Member

    Thank you for the clarification.

  • ekofman Member, Broadie

    @slee Will the new pipelines using CollectReadCounts output tangent-normalized files? My ultimate goal is to generate .acs.seg files, which seem to be generated by the AllelicCNV workflow. However, the AllelicCNV workflow requires a tangent-normalized file as input.

  • ekofman Member, Broadie
    edited July 2018

    @slee Or is there perhaps a new version of the AllelicCNV tool that can take the outputs from my somatic CNV job, which includes the following tasks:

    CNVSomaticPairWorkflow.ModelSegmentsNormal
    CNVSomaticPairWorkflow.PreprocessIntervals
    CNVSomaticPairWorkflow.PlotModeledSegmentsNormal
    CNVSomaticPairWorkflow.ModelSegmentsTumor
    CNVSomaticPairWorkflow.CollectCountsNormal
    CNVSomaticPairWorkflow.DenoiseReadCountsTumor
    CNVSomaticPairWorkflow.CollectCountsTumor
    CNVSomaticPairWorkflow.PlotDenoisedCopyRatiosNormal
    CNVSomaticPairWorkflow.PlotDenoisedCopyRatiosTumor
    CNVSomaticPairWorkflow.DenoiseReadCountsNormal
    CNVSomaticPairWorkflow.CollectAllelicCountsTumor
    CNVSomaticPairWorkflow.PlotModeledSegmentsTumor
    CNVSomaticPairWorkflow.CallCopyRatioSegmentsNormal
    CNVSomaticPairWorkflow.CollectAllelicCountsNormal
    CNVSomaticPairWorkflow.CallCopyRatioSegmentsTumor

    and doesn't even need the tangent-normalized files? Perhaps that input is outdated in the new GATK as well? The tool you mentioned, CollectReadCounts, is used by the CollectCountsTumor and CollectCountsNormal tasks listed above.
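The tasks in the list above appear in no particular order. This Python sketch orders them by data dependency using the standard-library `graphlib` module (Python 3.9+). The dependency map is an assumption reconstructed for illustration from how the CNV tools consume each other's outputs (counts feed denoising; denoised copy ratios and allelic counts feed ModelSegments; segments feed calling and plotting); check it against the CNVSomaticPairWorkflow WDL you actually ran.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency map (task -> tasks whose outputs it consumes),
# reconstructed for illustration; verify against the workflow's WDL.
DEPENDENCIES = {
    "PreprocessIntervals": set(),
    "CollectCountsNormal": {"PreprocessIntervals"},
    "CollectCountsTumor": {"PreprocessIntervals"},
    # Assumed: allelic counts are collected at a separate list of common sites.
    "CollectAllelicCountsNormal": set(),
    "CollectAllelicCountsTumor": set(),
    "DenoiseReadCountsNormal": {"CollectCountsNormal"},
    "DenoiseReadCountsTumor": {"CollectCountsTumor"},
    "PlotDenoisedCopyRatiosNormal": {"DenoiseReadCountsNormal"},
    "PlotDenoisedCopyRatiosTumor": {"DenoiseReadCountsTumor"},
    "ModelSegmentsNormal": {"DenoiseReadCountsNormal", "CollectAllelicCountsNormal"},
    "ModelSegmentsTumor": {"DenoiseReadCountsTumor", "CollectAllelicCountsTumor",
                           "CollectAllelicCountsNormal"},
    "PlotModeledSegmentsNormal": {"ModelSegmentsNormal", "DenoiseReadCountsNormal"},
    "PlotModeledSegmentsTumor": {"ModelSegmentsTumor", "DenoiseReadCountsTumor"},
    "CallCopyRatioSegmentsNormal": {"ModelSegmentsNormal"},
    "CallCopyRatioSegmentsTumor": {"ModelSegmentsTumor"},
}


def run_order(deps):
    """Return one valid order in which every task follows all of its inputs."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, task in enumerate(run_order(DEPENDENCIES), start=1):
        print(f"{step:2d}. CNVSomaticPairWorkflow.{task}")
```

Several orders are valid (for example, tumor and normal branches can interleave); `static_order()` returns just one of them.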

    Currently the documentation says:
    --tangentNormalized: Input file for tumor-sample tangent-normalized target log_2 coverages (.tn.tsv output of GATK CNV tool).

    I can't figure out which step produces this .tn.tsv output.
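For orientation: as far as we understand, the denoising step in the reworked workflow (the DenoiseReadCounts tasks in the list above) is where the old tangent-normalization step went, and its denoised copy-ratio table plays roughly the role of the old .tn.tsv file. The sketch below assembles a DenoiseReadCounts command and parses the resulting table. The flag names are from the GATK 4.0 tool docs as we recall them; the column names (CONTIG, START, END, LOG2_COPY_RATIO), the SAM-style '@' header lines, and all file names are assumptions to verify against your own output.

```python
import csv


def denoise_read_counts_cmd(counts, panel_of_normals, prefix):
    """Assemble a DenoiseReadCounts command (argv list); nothing is executed."""
    return [
        "gatk", "DenoiseReadCounts",
        "-I", counts,
        "--count-panel-of-normals", panel_of_normals,
        "--standardized-copy-ratios", f"{prefix}.standardizedCR.tsv",
        "--denoised-copy-ratios", f"{prefix}.denoisedCR.tsv",
    ]


def read_denoised_copy_ratios(lines):
    """Parse denoised copy ratios, skipping SAM-style '@' header lines.

    Column names are assumed; check the header of your own file.
    """
    body = [line for line in lines if not line.startswith("@")]
    return [
        (row["CONTIG"], int(row["START"]), int(row["END"]), float(row["LOG2_COPY_RATIO"]))
        for row in csv.DictReader(body, delimiter="\t")
    ]


if __name__ == "__main__":
    print(" ".join(denoise_read_counts_cmd("tumor.counts.hdf5", "cnv.pon.hdf5", "tumor")))
    # Hypothetical usage once the tool has run:
    # with open("tumor.denoisedCR.tsv") as fh:
    #     ratios = read_denoised_copy_ratios(fh)
```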

  • sutturka Member

    Hi Samuel,

    I was able to run somatic CNV calling using the tutorial, and I have the final '*called.seg' file as well as the plots. I would like to run the GISTIC tool on this data. Do you have any suggestions or scripts for converting this data for use in GISTIC? I should mention that my data is from the canine (dog) genome.
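One possible conversion, sketched in Python under several assumptions: that the '*called.seg' file has SAM-style '@' header lines followed by the columns CONTIG, START, END, NUM_POINTS_COPY_RATIO, MEAN_LOG2_COPY_RATIO and CALL, and that GISTIC accepts a six-column tab-separated segment file (sample, chromosome, start, end, number of markers, segment mean as a log2 ratio). The header names written out are illustrative. GISTIC expects numeric chromosome names, so the canine numbering here (chr1 to chr38, then chrX as 39) is hypothetical and must match whatever reference genome file you build for GISTIC, which this sketch does not cover.

```python
import csv
import io

GISTIC_COLUMNS = ["Sample", "Chromosome", "Start", "End", "Num_Markers", "Seg.CN"]


def canine_contig_map():
    """Hypothetical numbering: chr1..chr38 -> '1'..'38', chrX -> '39'."""
    mapping = {f"chr{i}": str(i) for i in range(1, 39)}
    mapping["chrX"] = "39"
    return mapping


def called_seg_to_gistic(lines, sample_id, contig_map):
    """Convert called.seg rows (assumed columns) to six-column GISTIC-style rows."""
    body = [line for line in lines if not line.startswith("@")]
    rows = []
    for rec in csv.DictReader(body, delimiter="\t"):
        chrom = contig_map.get(rec["CONTIG"])
        if chrom is None:
            continue  # drop contigs with no numeric name (e.g. unplaced scaffolds)
        rows.append([sample_id, chrom, rec["START"], rec["END"],
                     rec["NUM_POINTS_COPY_RATIO"], rec["MEAN_LOG2_COPY_RATIO"]])
    return rows


def to_gistic_tsv(rows):
    """Render rows as tab-separated text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(GISTIC_COLUMNS)
    writer.writerows(rows)
    return out.getvalue()
```

Usage might look like `with open("tumor.called.seg") as fh: print(to_gistic_tsv(called_seg_to_gistic(fh, "dog01", canine_contig_map())), end="")`, with one output file per sample concatenated (keeping a single header) before running GISTIC. Contigs missing from the map are silently dropped, so check that nothing you care about disappears.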

    Thanks
    Sutturka

  • xiongssg Member

    Hi @slee
    I have used ModelSegments to segment WES data. How can I convert the output .seg file into ABSOLUTE V1.2 inputs? Could you share your email with me?

    Thanks,
    Xiong

  • ekofman Member, Broadie

    @xiongssg I would be curious to learn whether the conversion worked for you and whether you were able to run ABSOLUTE successfully.

  • xiongssg Member

    @ekofman I have tested the internal transform script on one tumor sample, and it worked better than AllelicRecapseg. Now I'm running it on more samples. However, the segments generated for some samples are too noisy, so I'll try fine-tuning the ModelSegments smoothing parameters to dampen the noise.

  • slee Member, Broadie, Dev ✭✭✭

    @xiongssg great to hear that the script worked for your sample. I would recommend experimenting with your coverage-collection bin size or increasing --number-of-changepoints-penalty-factor in ModelSegments to smooth out your noisy samples. As I communicated elsewhere, the script itself might need some tweaking as well, but we are waiting to hear feedback from others at the Broad who are running ABSOLUTE. @ekofman if you have results to share, I'd be glad to meet and discuss them!
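The penalty-factor suggestion can be explored with a small parameter sweep. This Python sketch only builds ModelSegments command lines, one per penalty factor, encoding the factor in the output prefix so runs don't overwrite each other. The flag names follow the GATK 4.0 ModelSegments documentation as we recall them; the factor values and file names are illustrative. Bin size, by contrast, is set earlier (via `--bin-length` in PreprocessIntervals), so changing it means re-running coverage collection.

```python
import shlex


def model_segments_cmd(denoised_cr, allelic_counts, output_dir, prefix, penalty_factor):
    """Assemble one ModelSegments command (argv list); nothing is executed."""
    return [
        "gatk", "ModelSegments",
        "--denoised-copy-ratios", denoised_cr,
        "--allelic-counts", allelic_counts,
        "--number-of-changepoints-penalty-factor", str(penalty_factor),
        "--output", output_dir,
        "--output-prefix", prefix,
    ]


def penalty_sweep(denoised_cr, allelic_counts, output_dir, sample, factors=(1.0, 2.0, 5.0)):
    """One command per penalty factor; larger factors generally mean fewer segments."""
    return [
        model_segments_cmd(denoised_cr, allelic_counts, output_dir,
                           f"{sample}.penalty{factor:g}", factor)
        for factor in factors
    ]


if __name__ == "__main__":
    for cmd in penalty_sweep("tumor.denoisedCR.tsv", "tumor.allelicCounts.tsv", "sweep", "tumor"):
        print(shlex.join(cmd))
```

Comparing the resulting segment counts and plots across prefixes is one way to pick a factor that removes noise without merging real events.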
