

Format of gatk4cnv_padded_target_bed_capture in "GATK 4 CNV Proportional Coverage for Capture"

Sahar90 (Broad Institute; Member, Broadie)

Hi,
I am trying to run "GATK_Somatic_CNV_Toolchain_Capture" and ABSOLUTE on targeted sequencing data with no matched normal.
I understand that the first step is to run "GATK 4 CNV Proportional Coverage for Capture" with the interval list that was used for capture.
What format should "target_bed" be in? Is there an example?
Thanks

Answers

  • shlee (Cambridge; Member, Broadie ✭✭✭✭✭)
    edited January 2017

    Hi @Sahar90,

    For the gatk4.jar build Version:0288cff-SNAPSHOT, the help output shows:

    --targets,-T:File             File containing the targets for analysis  Default value: null. 
    

    I see that this description does not say whether the file should use 0-based coordinates (the BED convention) or 1-based coordinates (the interval list convention), so my apologies for that. My first impression was that the tool accepts either format and tells them apart. Picard's BedToIntervalList documentation briefly describes the two formats, and GATK's Article#1319 describes them in depth. The GATK engine recognizes the .bed extension and interprets the coordinate system accordingly, at least for the -L parameter, and I've just confirmed that the same holds for -T in the Version:0288cff-SNAPSHOT gatk4.jar.
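    As a rough illustration of the 0-based vs 1-based difference, here is a minimal Python sketch (not GATK or Picard code) that converts one BED record into the five tab-separated fields of a Picard-style interval list body line (contig, start, end, strand, name). It assumes tab-delimited BED input, defaults the strand to "+", and makes up a placeholder name when the BED line has none (that naming scheme is hypothetical). It also skips the SAM-style @HD/@SQ header that a real interval list needs; for actual files, Picard's BedToIntervalList with a sequence dictionary is the safer route.

```python
def bed_to_interval_fields(bed_line: str) -> tuple:
    """Convert one BED record (0-based, half-open) to interval-list fields (1-based, closed)."""
    cols = bed_line.rstrip("\n").split("\t")
    contig, bed_start, bed_end = cols[0], int(cols[1]), int(cols[2])
    # BED name (column 4) and strand (column 6) are optional.
    # Hypothetical fallback name "contig:start-end" in 1-based coordinates.
    if len(cols) > 3 and cols[3]:
        name = cols[3]
    else:
        name = f"{contig}:{bed_start + 1}-{bed_end}"
    strand = cols[5] if len(cols) > 5 and cols[5] in ("+", "-") else "+"
    # 0-based inclusive start -> 1-based inclusive start: add 1.
    # A 0-based exclusive end is numerically equal to a 1-based inclusive end.
    return contig, bed_start + 1, bed_end, strand, name


def to_interval_list_line(bed_line: str) -> str:
    """Format one converted record as a tab-separated interval list body line (no header)."""
    contig, start, end, strand, name = bed_to_interval_fields(bed_line)
    return "\t".join([contig, str(start), str(end), strand, name])
```

    For example, the BED line "1	99	200	target1" covers the same 101 bases as the interval list line "1	100	200	+	target1".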

    For an example, you can refer to the latest CNV tutorial from the October 2016 Vancouver workshop (links are at the very top of the original CNV tutorial). The tutorial worksheet describes the target file (-T):

    The target file (-T) is a padded interval list of the baited regions. You can add padding to a target list using the PadTargets tool. In our experience, padding each exome target by 250 bp on either side increases sensitivity. The -targetInfo FULL option keeps the original target names from the target list. The -keepdups option tells the tool to include alignments flagged as duplicates.
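    To make the padding step concrete, below is a minimal Python sketch of what padding each target by 250 bp on either side means in 1-based, closed interval-list coordinates. It is not the PadTargets implementation. The Target type and the pad_target function are hypothetical. The sketch clamps at contig boundaries (start at 1, end at an optional contig length) and does not try to reproduce how PadTargets treats neighbouring targets whose padding would overlap.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    """Hypothetical target record in interval-list (1-based, closed) coordinates."""
    contig: str
    start: int
    end: int
    name: str


def pad_target(t: Target, padding: int = 250, contig_length: Optional[int] = None) -> Target:
    """Extend a target by `padding` bases on each side, keeping its original name."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    # Positions are 1-based, so the smallest valid start is 1.
    start = max(1, t.start - padding)
    end = t.end + padding
    # Optionally clamp to the contig length if it is known.
    if contig_length is not None:
        end = min(end, contig_length)
    return t._replace(start=start, end=end)
```

    For example, a target at 1:1000-1100 becomes 1:750-1350 after 250 bp of padding.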

    I hope this is helpful.

  • LeeTL1220 (Arlington, MA; Member, Broadie, Dev ✭✭✭)

    @Sahar90 Look in the Algorithms_Commons workspace; the best-practice workflow is in there. Copy it into your workspace (using Sync)...
