GATK DepthOfCoverage binning window

Hi there,

I want to run the GATK tool DepthOfCoverage on WGS data with a window of, say, every 10,000 bases. How should I set up the parameters?
Can I use --nBins? Does it mean nBins = whole genome size / 10,000?

Can I run this tool per chromosome to speed up?

Best,
Weini

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @weini_huang
    Hi Weini,

    Are you asking about running DepthOfCoverage on intervals of 10,000 bases each? That is, you want to run on intervals 1-10,000, 10,001-20,000, 20,001-30,000, and so on?

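    If so, you will need to create an interval list with one line per window and pass it in with --intervals. The --nBins argument is not appropriate in your case: it sets the number of bins used for the depth-of-coverage histogram (together with --start and --stop), not the size of genomic windows.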
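    One way to build such a list, as a minimal sketch: the shell function below reads a samtools faidx index of the reference (assumed to exist as `<reference>.fai`, where column 1 is the contig name and column 2 its length) and prints one `contig:start-end` line per window, the same format used later in this thread. The function name `make_windows` and the file names are hypothetical.

```bash
# Sketch: build a GATK-style interval list of fixed-size windows from a
# samtools faidx index (.fai). Column 1 = contig name, column 2 = contig length.
# Assumption: the .fai exists (e.g. created with `samtools faidx ref.fasta`).
# make_windows is a hypothetical helper name, not a GATK tool.
make_windows() {
  local fai="$1" win="$2"
  awk -v w="$win" 'BEGIN { OFS = "" }
    {
      contig = $1; len = $2 + 0
      # 1-based, inclusive windows: 1-w, (w+1)-2w, ...
      for (start = 1; start <= len; start += w) {
        end = start + w - 1
        if (end > len) end = len   # last window is truncated at the contig end
        print contig, ":", start, "-", end
      }
    }' "$fai"
}
```

    Usage would look like `make_windows ref.fasta.fai 10000 > windows.intervals`, followed by `--intervals windows.intervals` on the DepthOfCoverage command line.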

    Yes, you can run the tool per-chromosome to speed things up.
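    For the per-chromosome approach, one possible sketch splits a window list into one file per contig and prints one DepthOfCoverage command per contig, which could then be submitted as separate jobs. It reuses the GATK3 arguments from this thread (-T DepthOfCoverage, -R, -I, --intervals, -o) and adds --interval_merging OVERLAPPING_ONLY so that abutting windows are not merged. The function name, directory layout, and output names are hypothetical, and it only prints the commands rather than running them.

```bash
# Sketch: split a window list (one contig:start-end per line) into one file per
# contig, then print one DepthOfCoverage command per contig (dry run only).
# Assumptions: GATK3's GenomeAnalysisTK.jar is in the working directory; the
# input is grouped by contig; all names here are hypothetical placeholders.
split_and_print_cmds() {
  local windows="$1" outdir="$2" ref="$3" bam="$4"
  mkdir -p "$outdir"
  awk -v d="$outdir" '{
      contig = $0
      sub(/:[^:]*$/, "", contig)        # strip the trailing ":start-end"
      file = d "/" contig ".intervals"
      # Input is grouped by contig, so close each file once its contig ends
      # (keeps the number of open files small for scaffold-level assemblies).
      if (prev != "" && file != prev) close(prev)
      print > file
      prev = file
    }' "$windows"
  local f contig
  for f in "$outdir"/*.intervals; do
    contig=$(basename "$f" .intervals)
    echo "java -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R $ref -I $bam" \
         "--intervals $f --interval_merging OVERLAPPING_ONLY" \
         "-o $outdir/$contig.coverage"
  done
}
```

    Printing the commands instead of running them leaves the choice of parallelism (a cluster scheduler, GNU parallel, or background jobs) to whatever is available locally.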

    -Sheila

  • zacharycabin (Member)
    edited March 7

    @Sheila
    This isn't working for me. I am trying to do what sounds like the same thing: get coverage across windows. I believe the interval list is set up correctly, but the output only gives the average coverage across the entire chromosome. Is there an option I am missing?

    I have run the exact same command with a different interval list (only five regions of ~5 kb each, as opposed to windows tiling a chromosome), formatted the same way, and the output was what I wanted: coverage for each region independently.

    java -jar GenomeAnalysisTK.jar -T DepthOfCoverage -R $REFFASTA \
    -I sample.rg.sorted.bam -ct 3 -ct 2 -ct 1 \
    --countType COUNT_FRAGMENTS \
    --intervals $genotype/window.intervals \
    -o sample.window.table

    Here are the first few lines of my interval list:
    Chr_01:1-100000
    Chr_01:100001-200000
    Chr_01:200001-300000
    Chr_01:300001-400000
    Chr_01:400001-500000
    Chr_01:500001-600000
    Chr_01:600001-700000
    Chr_01:700001-800000
    Chr_01:800001-900000
    ...to the end of the chr.

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @zacharycabin
    Hi,

    Hmm. I suspect the tool is merging the intervals into one because they abut each other (each window starts immediately after the previous one ends). To test this, try running on Chr_01:1-100000 and Chr_01:100002-200000, which leave a 1-bp gap. If merging is the issue, add --interval_merging OVERLAPPING_ONLY so that abutting intervals are kept separate.
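    To see why abutting windows could collapse into a single chromosome-wide interval, here is a small illustration of the two merging rules as the GATK3 documentation describes them for --interval_merging: under the default (ALL), intervals that overlap or abut are merged; under OVERLAPPING_ONLY, only intervals that actually overlap are merged. This is a hypothetical awk re-implementation for explanation only, not GATK's code, and it assumes sorted input and contig names without ':' or '-'.

```bash
# Illustration only (NOT GATK's implementation): merge sorted contig:start-end
# intervals under the two --interval_merging rules described in the GATK3 docs.
#   ALL              -> merge intervals that overlap OR abut (next start == end + 1)
#   OVERLAPPING_ONLY -> merge only intervals that actually overlap
# Assumptions: input sorted by contig and start; contig names contain no ':' or '-'.
merge_intervals() {
  local rule="$1" file="$2"
  awk -v rule="$rule" -F'[:-]' '
    BEGIN { gap = (rule == "ALL") ? 1 : 0 }   # ALL also merges a 0-bp gap (abutting)
    function flush() { if (have) print c ":" s "-" e }
    {
      if (have && $1 == c && $2 + 0 <= e + gap) {
        if ($3 + 0 > e) e = $3 + 0            # extend the current merged interval
      } else {
        flush(); c = $1; s = $2 + 0; e = $3 + 0; have = 1
      }
    }
    END { flush() }' "$file"
}
```

    Fed the first three windows from the list above, the ALL rule yields a single Chr_01:1-300000 interval, while OVERLAPPING_ONLY keeps all three, which matches the symptom of getting one chromosome-wide average.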

    -Sheila

    P.S. There is a feature request for this in GATK4 as well.
