Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Depth of Coverage doesn't factor Y chromosome genomic repetitive sequences?

shawardenshawarden Dunedin, New ZealandMember
edited March 2016 in Ask the GATK team

In our pipeline we try to do relatively quick molecular gender validation using Depth of Coverage on chromosome 1, X and Y and compare the coverage of each, report any anomalies or move on.

For exomes this works fine once we've worked out the bias for a given capture platform.

For genomes though, when running Depth Of Coverage (3.5-0-g36282e4) we get some wildly exaggerated numbers for Y coverage. For example in a male Chr1 is ~40, X is ~20 and Y is ~40!

We think this is due to repetitive sequences on the Y chromosome being mis-handled as additional coverage. The exome .bed files limit the regions looked at so these repeats aren't seen. The genomic .bed file is just the whole Y.

It's just so consistently close to a 1:1 ratio of Chr1 to Y coverage for all our male samples.

QualiMap (v2.1 build 2015-03-19 12:05) on the other-hand does not have this issue. We get the expected coverages of ~40, ~20 and ~18 respectively.

If it is simply a repetition issue, would there be a way to correct this output?

Could it be something different since the Chr1:Y ratio is roughly 1 for every male sample we run.

Is there better way to validate molecular gender with equal or less processing time?

Thanks.

Script snippet:

trait GATK_Arguments extends CommandLineGATK {
    this.reference_sequence = referenceFile
    this.nCoresRequest      = numThreads
    this.isIntermediate     = true
}

trait Finger_Arguments extends CommandLineGATK {
    this.isr = org.broadinstitute.gatk.utils.interval.IntervalSetRule.INTERSECTION
}

trait DoC_Arguments extends DepthOfCoverage {
    this.omitDepthOutputAtEachBase  = true
    this.omitLocusTable             = true
    this.omitIntervals              = true
}

... 

val depthOfACoverage = new DepthOfCoverage with GATK_Arguments with Finger_Arguments with DoC_Arguments
val depthOfXCoverage = new DepthOfCoverage with GATK_Arguments with Finger_Arguments with DoC_Arguments
val depthOfYCoverage = new DepthOfCoverage with GATK_Arguments with Finger_Arguments with DoC_Arguments

depthOfACoverage.input_file = Seq(printReads.out)
depthOfXCoverage.input_file = depthOfACoverage.input_file
depthOfYCoverage.input_file = depthOfACoverage.input_file

depthOfACoverage.intervals = Seq(capPlatforms(platform)) // .bed file for exome chip
depthOfXCoverage.intervals = depthOfACoverage.intervals
depthOfYCoverage.intervals = depthOfACoverage.intervals

depthOfACoverage.intervalsString = Seq("1")
depthOfXCoverage.intervalsString = Seq(trueXRegion) // "X:2699521-154931043"
depthOfYCoverage.intervalsString = Seq(trueYRegion) // "Y:2649521-59034050"

val DoCHead = indiPath + "/" + gvDepth

depthOfACoverage.out = DoCHead + "-A"
depthOfXCoverage.out = DoCHead + "-X"
depthOfYCoverage.out = DoCHead + "-Y"

depthOfACoverage.analysisName = individual
depthOfXCoverage.analysisName = individual
depthOfYCoverage.analysisName = individual

depthOfACoverage.jobName = individual + "_IGV_DoAC"
depthOfXCoverage.jobName = individual + "_IGV_DoXC"
depthOfYCoverage.jobName = individual + "_IGV_DoYC"

add(
  depthOfXCoverage,
  depthOfYCoverage,
  depthOfACoverage
)

/home/ClinGen/Resources/Capture_Platforms/GRCh37/Genomic.bed:

1   1   249250621   chromosome 1    0   .
...
X   1   155270560   chromosome X    0   .
Y   1   59373566    chromosome Y    0   .

Depth of Coverage for Autosomal chromosome 1 command line:

'java'  '-Xmx8192m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/home/shawarden/ClinGen/Dev/ID/.queue/tmp'  '-cp' '/home/ClinGen/bin/queue-3.5-0-g36282e4/Queue.jar'  'org.broadinstitute.gatk.engine.CommandLineGATK'  '-T' 'DepthOfCoverage'  '-I' '/mnt/ClinGen/TSD/GRCh37/Active/Genomic/ID/ID.PrintReads.bam'  '-L' '/home/ClinGen/Resources/Capture_Platforms/GRCh37/Genomic.bed'  '-L' '1'  '-isr' 'INTERSECTION'  '-R' '/home/ClinGen/Resources/broad_bundle_b37_v2.5/human_g1k_v37.fasta'  '-o' '/mnt/ClinGen/TSD/GRCh37/Active/Genomic/ID/gvDepthOfCoverage-A'  '-omitLocusTable'  '-omitIntervals'  '-omitBaseOutput'  

Depth of Coverage for Autosomal chromosome 1 sample_summary output:

sample_id   total       mean    granular_third_quartile granular_median granular_first_quartile %_bases_above_15
ID          9212399088  40.89   45                      40              36                      99.5
Total       9212399088  40.89   N/A                     N/A             N/A

Depth of Coverage for X chromosome command line:

'java'  '-Xmx8192m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/home/shawarden/ClinGen/Dev/ID/.queue/tmp'  '-cp' '/home/ClinGen/bin/queue-3.5-0-g36282e4/Queue.jar'  'org.broadinstitute.gatk.engine.CommandLineGATK'  '-T' 'DepthOfCoverage'  '-I' '/mnt/ClinGen/TSD/GRCh37/Active/Genomic/ID/ID.PrintReads.bam'  '-L' '/home/ClinGen/Resources/Capture_Platforms/GRCh37/Genomic.bed'  '-L' 'X:2699521-154931043'  '-isr' 'INTERSECTION'  '-R' '/home/ClinGen/Resources/broad_bundle_b37_v2.5/human_g1k_v37.fasta'  '-o' '/mnt/ClinGen/TSD/GRCh37/Active/Genomic/ID/gvDepthOfCoverage-X'  '-omitLocusTable'  '-omitIntervals'  '-omitBaseOutput'  

Depth of Coverage for X chromosome sample_summary output:

sample_id   total       mean    granular_third_quartile granular_median granular_first_quartile %_bases_above_15
ID          3050588508  20.55   24                      21              18                      87.2
Total       3050588508  20.55   N/A                     N/A             N/A

Depth of Coverage for Y chromosome command line:

'java'  '-Xmx8192m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/home/shawarden/ClinGen/Dev/ID/.queue/tmp'  '-cp' '/home/ClinGen/bin/queue-3.5-0-g36282e4/Queue.jar'  'org.broadinstitute.gatk.engine.CommandLineGATK'  '-T' 'DepthOfCoverage'  '-I' '/mnt/ClinGen/TSD/GRCh37/Active/Genomic/ID/ID.PrintReads.bam'  '-L' '/home/ClinGen/Resources/Capture_Platforms/GRCh37/Genomic.bed'  '-L' 'Y:2649521-59034050'  '-isr' 'INTERSECTION'  '-R' '/home/ClinGen/Resources/broad_bundle_b37_v2.5/human_g1k_v37.fasta'  '-o' '/mnt/ClinGen/TSD/GRCh37/Active/Genomic/ID/gvDepthOfCoverage-Y'  '-omitLocusTable'  '-omitIntervals'  '-omitBaseOutput' 

Depth of Coverage for Y chromosome sample_summary output:

sample_id   total       mean    granular_third_quartile granular_median granular_first_quartile %_bases_above_15
ID          950208612   41.34   25                      22              18                      90.1
Total       950208612   41.34   N/A                     N/A             N/A

QualiMap Command Line:

'/home/ClinGen/bin/qualimap_v2.1/qualimap'  'bamqc'  '-bam' '/mnt/ClinGen/TSD/GRCh37/Active/Genomic/ID/ID.PrintReads.bam'  '-outfile' 'mlQualiMap_genomic.pdf'  '-c'  '--java-mem-size=24G'  '-outformat' 'pdf'  '-outdir' '/mnt/ClinGen/TSD/GRCh37/Active/Genomic/ID/mlQualiMap_genomic'  '-gd' 'HUMAN'  '-nr 500'  '-nt' '16'

QualiMap output:

2.7. Chromosome stats
Name    Length      Mapped bases    Mean coverage   Standard deviation
1       249250621   10494149541     42.1            1,328,790.04
...
X       155270560   3488511556      22.47           3,979
Y       59373566    1085839604      18.29           184,271.52

Issue · Github
by Sheila

Issue Number
644
State
closed
Last Updated
Assignee
Array
Milestone
Array
Closed By
vdauwera

Best Answer

Answers

  • shawardenshawarden Dunedin, New ZealandMember

    Hi Geraldine,

    That's a good point regarding using an existing exome platform both bother coverage and speed.

    Using AV5 for example works perfectly at 2:1 for both Chr1:X and Chr1:Y.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Glad to hear it works :)

Sign In or Register to comment.