
Downsampling in GenotypeGVCFs

MattB, Newcastle (Member)

The documentation for GenotypeGVCFs states that it is employing downsampling:

Downsampling settings
This tool applies the following downsampling settings by default.

Mode: BY_SAMPLE
To coverage: 1,000

I also see this in my console output: "INFO 14:51:47,705 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000"

What specifically is being downsampled: the genotype likelihoods per locus, or per active region? Can this behaviour be changed via -dcov, or would changing it be inadvisable?

Issue · GitHub: #214, filed by Sheila. State: closed (closed by vdauwera).

Comments

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Hi there,

    Apologies for the confusion. GenotypeGVCFs doesn't actually do any downsampling; what you're seeing, both in the documentation and in the console output, is an artifact of how the code is structured. In programming terms, GenotypeGVCFs is a subtype of LocusWalker, i.e. a tool that traverses the genome one position at a time. Because of that, it inherits the properties of LocusWalkers, such as the ability to downsample reads, even though it does not use those properties itself.

    So this is not something you need to worry about or act on; the reported setting is meaningless for this tool. I'll see if we can clean this up to avoid confusion in the future, but it may be difficult to allocate effort to it, since the effect is harmless and we have more important development priorities.

  • MattB, Newcastle (Member)

    Hey, thanks for clearing that up. I did have a sneaking suspicion that it was an inherited artefact, but I always find sneaking suspicions the hardest to ignore.
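
For readers less familiar with inheritance, the artifact described in the reply above can be illustrated with a minimal Java sketch. This is not GATK's actual code: SketchLocusWalker, SketchGenotyper, their fields, and their methods are all hypothetical names invented for illustration. The assumption is only that a base class carries a default downsampling setting and reports it at startup, while a subclass inherits the setting without ever reading it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class standing in for a locus-by-locus traversal tool.
// It is NOT the real GATK LocusWalker; names and fields are illustrative only.
abstract class SketchLocusWalker {
    // Defaults that every subclass inherits, whether or not it reads them.
    protected final String downsamplingMethod = "BY_SAMPLE";
    protected final int targetCoverage = 1000;

    // Engine-style startup message built purely from the inherited fields.
    String describeDownsampling() {
        return "Downsampling Settings: Method: " + downsamplingMethod
                + ", Target Coverage: " + targetCoverage;
    }

    abstract List<String> processLocus(List<String> records);
}

// Hypothetical subclass standing in for a genotyping tool.
class SketchGenotyper extends SketchLocusWalker {
    @Override
    List<String> processLocus(List<String> records) {
        // targetCoverage is never consulted: every record passes through.
        return new ArrayList<>(records);
    }
}

class InheritedSettingDemo {
    public static void main(String[] args) {
        SketchLocusWalker tool = new SketchGenotyper();
        // The setting is reported even though the subclass ignores it.
        System.out.println(tool.describeDownsampling());

        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1500; i++) {
            records.add("record" + i);
        }
        // More inputs than the inherited target, yet nothing is dropped.
        System.out.println("in=" + records.size()
                + " out=" + tool.processLocus(records).size());
    }
}
```

In this sketch the startup message is produced by the base class alone, so it appears for every subclass regardless of whether that subclass acts on the setting. That matches the situation described in the thread, where the setting is reported but has no effect.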
