Basically, the tool does not count any sample whose genotype is null (./.) towards the sample count. So at your sites, the ones lacking the annotation have fewer than ten called genotypes. The annotation is requested by default, so the WARN statements will appear even if you did not explicitly request it. There is nothing to worry about; the tool is just letting you know. There will be no downstream issues, only that the annotation will not be present in your VCF. If you need the annotation, you will need at least 10 samples with called genotypes.
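To make the counting rule concrete, here is a minimal sketch (not GATK's actual implementation) of skipping a per-site annotation when too few samples have called genotypes. The function names and the 10-sample threshold are illustrative, taken from the explanation above:

```python
MIN_SAMPLES = 10  # the annotation needs at least this many called genotypes

def called_genotype_count(genotypes):
    """Count samples whose genotype is not null ('./.')."""
    return sum(1 for gt in genotypes if gt != "./.")

def annotation_emitted(site_id, genotypes):
    """Warn and skip the annotation when too few genotypes are called."""
    n = called_genotype_count(genotypes)
    if n < MIN_SAMPLES:
        print(f"WARN: site {site_id} has only {n} called genotypes; "
              "annotation will be omitted")
        return False
    return True

# 10 samples, two of them null -> only 8 called, so the annotation is skipped
site = ["0/1", "./.", "1/1", "0/0", "./.",
        "0/1", "0/0", "1/1", "0/1", "0/0"]
annotation_emitted("chr1:12345", site)  # -> False, prints a WARN line
```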
This will help us narrow down the issue.
The Google Cloud Storage URL for these files is: https://console.cloud.google.com/storage/browser/broad-references/hg38/v0/
No downloading is necessary. When you run in Terra, the docker image is pulled onto the virtual machine your task runs on; you just need to indicate which image to use. Did you clone the example Mutect workspace from the Showcase section? If so, everything is already set up for you, and you should just have to list workspace.gatk_docker as the attribute. See here for further explanation of how this all works.
If you have any further Terra questions, can you please post them to the Terra forum or use the other support options I mentioned?
The issue is that the subworkflow is written in WDL 1.0, while the workflow it is used in is written in an older version (draft-2). The current workaround is to upgrade the draft-2 workflow to WDL 1.0. I have informed the dev team so they are aware of this issue.
The latest bioconda package for GATK 4.1.4 installs the R packages, including ggplot2. AnalyzeCovariates now works without errors.
conda install -c bioconda gatk4
Hi @cmt !
Expect the fix Beri linked to to be included in the next release (it is not in the current release). With that particular fix, you shouldn't have to run the HaplotypeCaller step again.
Thanks to all the members of the GATK team!
@Yangyxt The --depth-correction-tau parameter sets the precision for the global read-depth latent variable in our model, whose mean is estimated by the DetermineContigPloidy tool and passed to GermlineCNVCaller. You can find a detailed description of the model here (\sigma_s is the variable you're looking for): https://github.com/broadinstitute/gatk/blob/master/docs/CNV/germline-cnv-caller-model.pdf
With regard to detecting small CNVs: detection is intrinsically hard for single-exon events, and we currently do not achieve high sensitivity on those. Decreasing --cnv-coherence-length and increasing --p-alt would raise sensitivity for single-exon events; however, specificity for all events would take a hit.
Related to that, we recommend filtering on the QS value in the segments VCF, in particular QS>=50 for duplications and QS>100 for deletions.
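As a sketch of that filtering step, here is how the recommended thresholds could be applied to segment calls. The record layout below is a simplified stand-in, not real VCF parsing (for that you would use a library such as pysam); the thresholds are the ones stated above:

```python
def passes_qs_filter(svtype, qs):
    """Keep a segment call only if its QS clears the recommended threshold."""
    if svtype == "DUP":
        return qs >= 50   # duplications: QS >= 50
    if svtype == "DEL":
        return qs > 100   # deletions: QS > 100
    return True           # leave other record types untouched

# Hypothetical (svtype, QS) pairs from a segments VCF
segments = [("DUP", 60), ("DUP", 30), ("DEL", 150), ("DEL", 90)]
kept = [(t, q) for t, q in segments if passes_qs_filter(t, q)]
# kept -> [("DUP", 60), ("DEL", 150)]
```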
Let me know if that helps!
@Yangyxt I think setting --cnv-coherence-length to 500 is a little extreme. That parameter determines the typical scale of CNV events (it is not, however, equal to that scale), so the smaller the parameter value, the less "sticky" copy-number events will be and the faster the HMM will forget them. That doesn't mean you shouldn't experiment with different values, though, to see what scales work best for your data.
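To build intuition for that "stickiness", here is a toy model, not the actual gCNV transition model: assume the probability of staying in a CNV state across a gap of d bases decays as exp(-d / L), where L plays the role of the coherence length. Shrinking L makes the HMM forget an event faster; the numbers below are purely illustrative:

```python
import math

def stay_probability(distance_bp, coherence_length_bp):
    """Toy exponential-decay model of remaining in a CNV state."""
    return math.exp(-distance_bp / coherence_length_bp)

d = 1000  # hypothetical distance between adjacent intervals, in bases
short = stay_probability(d, 500)     # small L: event is quickly forgotten
long_ = stay_probability(d, 10000)   # large L: event persists much longer
```

With these numbers, the small-L stay probability (~0.14) is far below the large-L one (~0.90), which is why short events become easier to call but noise is more likely to break up real events.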
With regard to CASE vs. COHORT mode: COHORT mode actually includes the full model, consisting of both denoising and calling. So if your number of samples is small (<200), you do not need to use CASE mode at all. If you do have a large cohort, you would first train the model on a subset of your samples in COHORT mode, and then use that model to call the remaining samples in CASE mode. You don't need to tune any additional parameters for CASE mode; however, you can optionally tune all parameters that pertain to calling (as opposed to denoising).
Let me know if you have any specific questions about the parameters.