The latest bioconda package for GATK 4.1.4 installs the required R packages, including "ggplot2". AnalyzeCovariates now works without errors.
conda install -c bioconda gatk4
I do not know how familiar you are with Docker, but Docker is a way to make software portable between different environments.
GATK provides its own Docker image here
and there is a startup guide for GATK here
Hope that helps.
How do I generate this table? What do the values mean?
Thanks @mack812 ! The developer I am debugging with is out of the office most of this week, so we will try to get back to you by the end of the week!
Hi @mack812, one thought is that the variant could be getting filtered out by the germline resource. However, I did a quick search for the variant in the gnomAD browser, and nothing came up.
Can you provide a sliver of the input bam (1000 bp on each side of this locus) so that we can debug? Here are instructions on how to provide us files.
Thanks for sharing this info, @micknudsen
MergeBamAlignment uses BestMapqPrimaryAlignmentSelectionStrategy to choose which alignment to use. For paired-end reads, it chooses the alignment that gives the best sum of mapping qualities. In the event of a tie, one alignment is chosen randomly (with a fixed seed).
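To make the selection rule concrete, here is a minimal Python sketch of "best sum of mapping qualities, with a seeded random tie-break". It is not Picard's actual code: the function name `pick_primary`, the tuple layout, and the seed value are all made up for illustration.

```python
import random


def pick_primary(candidates, seed=1):
    """Pick one alignment from a list of (name, mapq_read1, mapq_read2).

    Hypothetical helper: mimics the rule described above, not Picard itself.
    """
    # A fresh generator with a fixed seed makes the tie-break reproducible.
    rng = random.Random(seed)
    # The best candidate is the one with the highest summed MAPQ of both mates.
    best_sum = max(m1 + m2 for _, m1, m2 in candidates)
    tied = [c for c in candidates if c[1] + c[2] == best_sum]
    # With a single best candidate this is deterministic; with several,
    # one is drawn at random, but identically on every run.
    return rng.choice(tied)


if __name__ == "__main__":
    alignments = [("aln_A", 30, 30), ("aln_B", 60, 0), ("aln_C", 5, 5)]
    print(pick_primary(alignments))
```

Because the seed is fixed, rerunning on the same input gives the same choice, so the "random" pick among tied alignments is repeatable.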
@Yangyxt I think setting --cnv-coherence-length to 500 is a little extreme. That parameter determines the typical scale of CNV events (it is not, however, equal to that scale), so the smaller the parameter value, the less "sticky" copy number events will be and the faster the HMM will forget about them. However, that doesn't mean you shouldn't experiment with different values and see what scales work best for your data.
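As a rough illustration of the "stickiness" point, the toy Python sketch below assumes that the HMM's memory of the previous copy-number state decays exponentially with the distance between intervals, as exp(-distance / coherence_length). That functional form, the function name `persistence`, and the comparison value of 10000 are assumptions made for this example, not something stated in this thread.

```python
import math


def persistence(distance_bp, coherence_length_bp):
    """Assumed weight with which the previous state is carried over.

    Toy model only: exp(-d / L), where d is the distance between two
    intervals and L is the coherence length, both in bp.
    """
    return math.exp(-distance_bp / coherence_length_bp)


if __name__ == "__main__":
    # Two intervals 1 kb apart, under a short and a long coherence length.
    for length in (500, 10000):
        print(length, round(persistence(1000, length), 3))
```

Under this assumption, a coherence length of 500 leaves very little memory across a 1 kb gap, while a much larger value keeps most of it, which is the sense in which small values make events less "sticky".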
With regards to CASE vs COHORT mode: COHORT mode actually includes the full model, which consists of denoising and calling. So if your number of samples is small (<200), you will not need to use CASE mode. If you do have a large cohort, you would first need to train the model on a subset of your samples in COHORT mode, and then use that model to call the rest of the samples in CASE mode. You don't need to tune any additional parameters for CASE mode; however, you can optionally tune all parameters that pertain to calling (as opposed to denoising).
Let me know if you have any specific questions about the parameters.
@Yangyxt The --depth-correction-tau parameter sets the precision for the global read depth latent variable in our model, whose mean is estimated by the DetermineContigPloidy tool and passed to GermlineCNVCaller. You can see a detailed description of the model here (\sigma_s is the variable you're looking for): https://github.com/broadinstitute/gatk/blob/master/docs/CNV/germline-cnv-caller-model.pdf
With regards to detecting small CNVs: this is intrinsically hard for single-exon events, and currently we don't achieve high sensitivity on those. Decreasing --cnv-coherence-length and increasing --p-alt would bump up sensitivity for single-exon events; however, specificity for all events would take a hit.
Relating to that, we recommend filtering on the QS value in the segments VCF, in particular QS>=50 for duplications, and QS>100 for deletions.
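The QS thresholds above could be applied with a small script along these lines. This is only a sketch: it assumes a single-sample segments VCF whose ALT column lists `<DEL>,<DUP>`, whose GT value is the index of the called allele (0 for reference), and which carries QS as a FORMAT field. The function name `passes_qs_filter` is made up, and the layout should be checked against your own file before relying on it.

```python
DUP_MIN_QS = 50   # duplications are kept when QS >= 50
DEL_MIN_QS = 100  # deletions are kept when QS > 100


def passes_qs_filter(vcf_line):
    """Return True/False for a called DUP/DEL segment, None otherwise.

    Assumes a tab-separated, single-sample VCF record (layout is an assumption).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    # Pair FORMAT keys with the sample's values, e.g. {"GT": "1", "QS": "120"}.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    gt = sample["GT"]
    if gt in (".", "0"):
        return None  # no call or reference segment: nothing to filter
    allele = alts[int(gt) - 1]  # GT 1 is the first ALT, GT 2 the second
    qs = float(sample["QS"])
    if allele == "<DUP>":
        return qs >= DUP_MIN_QS
    if allele == "<DEL>":
        return qs > DEL_MIN_QS
    return None


if __name__ == "__main__":
    record = "1\t100\tseg1\tN\t<DEL>,<DUP>\t.\t.\tEND=500\tGT:CN:QS\t2:3:75"
    print(passes_qs_filter(record))
```

Returning None for reference or uncalled segments keeps them distinct from events that were called but failed the threshold.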
Let me know if that helps!
You need to check your capture kit manufacturer's support files for the proper BED file.