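Since GATK4, we no longer perform the indel realignment step. Local realignment is done intrinsically by the variant callers (HaplotypeCaller and Mutect2 reassemble reads around candidate variants), so a separate realignment step is redundant.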
If you are asking how to set ploidy for the GenotypeGVCFs tool, there is no need to do so. Please take a look at the note about ploidy in the GenotypeGVCFs tool documentation:
"Special note on ploidy: This tool is able to handle any ploidy (or mix of ploidies) intelligently; there is no need to specify ploidy for non-diploid organisms."
Does that answer the question?
Hi @oneillkza -
Any files that start in the same directory (or, to be technical, with the same path prefix in cloud storage systems) should end up in the same directory within Cromwell's inputs directory. The reason for using different directories is that files from different origin directories might have clashing names, so we'd rather not risk putting them together in the execution directory on the VM.
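To make the grouping rule concrete, here is a minimal sketch of the idea: every distinct origin prefix gets its own subdirectory, so files that share a prefix stay together while same-named files from different prefixes cannot collide. This is not Cromwell's actual implementation; the function name `plan_localization` and the numbered subdirectory scheme are hypothetical.

```python
import posixpath


def plan_localization(input_paths, inputs_root="inputs"):
    """Map each input path to a local path under inputs_root.

    Hypothetical scheme: each distinct origin prefix (everything before the
    last "/") gets its own numbered subdirectory, so files sharing a prefix
    land together and same-named files from different prefixes never clash.
    """
    prefix_to_subdir = {}
    plan = {}
    for path in input_paths:
        prefix, name = posixpath.split(path)
        if prefix not in prefix_to_subdir:
            prefix_to_subdir[prefix] = str(len(prefix_to_subdir))
        plan[path] = posixpath.join(inputs_root, prefix_to_subdir[prefix], name)
    return plan


plan = plan_localization([
    "gs://bucket-a/sample/x.bam",
    "gs://bucket-a/sample/x.bam.bai",
    "gs://bucket-b/other/x.bam",  # same file name, different origin
])
for src, dst in plan.items():
    print(src, "->", dst)
```

Under this scheme the BAM and BAI from `gs://bucket-a/sample` share `inputs/0`, while the second `x.bam` goes to `inputs/1`.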
Is that what you're seeing? In other words, in the cases where the BAM and BAI end up split apart, do they come from the same original directory or from different directories?
If they start off apart, then moving them together within the task might be the "correct" thing to do here to guarantee that they're co-located.
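Within a WDL task this would usually be a line or two of shell in the command section. The Python sketch below shows the same idea. It assumes the index is already named to match its BAM (e.g. `sample.bam` and `sample.bam.bai`). The helper name `colocate` is hypothetical. It symlinks both files into one working directory and falls back to copying where symlinks aren't supported.

```python
import os
import shutil


def colocate(bam_path, bai_path, work_dir):
    """Place a BAM and its index side by side in work_dir (hypothetical helper).

    Symlinks avoid copying large files; if the filesystem refuses a symlink,
    the file is copied instead. Returns the new (bam, bai) paths.
    """
    os.makedirs(work_dir, exist_ok=True)
    placed = []
    for src in (bam_path, bai_path):
        dst = os.path.join(work_dir, os.path.basename(src))
        if os.path.lexists(dst):
            os.remove(dst)  # replace any stale file or link from a previous attempt
        try:
            os.symlink(os.path.abspath(src), dst)
        except OSError:
            shutil.copy2(src, dst)
        placed.append(dst)
    return tuple(placed)
```

Downstream tools would then be pointed at the returned BAM path, whose index now sits next to it.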
My email response to Zhuqing below was incorrect. The documentation from 2012 is still correct with respect to how the bases are marked.
In the various genome masks, bases marked with a "1" value are masked out (not used), and bases with a "0" value are included. Thus, for the alignability masks (svmasks), the uniquely alignable bases are indicated with "0" and the non-unique bases with "1". For the other masks, for example the gcmask (formerly called the cn2 mask), bases in the more well-behaved parts of the genome are marked as "0", other bases as "1", etc.
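As a quick illustration of the convention ("0" = keep, "1" = masked out), here is a minimal sketch. It assumes the mask for a region has already been read into a string of "0"/"1" characters aligned base-for-base with the sequence. The function names are hypothetical, and the sketch does not parse the mask file format itself.

```python
def unmasked_positions(mask):
    """Return 0-based positions whose mask value is "0" (included bases)."""
    return [i for i, flag in enumerate(mask) if flag == "0"]


def apply_mask(sequence, mask):
    """Keep only bases whose mask value is "0"; bases marked "1" are dropped."""
    if len(sequence) != len(mask):
        raise ValueError("sequence and mask must be the same length")
    return "".join(base for base, flag in zip(sequence, mask) if flag == "0")


print(unmasked_positions("001100"))    # positions 2 and 3 are masked out
print(apply_mask("ACGTAC", "001100"))
```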
Sorry about the confusion.
Yes, it is possible. Please look at the information for the resource bundle and the Broad data sets.
One thing I noticed is the large amount of scattering taking place during an individual run. The workflow sets the number of times the HaplotypeCaller task is scattered from the number of intervals in the provided interval list. I can't view your interval list, but I assume it has over 3000 intervals. This is one way of scattering the HaplotypeCaller task, but it can be overwhelming when thousands of intervals are used to spin up thousands of jobs. Another way of scattering the task is to divide the intervals evenly into a set number of groups; this is done in the five-dollar-genome-analysis-pipeline using the Utils.ScatterIntervalList task. You could incorporate this task into your method so that you aren't running 3000+ scatter jobs.
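To show what "divide intervals evenly into a set number" means, here is a minimal sketch that groups consecutive intervals into at most `scatter_count` shards of roughly equal total length. It is not the Utils.ScatterIntervalList task. The function name and the `(contig, start, end)` tuple representation (1-based, inclusive) are assumptions made for illustration.

```python
def scatter_intervals(intervals, scatter_count):
    """Group consecutive (contig, start, end) intervals into at most
    scatter_count shards of roughly equal total length (hypothetical helper).

    Coordinates are treated as 1-based and inclusive. Intervals are never
    split, so shards are only approximately balanced.
    """
    if scatter_count < 1:
        raise ValueError("scatter_count must be >= 1")
    lengths = [end - start + 1 for _, start, end in intervals]
    target = sum(lengths) / scatter_count  # ideal number of bases per shard
    groups, current, cumulative = [], [], 0
    for interval, length in zip(intervals, lengths):
        current.append(interval)
        cumulative += length
        # Close the shard once the running total reaches its share,
        # always leaving the final shard open for the remainder.
        if cumulative >= target * (len(groups) + 1) and len(groups) < scatter_count - 1:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


# 3000 equal-sized intervals collapse into 50 scatter jobs instead of 3000.
intervals = [("chr1", i * 1000 + 1, i * 1000 + 100) for i in range(3000)]
shards = scatter_intervals(intervals, 50)
print(len(shards), [len(s) for s in shards[:3]])
```

Each shard would then be written out as its own interval list and passed to one HaplotypeCaller job, so the number of jobs is controlled by `scatter_count` rather than by the raw interval count.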
You can find information related to differences in releases here: https://github.com/broadinstitute/gatk/releases
@breardon We pushed a release today with retry fixes for a Google issue we have been experiencing since Jan 8th. The Google issue may cause the UI to report that the workspace bucket is unreadable, or it may cause workflow failures, potentially like the one that you and the other users are seeing. Refreshing or retrying should help, but ultimately we are waiting for Google to resolve this issue. However, I will double-check with the team that this Google issue is actually the cause of your problem and not something else, and get back to you.
The ClippingRankSumTest annotation is present in the latest versions, and you can find more information about it here: https://software.broadinstitute.org/gatk/documentation/tooldocs/22.214.171.124/org_broadinstitute_hellbender_tools_walkers_annotator_ClippingRankSumTest.php
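For intuition about what a clipping rank-sum annotation measures, here is a minimal sketch of a Mann-Whitney rank-sum z-score. It compares a per-read value, here the number of hard-clipped bases, between reads supporting REF and reads supporting ALT. This is a generic normal-approximation rank-sum test, not GATK's code. GATK's tie handling, read filtering, and sign convention may differ, and `rank_sum_z` is a hypothetical name. In this sketch, a positive z means ALT-supporting reads tend to have more clipping.

```python
import math


def rank_sum_z(ref_values, alt_values):
    """Mann-Whitney U z-score for alt vs ref values (hypothetical helper).

    Uses midranks for ties and the plain normal approximation without a
    tie correction. Returns None if either group is empty.
    """
    n_ref, n_alt = len(ref_values), len(alt_values)
    if n_ref == 0 or n_alt == 0:
        return None
    pooled = sorted([(v, 0) for v in ref_values] + [(v, 1) for v in alt_values])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    rank_sum_alt = sum(r for r, (_, group) in zip(ranks, pooled) if group == 1)
    u_alt = rank_sum_alt - n_alt * (n_alt + 1) / 2
    mean = n_ref * n_alt / 2
    sd = math.sqrt(n_ref * n_alt * (n_ref + n_alt + 1) / 12)
    return (u_alt - mean) / sd


# Hard-clipped base counts per read: ALT reads are heavily clipped here.
print(rank_sum_z([0, 0, 0], [5, 6, 7]))
```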
@ElenaGrassi It's true that a matched PoN is more effective for technical artifacts, but, first, a surprisingly large number of artifacts are common to many library preparation protocols, and second, an unmatched PoN can still be quite effective at catching mapping artifacts.
188.8.131.52 is the latest release, and 4.1 is coming this month. We haven't updated the FireCloud featured workspace in a few months (it's still on 4.0.8), but we will update it for 4.1.
The hg38 gnomAD you downloaded is the best we have at the moment. The bug fixes patched the downstream consequences of the liftover, namely population allele frequencies of 0, 1, and "." (missing) that don't appear in the pre-liftover GRCh37 gnomAD.
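If you want to see how often such values occur in your copy, here is a minimal sketch that flags allele-frequency entries of 0, 1, or "." in a VCF INFO string. It assumes that the relevant keys are named `AF` or start with `AF_` (e.g. per-population frequencies), which you should check against your file's header. The helper name is hypothetical. A flagged value is only worth inspecting and is not necessarily a liftover error.

```python
def flag_af_values(info):
    """Return {key: [values]} for AF entries equal to 0, 1, or "." (missing).

    Only INFO keys named AF or starting with AF_ are checked (an assumption
    about the annotation names); flag-type entries without "=" are skipped.
    """
    flagged = {}
    for entry in info.split(";"):
        if "=" not in entry:
            continue
        key, value = entry.split("=", 1)
        if key != "AF" and not key.startswith("AF_"):
            continue
        suspicious = [v for v in value.split(",") if v == "." or float(v) in (0.0, 1.0)]
        if suspicious:
            flagged[key] = suspicious
    return flagged


print(flag_af_values("AC=2;AF=0.5;AF_afr=0;AF_nfe=.;DB"))
```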
I want to add that it's extremely useful to learn about users' pain points. As developers, we can all too easily forget the experience of setting things up for the first time, not to mention the hassle of switching to the latest version.