My question is whether it's possible to run --includeNonVariantSites in GATK 4.
Hmm. It looks like it is not available in the latest beta release. I will check with the team and find out when it will be available.
Is there an update on this? Do you plan to bring --includeNonVariantSites back? Our pipeline uses this and we need to modify what we are doing depending on the update. Thanks in advance!
The developers say it is not on their immediate roadmap, but there are plans to implement it. If other users post here and say this is important to them, we can try to prioritize this.
I also find this option important!
Is it currently impossible to output invariant sites from GATK4 using some combination of arguments other than --includeNonVariantSites? I tried using HaplotypeCaller with "--emitRefConfidence bp_resolution" and then GenotypeGVCFs with "--standard-min-confidence-threshold-for-calling 0", but that didn't do it.
Yes, it is currently impossible to get the non-variant sites in GATK4. You can follow the issue here. Apparently, there is a blocker that needs to be taken care of first before this gets fixed. I will let the team know this is a nice feature for at least 4 users.
I'd also like to chime in to say that this is an important option for my work.
Thank you for letting us know. I made a note in the issue.
This would be a welcome feature for some users studying the Heliconius butterflies. I believe Steven Van Belleghem was inquiring about this years ago. Having invariant sites is rather important for scaling population-level variability, etc.
A question for now: we need to get the invariant sites, but our gVCFs were generated with GATK4. Is it legitimate to then run GenotypeGVCFs from v3.7 on them (in order to call invariants)?
You could try doing that but we recommend you use the latest version of GenotypeGVCFs.
The latest version of GenotypeGVCFs (Version:22.214.171.124) does not yet include the option --includeNonVariantSites or any equivalent, though, right?
So, am I right in assuming that for users dependent on the invariant sites (I'm in btw), the approach suggested by @KrzysztofMKozak is the only way to go?
@Kosmas I believe you can get the information using HaplotypeCaller's -ERC BP_RESOLUTION and then using GenotypeGVCFs' -allSites mode to genotype every site.
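For anyone who wants to try the two steps above, here is a rough sketch of how the commands could be assembled. It only builds and prints the command lines; it does not run GATK. The file names are hypothetical placeholders, and the spelling of the all-sites flag is an assumption: I have written it as --include-non-variant-sites, which is how I understand GATK4 spells it, but older releases used -allSites, so please check `gatk GenotypeGVCFs --help` for your version before relying on it.

```python
import shlex
from typing import List


def haplotypecaller_cmd(reference: str, bam: str, out_gvcf: str) -> List[str]:
    """Step 1: emit a reference-confidence record for every base (BP_RESOLUTION)."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-O", out_gvcf,
        "-ERC", "BP_RESOLUTION",
    ]


def genotypegvcfs_cmd(
    reference: str,
    gvcf: str,
    out_vcf: str,
    # Assumption: flag spelling differs between GATK releases; override if needed.
    all_sites_flag: str = "--include-non-variant-sites",
) -> List[str]:
    """Step 2: genotype every site in the gVCF, including invariant ones."""
    return [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", gvcf,
        "-O", out_vcf,
        all_sites_flag,
    ]


def render(cmd: List[str]) -> str:
    """Quote each argument so the line can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(render(haplotypecaller_cmd("ref.fasta", "sample.bam", "sample.g.vcf.gz")))
    print(render(genotypegvcfs_cmd("ref.fasta", "sample.g.vcf.gz", "sample.allsites.vcf.gz")))
```

If your release only accepts the older spelling, pass all_sites_flag="-allSites" to genotypegvcfs_cmd instead.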