How to filter jointly called variants
I am genotyping a couple of hundred haploid yeast genomes. For each sample, I did the following:
1. mapped reads with bwa
2. called variants using HaplotypeCaller
3. applied VariantFiltration using the recommended filters
4. selected the passing variants, additionally requiring a minimum coverage and a minimum fraction of reads supporting each variant
5. used these variants to recalibrate base quality scores
6. called variants again on the recalibrated data, in GVCF mode
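To make the selection step concrete (passing variants with a minimum coverage and a minimum fraction of supporting reads), here is a minimal Python sketch of that kind of check on a single-sample VCF record. The thresholds `MIN_DEPTH` and `MIN_ALT_FRACTION` and the function name `keep_record` are hypothetical, since the actual values are not stated here. The sketch also assumes the record carries a standard `AD` (allelic depths) FORMAT field, and it uses the sum of `AD` as the coverage.

```python
# Hypothetical thresholds; the values actually used are not stated in the post.
MIN_DEPTH = 10
MIN_ALT_FRACTION = 0.9


def keep_record(vcf_line, min_depth=MIN_DEPTH, min_alt_fraction=MIN_ALT_FRACTION):
    """Return True if a single-sample VCF data line passes all three checks:
    FILTER is PASS, coverage is high enough, and enough reads support an ALT allele."""
    fields = vcf_line.rstrip("\n").split("\t")
    filter_col, format_col, sample_col = fields[6], fields[8], fields[9]
    if filter_col != "PASS":
        return False

    # Pair FORMAT keys (e.g. GT:AD:DP) with this sample's values.
    sample = dict(zip(format_col.split(":"), sample_col.split(":")))

    # AD lists read counts per allele: reference first, then each ALT allele.
    try:
        allele_depths = [int(x) for x in sample["AD"].split(",")]
    except (KeyError, ValueError):
        return False  # AD missing or set to "."

    depth = sum(allele_depths)  # assumption: coverage = reads counted in AD
    if depth == 0 or depth < min_depth:
        return False

    best_alt = max(allele_depths[1:], default=0)
    return best_alt / depth >= min_alt_fraction


# Made-up haploid record: 29 of 30 reads support the ALT allele.
example = "chrI\t1000\t.\tA\tG\t500.0\tPASS\tDP=30\tGT:AD:DP:GQ\t1:1,29:30:99"
print(keep_record(example))
```

The same check could equally be based on the `DP` field instead of the sum of `AD`; the two can differ because `AD` counts only reads that were informative for an allele.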
I then combined all the per-sample GVCF files into one giant 200-sample GVCF file and performed joint calling on that file using GenotypeGVCFs.
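For reference, the combine and joint-calling steps can be sketched as command lines assembled in Python. This assumes GATK4 syntax (`gatk CombineGVCFs` and `gatk GenotypeGVCFs` with `-R`, `-V`, `-O`); the GATK version used is not stated here, and all file names and the helper `joint_calling_commands` are hypothetical. The sketch only builds and prints the commands; it does not run them.

```python
import shlex


def joint_calling_commands(reference, sample_gvcfs, combined_gvcf, joint_vcf):
    """Build argument lists for merging per-sample GVCFs and joint genotyping.
    Assumes GATK4-style invocation; nothing is executed here."""
    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in sample_gvcfs:
        combine += ["-V", gvcf]  # one -V per per-sample GVCF
    combine += ["-O", combined_gvcf]

    genotype = [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", combined_gvcf,
        "-O", joint_vcf,
    ]
    return combine, genotype


# Hypothetical file names, two samples shown instead of 200.
combine_cmd, genotype_cmd = joint_calling_commands(
    "ref.fasta",
    ["sample1.g.vcf.gz", "sample2.g.vcf.gz"],
    "combined.g.vcf.gz",
    "joint.vcf.gz",
)
print(shlex.join(combine_cmd))
print(shlex.join(genotype_cmd))
```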
I understand that I can't do variant recalibration, because I don't have a sufficient number of variants for training. What I don't understand is:
a) Will the variants from joint calling be 'better' (i.e. more accurate) than those from single-sample calling, and will the variant metrics differ? Or was joint calling superfluous, given that I am unable to do variant recalibration?
b) How do I now apply hard filters to this giant multi-sample file? I can find no documentation on how to filter across all samples, or even on each sample individually, in a multi-sample GVCF file.