Difference between vcf directly generated by HC and vcf generated from GenotypeGVCFs
I have three general questions about using HaplotypeCaller (I know I could have tested this myself, but I figured it would be more reliable to get answers from the people who develop the tool):
- For single sample analysis, is the vcf generated directly from HC the same as the vcf generated using GenotypeGVCFs on the gvcf generated from HC?
- For multi-sample analysis, in terms of speed, how does running GenotypeGVCFs on each gvcf separately compare with combining all gvcfs for joint calling, assuming we can generate all gvcfs in parallel (say for 500 samples)?
- It seems the gvcf can be generated in two modes, -ERC GVCF and -ERC BP_RESOLUTION. How different is the one generated using -ERC BP_RESOLUTION from a vcf with all variant calls, reference calls and missing calls? And considering the size of the file, say for the NA12878 whole genome, how does the gvcf from -ERC GVCF compare with the one from -ERC BP_RESOLUTION?
Thank you very much for your attention; any information would be highly appreciated.