Using HaplotypeCaller 3.5 vs HaplotypeCaller 4.0 for joint calling of large cohorts

We've been testing the published Broad "production" workflow for paired-end single sample alignment and variant calling. The "production" pipeline appears to use HaplotypeCaller from GATK 3.5 while using other components from GATK 4. There is a separately published pipeline here that uses GATK 4's HaplotypeCaller. We have tried running the joint discovery pipeline with both the HC 3.5 and HC 4 gVCFs as inputs, using the default parameters provided in the repositories. The results appear to be almost indistinguishable on the NIST NA12878 reference sample. This surprised us, since the model/parameters have changed between 3.5 and 4.

Question: Why is the Broad "production" pipeline still using HaplotypeCaller 3.5 instead of some 4.x version?
We intend to align and joint-call upwards of 8000 WGS samples on Google Cloud. Is it recommended to use the output from HaplotypeCaller 3.5 (we were trying to use the Broad "production" pipeline with as few modifications as possible), or to use HaplotypeCaller 4 instead, before running joint calling?

