Loss of variants

I am using GenomeAnalysisTK-2.3-9. For GATK multisample calling I have a workflow like this:

(1) Make a GATK multisample call over N samples
(2) Perform recalibration on the multisample result from the previous step
(3) Apply the recalibration
(4) Combine the variants from several batches, each processed through steps 1 to 3, into a master VCF
(5) Genotype each single sample against the master VCF from step 4
(6) Finally, combine all variants from the single-sample calls in step 5

Now, when I compare the master VCF from step 4 with the final result after step 6, I see that some variants are missing from the end result, even though they passed the recalibration step and have an allele frequency greater than 0.8.

Shouldn't all the variants from step 4 have been included in the final merged VCF in step 6?


  Carneiro (Charlestown, MA, Member)

    It's hard to say in which of these steps you 'lost' your variants. It could be that some variants got merged together. When you say missing, do you mean the site is not listed at all in the final VCF after CombineVariants? Do you know for sure that the site was present in at least one of the pre-combine VCFs?
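    One way to answer these questions is to diff the sites between two VCFs, for example the master VCF from step 4 against the final VCF from step 6, or a pre-combine VCF against the combined one. The sketch below is plain Python, not a GATK tool; it assumes uncompressed VCF input and does no indel normalization, and the function names are hypothetical. It splits multi-allelic ALT fields into one key per allele so that an allele merged into a multi-allelic record is not reported as lost, and it separates "position entirely absent" from "position present but the alleles differ".

```python
from typing import Iterable, Set, Tuple

# (CHROM, POS, REF, one ALT allele)
Site = Tuple[str, int, str, str]


def vcf_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect one (CHROM, POS, REF, ALT) key per ALT allele from VCF data lines."""
    sites: Set[Site] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information (##) and column-header (#CHROM) lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        # A multi-allelic record such as ALT "G,T" yields one key per allele.
        for alt in fields[4].split(","):
            sites.add((chrom, pos, ref, alt))
    return sites


def missing_sites(master: Iterable[str], final: Iterable[str]) -> Tuple[Set[Site], Set[Site]]:
    """Compare a master VCF with a final VCF.

    Returns (allele_missing, position_missing):
      allele_missing   - master alleles with no identical key in the final VCF
      position_missing - the subset whose CHROM:POS is absent from the final VCF entirely
    """
    master_sites = vcf_sites(master)
    final_sites = vcf_sites(final)
    final_positions = {(chrom, pos) for chrom, pos, _, _ in final_sites}
    allele_missing = master_sites - final_sites
    position_missing = {s for s in allele_missing if (s[0], s[1]) not in final_positions}
    return allele_missing, position_missing
```

    Usage, with hypothetical file names: `with open("master.vcf") as m, open("final.vcf") as f: allele_missing, position_missing = missing_sites(m, f)`. Sites in `allele_missing` but not in `position_missing` are the ones worth checking for merging; because REF/ALT are not normalized, an indel whose REF was extended during a merge will also land in that group.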

    Recalibration doesn't remove variants from the VCF; it only marks them as filtered when they fail.
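    To check this on your own files, you can tally the FILTER column before and after ApplyRecalibration: the total number of records should stay the same, and only the distribution of FILTER values should change. The sketch below is plain Python under the assumption of an uncompressed VCF; `filter_counts` and `passing_records` are hypothetical helpers, and the tranche filter name in the test data is illustrative only. `passing_records` shows that dropping filtered records is a separate, explicit step, so if any later step in your pipeline keeps only PASS records, that is where filtered sites would disappear.

```python
from collections import Counter
from typing import Iterable, Iterator


def filter_counts(lines: Iterable[str]) -> Counter:
    """Count VCF data records by the value of their FILTER column (7th field)."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header lines
        counts[line.rstrip("\n").split("\t")[6]] += 1
    return counts


def passing_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines plus records whose FILTER is PASS; filtered records are dropped."""
    for line in lines:
        if line.startswith("#"):
            yield line
        elif line.strip() and line.rstrip("\n").split("\t")[6] == "PASS":
            yield line
```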

    You should use ReduceReads in your pipeline and call all these samples together, without batching. Batching is bad!

