Does my GenotypeGVCFs progress report indicate problems with the results?
Hi GATK Team,
I have a few quick queries about my GenotypeGVCFs output, or rather the progress report generated. I'm concerned that it suggests all is not well! I ran GenotypeGVCFs across 8 threads for a 5-scaffold interval of a cichlid genome. The 233 input genomes had been grouped into 10 sets using CombineGVCFs. My concerns are:
1) The first measure of completeness in my progress report is 14.9%. From there, progress was mostly steady (0.1-0.5% between reports), but it occasionally jumped 4-5% between consecutive lines. The generated VCF file contains lines for positions before the first one reported, so does this just mean that the program had periods of relative super-efficiency, or does it suggest trouble?
2) 500 positions were met with the warning: 'GenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location ...'. Having looked at 2 of these positions, it appears that they are recorded as repeats. As this is 500 sites in just 1/16th of a genome, I thought it best to check: is this just a property of the genome, or a problem with the program?
3) Finally, these warnings about repeat sites were reported out of order in the progress report. That is, the genomic position in each warning did not fall between the positions in the progress entries above and below it. Is this cause for concern?
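In case it helps to see how I counted things for points 2 and 3, here is a rough sketch of tallying the skipped-site warnings per scaffold and counting how many appear after a later position on the same scaffold. It rests on assumptions: the location at the end of each warning is taken to be written as `contig:position`, which may not match the real log layout, and the function name `tally_skipped_sites` is made up for illustration.

```python
import re
from collections import Counter

# Assumption: each warning ends with a location written as contig:position.
# The real log layout may differ; adjust the pattern to match it.
WARNING_RE = re.compile(
    r"Attempting to genotype more than 50 alleles\. "
    r"Site will be skipped at location (\S+):(\d+)"
)


def tally_skipped_sites(log_lines):
    """Return (warnings per contig, number of warnings seen out of order)."""
    counts = Counter()
    highest_seen = {}  # largest position warned about so far, per contig
    out_of_order = 0
    for line in log_lines:
        match = WARNING_RE.search(line)
        if match is None:
            continue  # progress lines and other messages are ignored
        contig, pos = match.group(1), int(match.group(2))
        counts[contig] += 1
        # A warning is "out of order" if an earlier warning on the same
        # contig already mentioned a larger position.
        if contig in highest_seen and pos < highest_seen[contig]:
            out_of_order += 1
        highest_seen[contig] = max(pos, highest_seen.get(contig, pos))
    return counts, out_of_order
```

Reading the log with `open(path)` and passing the file object to `tally_skipped_sites` should work, since it only iterates over lines. Note that this compares warnings only with each other, not with the surrounding progress lines.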
Thank you for your time (and to Geraldine for advising me on how to get GenotypeGVCFs running for gzipped files in the first place!).