Does my GenotypeGVCFs progress report indicate problems with the results?

ianw, Bangor, UK (Member)

Hi GATK Team,

I have a few quick queries about my GenotypeGVCFs output, or rather the progress report generated. I'm concerned that it suggests all is not well! I ran GenotypeGVCFs across 8 threads for a 5-scaffold interval of a cichlid genome. The 233 input genomes had been grouped into 10 sets using CombineGVCFs. My concerns are:

1) The first measure of completeness in my progress report is 14.9%. From there, progress was mostly steady (0.1-0.5% between reports), but it occasionally skipped 4-5% at a time between lines. The generated VCF file contains lines for positions before the first one reported, so does this just mean that the program had periods of relative super-efficiency, or does it suggest trouble?

2) 500 positions triggered the warning: 'GenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location ...'. Having looked at 2 of these positions, it appears that they are recorded as repeats. As this is 500 sites in just 1/16th of a genome, I thought it best to check that this reflects the genome itself rather than a problem with the program?

3) Finally, these repeat-site warnings were reported out of order in the progress report. That is, their genomic positions did not fall between those of the entries above and below them. Is this cause for concern?
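
For anyone wanting to check points 2 and 3 on their own log, here is a minimal Python sketch that tallies the skipped-site warnings per scaffold and reports whether they appear in positional order. It is only an illustration: the warning text is the one quoted above, but the `contig:position` form of the location is an assumption (the location is elided in the post), and the function names `skipped_sites` and `summarize` are made up for this example.

```python
import re
from collections import Counter

# Warning text as quoted in the post. The "contig:position" location format
# is an ASSUMPTION; adjust the pattern to match the real log.
WARNING = re.compile(
    r"Attempting to genotype more than 50 alleles\. "
    r"Site will be skipped at location (\S+):(\d+)"
)


def skipped_sites(log_lines):
    """Return (contig, position) for each skipped-site warning, in log order."""
    sites = []
    for line in log_lines:
        match = WARNING.search(line)
        if match:
            sites.append((match.group(1), int(match.group(2))))
    return sites


def summarize(sites):
    """Count warnings per contig and flag whether each contig's positions
    appear in ascending order in the log."""
    counts = Counter(contig for contig, _ in sites)
    in_order = {}
    for contig in counts:
        positions = [pos for c, pos in sites if c == contig]
        in_order[contig] = positions == sorted(positions)
    return counts, in_order


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_warnings.py genotype.log
    with open(sys.argv[1]) as handle:
        counts, in_order = summarize(skipped_sites(handle))
    for contig, n in sorted(counts.items()):
        print(contig, n, "in order" if in_order[contig] else "out of order")
```

A per-scaffold count makes it easier to see whether the 500 skipped sites cluster in particular regions, and the order flag only describes the log itself; it says nothing about the ordering of records in the output VCF.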

Thank you for your time (and to Geraldine for advising me on how to get GenotypeGVCFs running for gzipped files in the first place!).

Best wishes,

Ian

Best Answer

Answers

  • ianw, Bangor, UK (Member)

    One further thing: GenotypeGVCFs reached 100% after 4 hours but took another 2.5 hours to actually finish running. Can I ask if this is expected, please? Apologies for the list of questions. I just want to make sure that I understand the results and that they are accurate :smile:

  • ianw, Bangor, UK (Member)

    Hi Geraldine,

    Thank you again for your prompt reply. I appreciate you taking the time to explain everything to me. As usual, a top-notch reply :smiley:

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Thanks Ian, that's what we're here for :)
