UG call combined SNP/INDEL sites in GGA mode?

I have a small UG question to ask and a minor bug to report, but I will give the whole context. I have cohorts at low and high coverage. I will probably call my low coverage data with UG and my high coverage data with HC. I will probably run VQSR per cohort. I will then create a union set of variants and recall all samples with GENOTYPE_GIVEN_ALLELES. I will probably use UG and HC for the low and high coverage recalling, respectively. Prior to generating the union set I will need to merge the UG outputted SNPs and InDels at the same position. I can merge SNPs and InDels with bcftools:

bcftools norm -m +any

I remember from another thread that UG treats SNPs and InDels separately:

http://gatkforums.broadinstitute.org/discussion/3375/combine-snp-and-indel-vcf

But I just checked that UG can do SNP and InDels calls at the same position, when an alleles file with merged SNPs and InDels is provided, when I use GENOTYPE_GIVEN_ALLELES. I expected it to fail, but it seems to work fine. SNPs and InDels are emitted at the same site/record/line. For one record and 3 samples (HOMREF, HET, HOMALT) the genotypes are correct. Can I trust the PLs? Can I also trust QUAL and other calculated values? Or is UG not meant to be used for calling SNPs and InDels at the same site with GGA?

I only had a minor issue despite tabix indexing my bgzipped known alleles file:

##### ERROR MESSAGE: An index is required, but none found., for input source: alleles.vcf.gz

Feel free to delete the original post of this question.

Best Answer

Answers

Sign In or Register to comment.