UnifiedGenotyper Bug

When running UnifiedGenotyper on 12 BAM files, I am getting a strange bug: about 400 lines, seemingly at random, have sample columns such as ./.:.:2 or ./.:.:1 instead of just ./.

This creates problems when loading the VCF into programs like VarSifter to view the variants. Once these lines are fixed, the VCFs load fine. It seems to happen at random, and in different columns, so there does not appear to be one bad sample. Additionally, I ran HaplotypeCaller on these same 12 BAM files and did not run into any difficulties. Any idea why this is happening? I am using the same reference file, dbsnp file, and interval list I always use. I have run UnifiedGenotyper 3 times now and have gotten the bug each time.

I am fairly new to the world of GATK and appreciate your patience!

Thanks so much.


  • pdexheimer Member ✭✭✭✭

    What you're describing is a valid VCF file. From the spec:

    If any of the fields is missing, it is replaced with the missing value. For example if the FORMAT is GT:GQ:DP:HQ then 0|0:.:23:23,34 indicates that GQ is missing. Trailing fields can be dropped (with the exception of the GT field, which should always be present if specified in the FORMAT field).

    So in this case you have a sample that has a depth of 1 or 2, but no genotype call or per-allele depths. IIRC, the AD field is filtered but the DP field is not, so I think these are very low-quality reads that were removed from consideration - which is why there's no call.
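To make the spec rule above concrete, here is a minimal Python sketch of how a sample column can be read against the FORMAT column, padding any dropped trailing fields with the missing value. The FORMAT string GT:AD:DP:GQ:PL is an assumption (the thread does not show the actual FORMAT column), and the helper names parse_sample, is_no_call, and collapse_no_call are hypothetical, not part of GATK or any VCF library.

```python
MISSING = "."  # the VCF missing value


def parse_sample(format_str, sample_str):
    """Map FORMAT keys to one sample's values, padding dropped trailing fields."""
    keys = format_str.split(":")
    values = sample_str.split(":")
    # The spec allows trailing fields to be dropped, so fill them with "."
    values += [MISSING] * (len(keys) - len(values))
    return dict(zip(keys, values))


def is_no_call(gt):
    """True when every allele in the genotype is the missing value, e.g. ./."""
    alleles = gt.replace("|", "/").split("/")
    return all(allele == MISSING for allele in alleles)


def collapse_no_call(format_str, sample_str):
    """Return only the GT value for no-call samples; leave called samples untouched.

    This is a workaround sketch for tools that reject ./.:.:2, and it discards
    the DP value for those samples.
    """
    fields = parse_sample(format_str, sample_str)
    return fields["GT"] if is_no_call(fields["GT"]) else sample_str


if __name__ == "__main__":
    fmt = "GT:AD:DP:GQ:PL"  # assumed FORMAT column, not taken from the thread
    for sample in ["./.", "./.:.:2", "0/1:12,9:21:99:255,0,301"]:
        fields = parse_sample(fmt, sample)
        print(sample, "GT=" + fields["GT"], "AD=" + fields["AD"], "DP=" + fields["DP"])
```

Read this way, ./.:.:2 is simply a no-call with a missing AD and a DP of 2, and ./. is the same record shape with every trailing field dropped. The collapse_no_call helper is only worth considering if a downstream viewer cannot handle the longer form, since it throws away the depth information.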

