I have the following problem:
I am evaluating genotype concordance using:
-T VariantEval --evalModule GenotypeConcordance -comp ref.vcf -eval sample1.observed.vcf
If I use a reference genotype file with multiple samples in it, where one of the genotype columns is NA1234 (the sample in question), the sensitivity for all genotype classes (HOM_REF, HET, HOM_VAR) drops drastically. This is because the GATK gets confused when there is more than one sample in the reference file. I know this because if I use a reference genotype file (ref.vcf) containing only the single HapMap sample NA1234, everything works fine and sensitivity is good. So this is not a detection problem; it is a problem with how the SNPs are compared against the reference.
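Until the multi-sample case works, the single-sample ref.vcf that does give good sensitivity can be cut out of the master file. Below is a minimal plain-Python sketch, assuming an uncompressed, tab-delimited VCF (8 fixed columns, then FORMAT, then one column per sample). The helper name `subset_vcf_to_sample` and the sample NA0001 are hypothetical, for illustration only; this is not a GATK tool.

```python
def subset_vcf_to_sample(lines, sample):
    """Keep only one sample's genotype column from a multi-sample VCF.

    Hypothetical helper: assumes an uncompressed VCF with 8 fixed columns,
    a FORMAT column, and then one column per sample.
    """
    out = []
    keep = None  # column index of the requested sample
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            out.append(line)  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            if sample not in fields[9:]:
                raise ValueError(f"sample {sample!r} not found in header")
            keep = fields.index(sample, 9)
        elif keep is None:
            raise ValueError("data line found before #CHROM header")
        # fixed columns + FORMAT, followed by the chosen sample's column only
        out.append("\t".join(fields[:9] + [fields[keep]]))
    return out


vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA0001\tNA1234",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1",
]
result = subset_vcf_to_sample(vcf, "NA1234")
for row in result:
    print(row)
```

The output keeps the header and every site but only the NA1234 genotype column, i.e. the single-sample layout described above as working.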
I tried passing the sample name using the --sample parameter for -T VariantEval, but this does not work either (sensitivity is still way off).
In previous versions of the GATK this was done automatically: genotypes were compared by sample name between the detection VCF file (sample1.observed.vcf) and the ref.vcf file, without having to specify the sample name explicitly.
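To make the expected behaviour concrete, here is a simplified sketch of name-matched concordance: each eval sample is compared only against the comp column with the same name, and sensitivity is the fraction of comp genotypes in each class that the eval file reproduces. This is an illustrative assumption, not the GATK's GenotypeConcordance implementation; all function names and the sample NA0001 are hypothetical, and it assumes uncompressed VCFs with a GT field.

```python
from collections import defaultdict


def genotypes_by_sample(lines):
    """Map sample name -> {(chrom, pos): GT string} for an uncompressed VCF."""
    samples, calls = [], defaultdict(dict)
    for line in lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        gt_index = fields[8].split(":").index("GT")  # GT position in FORMAT
        for name, value in zip(samples, fields[9:]):
            calls[name][(fields[0], int(fields[1]))] = value.split(":")[gt_index]
    return calls


def normalize(gt):
    """Unphased, sorted allele tuple for a diploid GT; None for no-calls."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    return tuple(sorted(alleles))


def gt_class(alleles):
    """Classify a normalized diploid genotype."""
    if alleles[0] == alleles[1]:
        return "HOM_REF" if alleles[0] == "0" else "HOM_VAR"
    return "HET"


def concordance(comp_lines, eval_lines):
    """Per-class fraction of comp genotypes matched in eval, paired by sample name."""
    comp, ev = genotypes_by_sample(comp_lines), genotypes_by_sample(eval_lines)
    hits, totals = defaultdict(int), defaultdict(int)
    for name in ev:  # only comp columns whose name appears in the eval file
        for site, truth_gt in comp.get(name, {}).items():
            truth = normalize(truth_gt)
            if truth is None:
                continue
            cls = gt_class(truth)
            totals[cls] += 1
            if normalize(ev[name].get(site, "./.")) == truth:
                hits[cls] += 1
    return {cls: hits[cls] / totals[cls] for cls in totals}


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
comp_vcf = [header + "\tNA0001\tNA1234",
            "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1",
            "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\t1/1",
            "1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t0/1\t0/0"]
eval_vcf = [header + "\tNA1234",
            "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1",
            "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t1/1",
            "1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t0/0"]
result = concordance(comp_vcf, eval_vcf)
print(result)
```

Because columns are paired by name, the extra NA0001 column in the comp file never enters the calculation, so every class here comes out at 1.0.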
How can I avoid this problem? I want to have a single master reference genotype file with multiple samples that I can use to evaluate different samples.
I am using GATK v1.6.