Getting high-quality monomorphic sites from GenotypeGVCFs
I am doing a population genetic analysis and am consequently very interested in obtaining high confidence monomorphic sites.
I used GATK 3.1 HaplotypeCaller in the new incremental variant discovery pipeline and then ran GenotypeGVCFs with the -inv option to emit non-variant sites after combining.
A monomorphic record looks like this:
scaffold_1 202986 . G . . . . GT:AD:DP:PL ./.:81:81:0 ./.:6:12:0 ./.:0:0:0 .....etc
I had a couple of questions:
It seems that GT, GQ, and PL are no longer reported for monomorphic sites after combining. Is there a reason why? I'm asking because I'd like to be able to distinguish high-quality (first genotype) from low-quality (second genotype) monomorphic sites, so that I can set the low-quality sites to missing (third genotype).
If getting high-confidence monomorphic sites is my goal, would you recommend that I do my own parsing starting from the individual gVCFs rather than using the GenotypeGVCFs walker?
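For illustration, the kind of post-hoc parsing described above could look like the following Python sketch. It is only a rough outline under stated assumptions: it uses per-sample DP as a stand-in for confidence (since GQ is not available in the record shown), the function name `mask_low_depth` and the threshold of 20 are hypothetical, and it expects a tab-delimited VCF record line.

```python
def mask_low_depth(record_line, min_dp=20):
    """Set sample fields to missing when DP is below min_dp.

    min_dp is a hypothetical cutoff chosen for illustration,
    not a GATK recommendation.
    """
    fields = record_line.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")      # column 9 is FORMAT, e.g. GT:AD:DP:PL
    dp_idx = fmt_keys.index("DP")        # raises ValueError if DP is absent
    # A fully missing sample entry: "./." for GT, "." for every other key.
    masked = ":".join("./." if key == "GT" else "." for key in fmt_keys)

    out = fields[:9]
    for sample in fields[9:]:
        values = sample.split(":")
        dp = values[dp_idx] if dp_idx < len(values) else "."
        depth = int(dp) if dp.isdigit() else 0   # treat "." as zero depth
        out.append(sample if depth >= min_dp else masked)
    return "\t".join(out)


# The record from this post, rebuilt with tab separators.
example = "\t".join([
    "scaffold_1", "202986", ".", "G", ".", ".", ".", ".",
    "GT:AD:DP:PL", "./.:81:81:0", "./.:6:12:0", "./.:0:0:0",
])
print(mask_low_depth(example))
```

With the cutoff of 20, the first sample (DP 81) is left untouched, while the second (DP 12) and third (DP 0) are rewritten as `./.:.:.:.`. Depth alone is a crude proxy for confidence; if GQ or informative PLs were available per sample, filtering on those would be preferable.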
Thanks very much for your help!
Young Wha Lee