PL for Genotype?
Hi -- I'm trying out GATK for the first time, but seeing some "confusing" results.
I've just run what seems to be an essentially standard unifiedGenotyper on two sorted bam files (single samples P1003 and 03090), and gotten what appears to be heavily prior/other parameter driven results. Here is the strangeness I'm seeing at, e.g., Chr1 811659.
For the first sample:
A called homozygous non ref (1/1)
based on ~40ref ~10alt bases
with PL favoring non ref or het call :207,21,0
For the second sample:
A called het (0/1)
based on ~44ref ~6alt bases
with strong PL het evidence: 91,0,154
The behavior doesn't seem consistent, or at the very least, it seems quite sensitive to "small" data fluctuations.
At any rate, I don't find the calls particularly appealing.
I haven't had any luck looking through the documentation to see what exactly the PL field is telling me (model, prior(?), etc.) so that I can work out why the numbers come out the way they do (i.e., whether or not they're right/I like what they're trying to compute).
Can anyone point me in the right direction?
It's quite possible that I'm not using some of the command line parameters correctly, but I think I'm just using standard defaults and I don't see any other parameters that would seem to be crucial to these results.
Anyway, any help to expedite my learning curve here would be much appreciated.
grep "^#" -v P1003_TruSeq_1_ATCACG_R1.gatk.raw.vcf | head -3063 | tail -1
Chr1 811659 . C G 178.80 . AC=2;AF=1.00;AN=2;BaseQRankSum=0.733;DP=50;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=11.84;MQ0=28;MQRankSum=3.665;QD=3.58;ReadPosRankSum=1.273 GT:AD:DP:GQ:PL 1/1:40,10:48:21:207,21,0
grep "^#" -v 03090_F2_TruSeq_8_ACTTGA_R1.gatk.raw.vcf | head -3063 | tail -1
Chr1 811659 . C G 62.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.672;DP=50;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=1;MLEAF=0.500;MQ=14.29;MQ0=29;MQRankSum=1.567;QD=1.26;ReadPosRankSum=-1.746 GT:AD:DP:GQ:PL 0/1:44,6:48:91:91,0,154
for f in P1003_TruSeq_1_ATCACG_R1.sorted.bam 03090_F2_TruSeq_8_ACTTGA_R1.sorted.bam
java -jar /usr/local/GenomeAnalysisTK-2.6-5-gba531bd/GenomeAnalysisTK.jar \
--validation_strictness LENIENT \
-nt 4 \
-R all.fasta \
-T UnifiedGenotyper \
-I $ff.sorted.bam \
-o $ff.gatk.raw.vcf \
-L potential.variants.intervals \
--output_mode EMIT_ALL_SITES &