The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

#### ☞ Did you remember to?

1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

#### ☞ Did we ask for a bug report?

Then follow instructions in Article#1894.

#### ☞ Formatting tip!

Surround blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ` ) each to make a code block.
Picard 2.9.0 is now available. Download and read release notes here.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

# PL for Genotype?

College Station, TXMember Posts: 2

Hi -- I'm trying out GATK for the first time, but seeing some "confusing" results.
I've just run what seems to be an essentially standard unifiedGenotyper on two sorted bam files (single samples P1003 and 03090), and gotten what appears to be heavily prior/other parameter driven results. Here is the strangeness I'm seeing at, e.g., Chr1 811659.

For the first sample:
A called homozygous non ref (1/1)
based on ~40ref ~10alt bases
with PL favoring non ref or het call :207,21,0

For the second sample:
A called het (0/1)
based on ~44ref ~6alt bases
with strong PL het evidence: 91,0,154

The behavior doesn't seem consistent, or at the very least, it seems quite sensitive to "small" data fluctuations.
At any rate, I don't find the calls particularly appealing.

I haven't had any luck looking through the documentation to see what exactly the PL field is telling me (model, prior(?), etc.) so that I can work out why the numbers come out the way they do (i.e., whether or not they're right/I like what they're trying to compute).
Can anyone point me in the right direction?
It's quite possible that I'm not using some of the command line parameters correctly, but I think I'm just using standard defaults and I don't see any other parameters that would seem to be crucial to these results.

Anyway, any help to expedite my learning curve here would be much appreciated.

Thanks!
Scott

grep "^#" -v P1003_TruSeq_1_ATCACG_R1.gatk.raw.vcf | head -3063 | tail -1
Chr1 811659 . C G 178.80 . AC=2;AF=1.00;AN=2;BaseQRankSum=0.733;DP=50;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=11.84;MQ0=28;MQRankSum=3.665;QD=3.58;ReadPosRankSum=1.273 GT:AD:DP:GQ:PL 1/1:40,10:48:21:207,21,0

grep "^#" -v 03090_F2_TruSeq_8_ACTTGA_R1.gatk.raw.vcf | head -3063 | tail -1
Chr1 811659 . C G 62.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.672;DP=50;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=1;MLEAF=0.500;MQ=14.29;MQ0=29;MQRankSum=1.567;QD=1.26;ReadPosRankSum=-1.746 GT:AD:DP:GQ:PL 0/1:44,6:48:91:91,0,154

for f in P1003_TruSeq_1_ATCACG_R1.sorted.bam 03090_F2_TruSeq_8_ACTTGA_R1.sorted.bam
do
ff=${f%.sorted.bam} echo$ff
java -jar /usr/local/GenomeAnalysisTK-2.6-5-gba531bd/GenomeAnalysisTK.jar \
--validation_strictness LENIENT \
-nt 4 \
-R all.fasta \
-T UnifiedGenotyper \
-I $ff.sorted.bam \ -o$ff.gatk.raw.vcf \
-L potential.variants.intervals \
--output_mode EMIT_ALL_SITES &
done

Tagged: