The current GATK version is 3.7-0
Picard 2.9.0 is now available. Download and read release notes here.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

PL for Genotype?

schwarls37 · College Station, TX · Member · Posts: 2

Hi, I'm trying out GATK for the first time, but I'm seeing some confusing results.
I've just run what seems to be an essentially standard UnifiedGenotyper call on two sorted BAM files (single samples P1003 and 03090), and the results look heavily driven by priors or other parameters rather than by the data. Here is the strangeness I'm seeing at, e.g., Chr1:811659.

For the first sample:
called homozygous non-ref (1/1),
based on ~40 ref and ~10 alt bases,
with PL 207,21,0 favoring the non-ref or het call.

For the second sample:
called het (0/1),
based on ~44 ref and ~6 alt bases,
with PL 91,0,154, strong evidence for het.
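To make the two PL triples easier to compare, here is a minimal sketch that reads a biallelic diploid PL triple in the VCF order 0/0, 0/1, 1/1. It assumes the VCF-specification meaning of PL (Phred-scaled genotype likelihoods, shifted so the called genotype is 0) and that GQ is the gap between the smallest and second-smallest PL; GATK reports GQ capped at 99, which this sketch does not apply. The function name `pl_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: interpret a biallelic diploid PL triple (order 0/0, 0/1, 1/1).
# PL_g = -10*log10 L(g), normalized so the called genotype has PL 0.
# GQ = second-smallest PL minus smallest PL (GATK caps it at 99; not applied here).
pl_summary() {
  echo "$1" | awk -F, '{
    split("0/0 0/1 1/1", gt, " ")
    for (i = 1; i <= 3; i++) pl[i] = $i + 0   # force numeric comparison
    best = 1
    for (i = 2; i <= 3; i++) if (pl[i] < pl[best]) best = i
    second = 0
    for (i = 1; i <= 3; i++) if (i != best && (second == 0 || pl[i] < pl[second])) second = i
    printf "GT=%s GQ=%d", gt[best], pl[second] - pl[best]
    # Likelihood of each genotype relative to the called one: 10^(-PL/10)
    for (i = 1; i <= 3; i++) printf " %s:%.3g", gt[i], 10^(-pl[i] / 10)
    printf "\n"
  }'
}

pl_summary 207,21,0   # first sample (P1003)
pl_summary 91,0,154   # second sample (03090)
```

Both GQ values reproduce the GQ column in the VCF records (21 and 91). Note that for the first sample the het genotype is only about 21 Phred units (roughly 126-fold) less likely than hom-alt, which is why the 1/1 call has low confidence.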

The behavior doesn't seem consistent; at the very least, it seems quite sensitive to small fluctuations in the data.
In any case, I don't find the calls particularly convincing.

I haven't had any luck finding documentation on exactly what the PL field represents (the model, whether a prior is involved, etc.), so I can't work out why the numbers come out the way they do, or whether they compute what I want.
Can anyone point me in the right direction?
It's quite possible that I'm misusing some command-line parameters, but I believe I'm using the standard defaults, and I don't see any other parameters that would be crucial to these results.
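Per the VCF specification, PL holds Phred-scaled genotype likelihoods, so a prior only enters when those likelihoods are turned into posteriors. As a rough illustration of that step, the sketch below multiplies each likelihood by an assumed prior and renormalizes. The prior shape (P(0/0) = 1 - 3θ/2, P(0/1) = θ, P(1/1) = θ/2) and θ = 0.001 are assumptions chosen for illustration (θ = 0.001 is, as far as I know, UnifiedGenotyper's default heterozygosity); this is not a statement of how UnifiedGenotyper computes GT or QUAL. The function name `pl_posterior` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: turn a PL triple into genotype posteriors under an ASSUMED prior.
# PLs are likelihoods; theta and the prior shape below are illustrative assumptions.
pl_posterior() {
  echo "$1" | awk -F, -v theta="${2:-0.001}" '{
    prior[1] = 1 - 1.5 * theta   # 0/0
    prior[2] = theta             # 0/1
    prior[3] = theta / 2         # 1/1
    split("0/0 0/1 1/1", gt, " ")
    total = 0
    for (i = 1; i <= 3; i++) { post[i] = 10^(-$i / 10) * prior[i]; total += post[i] }
    for (i = 1; i <= 3; i++) printf "%s:%.4f%s", gt[i], post[i] / total, (i < 3 ? " " : "\n")
  }'
}

pl_posterior 207,21,0          # first sample, theta = 0.001
pl_posterior 91,0,154          # second sample, theta = 0.001
```

Under this assumed prior the first sample still leans strongly toward 1/1 (about 0.98), because the alternative genotype likelihoods differ by orders of magnitude; the prior shifts the balance only slightly.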

Anyway, any help to expedite my learning curve here would be much appreciated.


```
grep "^#" -v P1003_TruSeq_1_ATCACG_R1.gatk.raw.vcf | head -3063 | tail -1
Chr1 811659 . C G 178.80 . AC=2;AF=1.00;AN=2;BaseQRankSum=0.733;DP=50;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=11.84;MQ0=28;MQRankSum=3.665;QD=3.58;ReadPosRankSum=1.273 GT:AD:DP:GQ:PL 1/1:40,10:48:21:207,21,0

grep "^#" -v 03090_F2_TruSeq_8_ACTTGA_R1.gatk.raw.vcf | head -3063 | tail -1
Chr1 811659 . C G 62.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.672;DP=50;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=1;MLEAF=0.500;MQ=14.29;MQ0=29;MQRankSum=1.567;QD=1.26;ReadPosRankSum=-1.746 GT:AD:DP:GQ:PL 0/1:44,6:48:91:91,0,154
```
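The records above pack the per-sample evidence into the FORMAT keys `GT:AD:DP:GQ:PL` and the site-level mapping-quality evidence into INFO. A minimal sketch that pulls out the alt-read fraction (from AD) and the fraction of MQ0 reads (MQ0 / DP), assuming a single-sample VCF with INFO in column 8, FORMAT in column 9 and the sample in column 10. The function name `vcf_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: summarize single-sample VCF records read on stdin.
# Assumes INFO = column 8, FORMAT = column 9, the one sample = column 10.
# Default awk field splitting accepts tabs (real VCF) or spaces (forum paste).
vcf_summary() {
  awk '!/^#/ {
    split("", val); split("", fmt)            # reset per-record arrays
    n = split($8, info, ";")
    for (i = 1; i <= n; i++) { split(info[i], kv, "="); val[kv[1]] = kv[2] }
    m = split($9, keys, ":"); split($10, vals, ":")
    for (i = 1; i <= m; i++) fmt[keys[i]] = vals[i]
    split(fmt["AD"], ad, ",")                 # AD = ref,alt read depths
    tot = ad[1] + ad[2]
    printf "%s:%s GT=%s alt_frac=%.2f MQ=%s MQ0_frac=%.2f\n", $1, $2, fmt["GT"],
      (tot > 0 ? ad[2] / tot : 0), val["MQ"], val["MQ0"] / val["DP"]
  }'
}

# Example: the first sample's record from the thread.
echo 'Chr1 811659 . C G 178.80 . AC=2;AF=1.00;AN=2;BaseQRankSum=0.733;DP=50;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=11.84;MQ0=28;MQRankSum=3.665;QD=3.58;ReadPosRankSum=1.273 GT:AD:DP:GQ:PL 1/1:40,10:48:21:207,21,0' | vcf_summary
```

On a whole file the same function can be fed directly, e.g. `grep -v "^#" P1003_TruSeq_1_ATCACG_R1.gatk.raw.vcf | vcf_summary`. For this site it shows an alt fraction of only 0.20 behind the 1/1 call, and more than half of the reads at mapping quality 0.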

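```
# Loop over sample prefixes; the original loop lacked do/done, echoed an unset
# variable ($ff) and would have appended .sorted.bam twice.
for f in P1003_TruSeq_1_ATCACG_R1 03090_F2_TruSeq_8_ACTTGA_R1; do
  echo "$f"
  java -jar /usr/local/GenomeAnalysisTK-2.6-5-gba531bd/GenomeAnalysisTK.jar \
    --validation_strictness LENIENT \
    -nt 4 \
    -R all.fasta \
    -T UnifiedGenotyper \
    -I "$f.sorted.bam" \
    -o "$f.gatk.raw.vcf" \
    -L potential.variants.intervals \
    --output_mode EMIT_ALL_SITES &
done
wait   # block until both background runs finish
```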


  • pdexheimer · Member, Dev · Posts: 544 ✭✭✭✭

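    PL holds the genotype likelihoods, not posterior probabilities: normalized, Phred-scaled values (-10 log10 of the likelihood) for each possible genotype (0/0, 0/1, 1/1), shifted so that the called genotype has PL 0.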

    Neither of these is a particularly strong variant: relatively low QUAL, low QD, extremely low MQ and high MQ0. I don't think either will survive any sort of filter. There's a FAQ article along the lines of "Why wasn't my variant called?" that will help with some of your learning curve. I'd also strongly recommend reading through the Best Practices. It looks like you're just running UG on a raw BAM file, one sample at a time; you'll definitely want to spend some time on the pre-processing steps, and probably do a joint-calling run.

  • schwarls37 · College Station, TX · Member · Posts: 2

    Thank you very much, pdex! Yes, I'll definitely go through these resources; they sound like exactly where I need to start.
    I rushed into things here, and while I did try to acclimate myself to the GATK environment, it sounds like I'm just not there yet. Cheers, that's perfect!
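pdexheimer's point that neither record would survive filtering can be checked mechanically against the INFO fields in the two VCF records. A minimal sketch, assuming the SNP hard-filter thresholds that GATK's hard-filtering documentation recommended in the GATK 3 era (QD < 2.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5, ReadPosRankSum < -8.0); verify them against the current documentation before relying on them. In GATK itself such filters are applied with VariantFiltration; the function name `snp_hard_filter` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: flag VCF records (stdin) that fail ASSUMED GATK-style SNP hard filters.
# Thresholds are the commonly cited GATK 3-era values; treat them as an assumption.
snp_hard_filter() {
  awk '!/^#/ {
    split("", val)
    n = split($8, info, ";")
    for (i = 1; i <= n; i++) { split(info[i], kv, "="); val[kv[1]] = kv[2] }
    failed = ""
    if (("QD" in val) && val["QD"] + 0 < 2.0) failed = failed "QD,"
    if (("FS" in val) && val["FS"] + 0 > 60.0) failed = failed "FS,"
    if (("MQ" in val) && val["MQ"] + 0 < 40.0) failed = failed "MQ,"
    if (("MQRankSum" in val) && val["MQRankSum"] + 0 < -12.5) failed = failed "MQRankSum,"
    if (("ReadPosRankSum" in val) && val["ReadPosRankSum"] + 0 < -8.0) failed = failed "ReadPosRankSum,"
    if (failed == "") print $1 ":" $2 " PASS"
    else print $1 ":" $2 " FAIL " substr(failed, 1, length(failed) - 1)
  }'
}

# Example: the second sample's record from the thread.
echo 'Chr1 811659 . C G 62.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.672;DP=50;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=1;MLEAF=0.500;MQ=14.29;MQ0=29;MQRankSum=1.567;QD=1.26;ReadPosRankSum=-1.746 GT:AD:DP:GQ:PL 0/1:44,6:48:91:91,0,154' | snp_hard_filter
```

Under these thresholds the first record fails on MQ alone and the second on both QD and MQ, consistent with the reply above.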
