The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

#### ☞ Did you remember to?

1. Search using the upper-right search box, e.g. using the error message.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

#### ☞ Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks (  ) each to make a code block as demonstrated here.

GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

# GATK: basic VCF indel output question

Member Posts: 4
edited May 2013

Hi,

I'm having problems understanding a GATK output VCF. I have read the VCF standard, but I'm obviously missing something.

I /think/ I understand how SNPs and short indels are represented, but clearly I do not. Below is an excerpt that illustrates sites which I do not understand. I suspect it may be something to do with GATK quality filters that I'm not understanding, or something about using EMIT_ALL_SITES...

The excerpt below was generated using

GATK -l INFO -I my.bam -R my.fa -T UnifiedGenotyper -S LENIENT -nt 8 --heterozygosity 0.1 -o test.vcf --genotype_likelihoods_model BOTH --min_base_quality_score 10 --output_mode EMIT_ALL_SITES -ploidy 2

Thanks!

Darren

    CH1 225 .   T   G   12.71   LowQual AC=1;AF=0.500;AN=2;BaseQRankSum=1.978;DP=59;Dels=0.03;FS=0.000;HaplotypeScore=10.2840;MLEAC=1;MLEAF=0.500;MQ=70.25;MQ0=8;MQRankSum=-5.349;QD=0.22;ReadPosRankSum=-3.188 GT:AD:DP:GQ:PL  0/1:41,16:55:20:20,0,1435
CH1 226 .   T   .   121.53  .   AN=2;DP=59;MQ=70.25;MQ0=8   GT:DP   0/0:43
CH1 227 .   A   .   121.53  .   AN=2;DP=59;MQ=70.25;MQ0=8   GT:DP   0/0:43
CH1 228 .   T   .   121.53  .   AN=2;DP=59;MQ=70.25;MQ0=8   GT:DP   0/0:43
CH1 229 .   A   .   115.53  .   AN=2;DP=57;MQ=69.66;MQ0=8   GT:DP   0/0:38
CH1 230 .   C   .   115.53  .   AN=2;DP=57;MQ=69.66;MQ0=8   GT:DP   0/0:38
CH1 231 .   T   .   115.53  .   AN=2;DP=57;MQ=69.66;MQ0=8   GT:DP   0/0:36
CH1 232 .   G   .   115.53  .   AN=2;DP=57;MQ=69.66;MQ0=8   GT:DP   0/0:36
CH1 233 .   C   .   115.53  .   AN=2;DP=57;MQ=69.66;MQ0=8   GT:DP   0/0:37
CH1 234 .   A   .   139.53  .   AN=2;DP=70;MQ=59.20;MQ0=14  GT:DP   0/0:63
CH1 235 .   A   .   175.53  .   AN=2;DP=84;MQ=51.67;MQ0=15  GT:DP   0/0:79
CH1 236 .   A   .   175.53  .   AN=2;DP=84;MQ=51.67;MQ0=15  GT:DP   0/0:79
CH1 237 .   T   .   175.53  .   AN=2;DP=85;MQ=51.37;MQ0=16  GT:DP   0/0:80
CH1 238 .   A   .   175.53  .   AN=2;DP=102;MQ=46.90;MQ0=28 GT:DP   0/0:97
CH1 239 .   A   .   175.53  .   AN=2;DP=102;MQ=46.90;MQ0=28 GT:DP   0/0:101
CH1 240 .   T   .   175.53  .   AN=2;DP=102;MQ=46.90;MQ0=28 GT:DP   0/0:98
CH1 241 .   A   .   169.53  .   AN=2;DP=108;MQ=44.14;MQ0=29 GT:DP   0/0:107
*    CH1    242 .   T   .   169.53  .   AN=2;DP=109;MQ=43.94;MQ0=29 GT:DP   0/0:103
*    CH1    242 .   T   .   118.27  .   AN=2;DP=109;MQ=43.94;MQ0=29 GT:AD:DP    0/0:27:55
*    CH1    243 .   C   .   172.53  .   AN=2;DP=110;MQ=43.76;MQ0=29 GT:DP   0/0:108
*    CH1    243 .   CTTTT   .   118.27  .   AN=2;DP=110;MQ=43.76;MQ0=29 GT:AD:DP    0/0:27:56
CH1 244 .   T   .   91.53   .   AN=2;DP=110;MQ=43.76;MQ0=29 GT:DP   0/0:61
CH1 245 .   T   .   91.53   .   AN=2;DP=110;MQ=43.76;MQ0=29 GT:DP   0/0:53
CH1 246 .   T   .   73.53   .   AN=2;DP=110;MQ=43.76;MQ0=29 GT:DP   0/0:41
CH1 247 .   T   .   91.53   .   AN=2;DP=110;MQ=43.76;MQ0=29 GT:DP   0/0:46
CH1 248 .   A   .   172.53  .   AN=2;DP=116;MQ=42.61;MQ0=31 GT:DP   0/0:100
CH1 249 .   A   .   172.53  .   AN=2;DP=116;MQ=42.61;MQ0=31 GT:DP   0/0:100
CH1 250 .   T   .   172.53  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:DP   0/0:101
CH1 251 .   T   .   169.53  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:DP   0/0:96
CH1 251 .   T   .   118.27  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:AD:DP    0/0:27:56
CH1 252 .   C   .   172.53  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:DP   0/0:113
CH1 253 .   C   .   172.53  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:DP   0/0:110
CH1 254 .   T   .   172.53  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:DP   0/0:111
CH1 255 .   T   .   172.53  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:DP   0/0:111
CH1 256 .   T   .   172.53  .   AN=2;DP=117;MQ=42.43;MQ0=32 GT:DP   0/0:111
`

Line 1 is a SNP
Lines 14 and 15 are an indel that I do understand
Lines 19 and 20 I do /not/ understand
Lines 21 and 22 I do /not/ understand

Post edited by Carneiro on

• Member Posts: 4

OK, clearly that didn't work. The excerpt is now attached as a file. - apologies!

• Charlestown, MAMember Posts: 274 admin

sorry I had to edit your post to understand what you were asking. Let me rephrase your question here to make sure I understand it now:

You want to know why the lines marked with "*" have duplicated information for the same position ? (the actual event is clear -- a T deletion, or a CTTTTT deletion -- it's the fact that you have multiple events in the same location that seems to be wrong here).

Is this correct?

• Member Posts: 4

Hi!

Thank you for editing the question! You have my incomprehension encapsulated perfectly - and perhaps I should have been able to work it out for myself.

However! I still do not fully understand:

If

• CH1 242 . T . 169.53 . AN=2;DP=109;MQ=43.94;MQ0=29 GT:DP 0/0:103
• CH1 242 . T . 118.27 . AN=2;DP=109;MQ=43.94;MQ0=29 GT:AD:DP 0/0:27:55 - See more at: http://gatkforums.broadinstitute.org

represents a polymorphic indel (T/.) at position 242 (which I should have been able to work out for myself), then I have two supplementary questions:

(1) Why is the genotype field '0/0' rather than '0/1' or '1/1' ?

Since this is the only (diploid) individual in the dataset, for there to be an indel recorded at this site, the indel /must/ be heterozygous or homozygous deletion, as there is no external indel information for these data.

(2) Why is the AD field '27' rather than something like '20,7'? This entry suggests to me that the 'T' call was seen 27 times, and the call seen zero times.

But if the deletion was seen zero times, how come it's there at all as an ALT allele? My first guess is that I don't understand a quality issue (population DP is 109, whereas AD is 27). But there were clearly 55 quality calls of the deleted allele, since individual-level DP=55.

Thanks again for any advice you can give me! it's much appreciated!

Darren

• Member Posts: 4

Great! Thank you for all your help.