The current GATK version is 3.8-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

#### ☞ Got a problem?

1. Search using the upper-right search box, e.g. using the error message.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

#### ☞ Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks (  ) each to make a code block as demonstrated here.

GATK version 4.beta.3 (i.e. the third beta release) is out. See the GATK4 beta page for download and details.

# EMIT_ALL_SITES in UnifiedGenotyper

Member
edited January 2013

Hello,
I have an alignment with 140 reference reads (Ref Base = C) and 10 variant Reads (Var Base = T) at locus: Chr17:7578406. When I use the "EMIT_ALL_SITES" mode, the UnifiedGenotyper generates the following output:

<a rel="nofollow" href="/search?Search=%23CHROM&amp;Mode=like">#CHROM</a>  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  z
17      7578406 rs28934578      C       A       0       LowQual AC=0;AF=0.00;AN=2;DB;DP=150;Dels=0.00;FS=0.000;HaplotypeScore=11.1188;MLEAC=0;MLEAF=0.00;MQ=38.26;MQ0=0      GT:AD:DP:GQ:PL  0/0:140,0:150:99:0,361,4503


My questions are:

1) In ALT column, why is the base "A" being shown? Is it a randomly selected base when no SNP is identified at that position?
2) The ID column shows dbSNP record rs28934578 which is a C>T mutation (which is what my data has). Why is the dbSNP records for C>T mutation in the output when no variant is identified at this position (or C>A variant is shown?). Does this imply that ID column shows ALL dbSNP records at that position rather than a dbSNP record of the identified variant?
3) Is there a document that details the VCF output when EMIT_ALL_VARIANTS is used so I could understand the output vcf?

Post edited by Geraldine_VdAuwera on
Tagged:

There is in fact a SNP identified at that position; but it is low quality, presumably under the emission quality threshold, which is why it doesn't get emitted when the tool is run in normal mode. The output vcf is exactly the same format as usual.

• Member

Hi,

Maybe I did'nt ask the question correctly.
Since the data contains absolutely no reads with "A" base for positions where it is identifying an "A" ALT variant, I don't see how a SNP (e.g. G>A at position 7578402) is being emitted even if it is at a low confidence. Below is the output from an example. For the following command:

java -Xmx4g -jar /usr/local/lib/GenomeAnalysisTK-2.2-15-g9214b2f/GenomeAnalysisTK.jar -R human_g1k_v37.fasta -T UnifiedGenotyper -I NG.CL316.ordered.sorted.bam -o temp.vcf --dbsnp dbsnp_135.b37.vcf -stand_call_conf 30 -stand_emit_conf 0.0 -dcov 5000 -L "17:7,578,400-7,578,410" -out_mode EMIT_ALL_SITES -contamination 0.0

below is a subsection of the output:

<a rel="nofollow" href="/search?Search=%23CHROM&amp;Mode=like">#CHROM</a>  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  z
17      7578401 rs147002414     G       A       0       LowQual AC=0;AF=0.00;AN=2;DB;DP=12;Dels=0.00;FS=0.000;HaplotypeScore=3.7458;MLEAC=0;MLEAF=0.00;MQ=38.06;MQ0=0        GT:AD:DP:GQ:PL  0/0:12,0:12:18:0,18,172
17      7578402 .       G       A       0       LowQual AC=0;AF=0.00;AN=2;DP=12;Dels=0.00;FS=0.000;HaplotypeScore=5.5344;MLEAC=0;MLEAF=0.00;MQ=38.06;MQ0=0  GT:AD:DP:GQ:PL   0/0:12,0:12:18:0,18,172
17      7578403 .       C       A       0       LowQual AC=0;AF=0.00;AN=2;DP=12;Dels=0.00;FS=0.000;HaplotypeScore=4.5411;MLEAC=0;MLEAF=0.00;MQ=38.06;MQ0=0  GT:AD:DP:GQ:PL   0/0:12,0:12:30:0,30,360
17      7578404 .       A       C       0       LowQual AC=0;AF=0.00;AN=2;DP=12;Dels=0.00;FS=0.000;HaplotypeScore=3.5437;MLEAC=0;MLEAF=0.00;MQ=38.06;MQ0=0  GT:AD:DP:GQ:PL   0/0:12,0:12:30:0,30,360
17      7578405 .       G       A       0       LowQual AC=0;AF=0.00;AN=2;DP=12;Dels=0.00;FS=0.000;HaplotypeScore=4.5436;MLEAC=0;MLEAF=0.00;MQ=38.06;MQ0=0  GT:AD:DP:GQ:PL   0/0:12,0:12:30:0,30,364
17      7578407 rs138729528     G       A       0       LowQual AC=0;AF=0.00;AN=2;DB;DP=12;Dels=0.00;FS=0.000;HaplotypeScore=3.7883;MLEAC=0;MLEAF=0.00;MQ=38.06;MQ0=0        GT:AD:DP:GQ:PL  0/0:12,0:12:30:0,30,367
`

Position 7578406 shows a C>T SNP which is Okay. However at neighboring sites, the positions 7578401 to 7578403, 7578405, and 7578407 all show an n>A SNP; and position 7578404 shows an A>C SNP.
If we look at the IGV screenshot(attached) for position 7578407, it show no read with base A (the pointed position in attached file). Then why is this position being called a A ALT variant.
So I am wondering that since the stand_emit_conf is 0 at the above sites and I am using EMIT_ALL_SITES, and if a site is not callable, does the vcf output show a randomly selected variant?

Thanks