It looks like you're new here. If you want to get involved, click one of these buttons!
Hello, I have an alignment with 140 reference reads (Ref Base = C) and 10 variant Reads (Var Base = T) at locus: Chr17:7578406. When I use the "EMIT_ALL_SITES" mode, the UnifiedGenotyper generates the following output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT z
17 7578406 rs28934578 C A 0 LowQual AC=0;AF=0.00;AN=2;DB;DP=150;Dels=0.00;FS=0.000;HaplotypeScore=11.1188;MLEAC=0;MLEAF=0.00;MQ=38.26;MQ0=0 GT:AD:DP:GQ:PL 0/0:140,0:150:99:0,361,4503
My questions are:
1) In ALT column, why is the base "A" being shown? Is it a randomly selected base when no SNP is identified at that position?
2) The ID column shows dbSNP record rs28934578 which is a C>T mutation (which is what my data has). Why is the dbSNP records for C>T mutation in the output when no variant is identified at this position (or C>A variant is shown?). Does this imply that ID column shows ALL dbSNP records at that position rather than a dbSNP record of the identified variant?
3) Is there a document that details the VCF output when EMIT_ALL_VARIANTS is used so I could understand the output vcf?
Comments
There is in fact a SNP identified at that position; but it is low quality, presumably under the emission quality threshold, which is why it doesn't get emitted when the tool is run in normal mode. The output vcf is exactly the same format as usual.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi,
Maybe I did'nt ask the question correctly. Since the data contains absolutely no reads with "A" base for positions where it is identifying an "A" ALT variant, I don't see how a SNP (e.g. G>A at position 7578402) is being emitted even if it is at a low confidence. Below is the output from an example. For the following command:
java -Xmx4g -jar /usr/local/lib/GenomeAnalysisTK-2.2-15-g9214b2f/GenomeAnalysisTK.jar -R human_g1k_v37.fasta -T UnifiedGenotyper -I NG.CL316.ordered.sorted.bam -o temp.vcf --dbsnp dbsnp_135.b37.vcf -stand_call_conf 30 -stand_emit_conf 0.0 -dcov 5000 -L "17:7,578,400-7,578,410" -out_mode EMIT_ALL_SITES -contamination 0.0
below is a subsection of the output:
Position 7578406 shows a C>T SNP which is Okay. However at neighboring sites, the positions 7578401 to 7578403, 7578405, and 7578407 all show an n>A SNP; and position 7578404 shows an A>C SNP. If we look at the IGV screenshot(attached) for position 7578407, it show no read with base A (the pointed position in attached file). Then why is this position being called a A ALT variant. So I am wondering that since the stand_emit_conf is 0 at the above sites and I am using EMIT_ALL_SITES, and if a site is not callable, does the vcf output show a randomly selected variant?
Thanks
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Yes, that's right: the genotyper is assigning a "random" alternate allele just so that it can create genotype likelihoods for the site.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Thanks. Is this new change in Version 2.2? Since in version 2.0, with emit_all_sites, I would have seen a "." in the ALT column with a Qual score. Could you explain why this change was made?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •