Long REF but not an INDEL?

rfuentes (Philippines, Member)

I'm getting the following output from GATK:
ChrSy 198904 . C . 44.99 LowQual AN=2;DP=8;MQ=39.35;MQ0=0 GT:DP 0/0:8
ChrSy 198904 . CGTCCGATATTTGCGAAATATCG . Infinity . DP=8;MQ=39.35;MQ0=0 GT ./.

ChrSy 464065 . A . 113.99 . AN=2;DP=37;MQ=25.38;MQ0=0 GT:DP 0/0:37
ChrSy 464066 . G . 113.99 . AN=2;DP=37;MQ=25.38;MQ0=0 GT:DP 0/0:37
ChrSy 464066 . GTT . 60.03 . AN=2;DP=37;MQ=25.38;MQ0=0 GT:DP 0/0:28
ChrSy 464069 . T . 113.99 . AN=2;DP=37;MQ=25.38;MQ0=0 GT:DP 0/0:37

Why do positions 198904 and 464066 have a multiple-base REF but ALT="."? Why did GATK merge these bases into
a single line, unlike the other positions (this is an emit-all-sites VCF), and still emit a separate record for the same position just before it?
I thought only indels have an anchor base (the position before the event)?
Does ./. mean "same as the reference"?

Thank you!
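
The pattern described above (a REF longer than one base with ALT=".") can be picked out of a file programmatically. The following is a minimal sketch, not a GATK feature: the helper names `parse_record` and `is_long_ref_without_alt` are hypothetical, the records are copied from the output above, and the lines are assumed to split cleanly on whitespace.

```python
def parse_record(line):
    """Split one VCF data line on whitespace and keep the fields used here."""
    fields = line.split()
    chrom, pos, _id, ref, alt = fields[:5]
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}


def is_long_ref_without_alt(rec):
    """True when REF spans more than one base and no ALT allele is given."""
    return len(rec["ref"]) > 1 and rec["alt"] == "."


# Records copied from the output in the question.
lines = [
    "ChrSy 198904 . C . 44.99 LowQual AN=2;DP=8;MQ=39.35;MQ0=0 GT:DP 0/0:8",
    "ChrSy 198904 . CGTCCGATATTTGCGAAATATCG . Infinity . DP=8;MQ=39.35;MQ0=0 GT ./.",
    "ChrSy 464066 . G . 113.99 . AN=2;DP=37;MQ=25.38;MQ0=0 GT:DP 0/0:37",
    "ChrSy 464066 . GTT . 60.03 . AN=2;DP=37;MQ=25.38;MQ0=0 GT:DP 0/0:28",
]

# Collect (chromosome, position, REF length) for every flagged record.
flagged = [
    (rec["chrom"], rec["pos"], len(rec["ref"]))
    for rec in map(parse_record, lines)
    if is_long_ref_without_alt(rec)
]

for chrom, pos, ref_len in flagged:
    print(chrom, pos, ref_len)
```

This only flags the records; it does not say how they should be interpreted.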

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @rfuentes
    Hi,

    Can you please confirm that you are using the latest version of GATK? Also, please post the exact command you used to get this result.

    Thanks,
    Sheila

  • rfuentes (Philippines, Member)

    Hi Sheila,

    java -Xmx8g -jar /software/GenomeAnalysisTK-3.2-2/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /reference/japonica/reference.fa -I /output/filename.merged.bam -o filename.merged.vcf -glm BOTH -mbq 20 --genotyping_mode DISCOVERY -out_mode EMIT_ALL_SITES

    Also, please explain what QUAL=Infinity with GT="./." means. How should I interpret it?
    Thank you!
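
For anyone parsing such records, here is a minimal sketch of reading the two fields in question. It relies on two things outside this thread: Python's `float()` accepts the string "Infinity", and the VCF specification uses "." in GT for a missing allele, so "./." is a no-call and is not the same as "0/0". The helper names `parse_qual` and `parse_gt` are hypothetical, and the sketch makes no claim about why GATK wrote Infinity here.

```python
import math


def parse_qual(text):
    """Return QUAL as a float, or None when the field is the missing value '.'."""
    if text == ".":
        return None
    return float(text)  # float() also accepts "Infinity"


def parse_gt(text):
    """Return allele indices from a GT string; None marks a missing allele."""
    alleles = text.replace("|", "/").split("/")
    return [None if allele == "." else int(allele) for allele in alleles]


qual = parse_qual("Infinity")
gt_missing = parse_gt("./.")
gt_hom_ref = parse_gt("0/0")

print(math.isinf(qual), gt_missing, gt_hom_ref)
```

Keeping missing alleles as `None` lets downstream code tell a no-call apart from a homozygous-reference call.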

  • tommycarstensen (United Kingdom, Member ✭✭✭)

    @rfuentes I'm a fellow user. What is the reason you use EMIT_ALL_SITES? The use of it explains what you are seeing. Have you tried without it?

  • rfuentes (Philippines, Member)
    edited January 2015

    Hi Tommy,

    We included it to generate the consensus sequence of the whole genome. This project involves 3k rice genomes.
    I expected EMIT_ALL_SITES to report each position on a separate line, not concatenated into a single REF.
    In my VCF, I found these concatenated strings, but there are also single positions given one per line. When does GATK
    combine or separate positions, aside from indels?

    Thanks!

    roven

  • rfuentes (Philippines, Member)

    @tommycarstensen & @Sheila I still don't understand why some non-variant positions are concatenated in a single line while others are not. This is important for us because we are parsing multiple (3k) VCFs to generate a universe of variants. Thank you!
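
When merging many VCFs, one way to spot these cases is to map every record onto the reference positions its REF spans (in VCF, REF starts at POS) and look for positions reported by more than one record. The following is a minimal sketch under that assumption: `covered_positions` is a hypothetical helper, and the tuples are the 464065 to 464069 records from the original post reduced to (CHROM, POS, REF).

```python
from collections import defaultdict


def covered_positions(records):
    """Map each (chrom, position) to the REF strings of the records spanning it."""
    coverage = defaultdict(list)
    for chrom, pos, ref in records:
        # A REF of length n starting at POS spans POS .. POS + n - 1.
        for offset in range(len(ref)):
            coverage[(chrom, pos + offset)].append(ref)
    return coverage


# (CHROM, POS, REF) taken from the records in the original post.
records = [
    ("ChrSy", 464065, "A"),
    ("ChrSy", 464066, "G"),
    ("ChrSy", 464066, "GTT"),
    ("ChrSy", 464069, "T"),
]

coverage = covered_positions(records)

# Positions that more than one record reports on.
multi = sorted(key for key, refs in coverage.items() if len(refs) > 1)
print(multi)
```

How to resolve a position covered by several records is a separate decision that this sketch leaves open.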

  • rfuentes (Philippines, Member)

    Hi @Geraldine_VdAuwera,
    I saw that you answered a forum question that is almost the same as the one I posted above, but I still don't
    fully understand the event indicated in each line. Why are the bases concatenated and not
    presented as one line per position if they are not indels? If pos 464066 is an indel, does it mean
    that all characters in REF are deleted? And for pos 198904 with QUAL=Infinity, what is its
    genotype?
    Thank you!
