Long REF but not an INDEL?

rfuentes, Philippines (Member)

I'm getting the following output from GATK:
ChrSy 198904 . C . 44.99 LowQual AN=2;DP=8;MQ=39.35;MQ0=0 GT:DP 0/0:8
ChrSy 198904 . CGTCCGATATTTGCGAAATATCG . Infinity . DP=8;MQ=39.35;MQ0=0 GT ./.

ChrSy 464065 . A . 113.99 . AN=2;DP=37;MQ=25.38;MQ0=0 GT:DP 0/0:37
ChrSy 464066 . G . 113.99 . AN=2;DP=37;MQ=25.38;MQ0=0 GT:DP 0/0:37
ChrSy 464066 . GTT . 60.03 . AN=2;DP=37;MQ=25.38;MQ0=0 GT:DP 0/0:28
ChrSy 464069 . T . 113.99 . AN=2;DP=37;MQ=25.38;MQ0=0 GT:DP 0/0:37

Why do positions 198904 and 464066 have a multiple-base REF but ALT="."? Why did GATK merge them into
a single line, unlike the other positions (this is an emit-all-sites VCF), and still emit a separate record for the same position just before it?
I thought only indels have an anchor base (the position before the event)?
Does ./. mean "same as the reference"?

Thank you!

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @rfuentes
    Hi,

    Can you please confirm that you are using the latest version of GATK? Also, please post the exact command you used to get this result.

    Thanks,
    Sheila

  • rfuentes, Philippines (Member)

    Hi Sheila,

    java -Xmx8g -jar /software/GenomeAnalysisTK-3.2-2/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /reference/japonica/reference.fa -I /output/filename.merged.bam -o filename.merged.vcf -glm BOTH -mbq 20 --genotyping_mode DISCOVERY -out_mode EMIT_ALL_SITES

    Also, please explain what QUAL=Infinity with GT="./." means. How should I interpret it?
    Thank you!

  • tommycarstensen, United Kingdom (Member ✭✭✭)

    @rfuentes I'm a fellow user. Why are you using EMIT_ALL_SITES? Using it explains what you are seeing. Have you tried running without it?

  • rfuentes, Philippines (Member)
    edited January 2015

    Hi Tommy,

    We included it to generate the consensus sequence of the whole genome. This project involves 3k rice genomes.
    I expected EMIT_ALL_SITES to give each position as a separate line, not concatenated into a single REF.
    In my VCF, I found these concatenated strings, but there are also lines with a single position each. When does GATK
    combine or separate positions, aside from indels?

    Thanks!

    roven

  • rfuentes, Philippines (Member)

    @tommycarstensen & @Sheila I still don't understand why some non-variant positions are concatenated into a single line while others are not. This is important for us because we are parsing multiple (3k) VCFs to generate a universe of variants. Thank you!

  • rfuentes, Philippines (Member)

    Hi @Geraldine_VdAuwera
    I saw you answered a forum question that is almost the same as the one I posted above, but I still don't
    fully understand the event indicated in each line. Why are they concatenated and not
    presented as one line per position if they are not indels? If pos 464066 is an indel, does it mean
    that all characters in REF are deleted? And for pos 198904 with QUAL=Infinity, what is its
    genotype?
    Thank you!
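
For anyone parsing many emit-all-sites VCFs, as described in the thread, a small script can at least locate the records in question before deciding how to treat them. The following is only a sketch: the names `Record`, `parse_line`, `multibase_ref_nonvariant`, and `duplicated_positions` are hypothetical helpers, not part of GATK or any VCF library. It assumes the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, ...) and uses the lines posted above as test input. It flags records with a multi-base REF and ALT=".", reports the last reference position each REF string covers (POS + len(REF) - 1), and lists positions that occur on more than one line. It does not explain why GATK emitted them.

```python
from collections import Counter, namedtuple

# Hypothetical record type holding only the columns needed here.
Record = namedtuple("Record", "chrom pos ref alt qual")


def parse_line(line):
    """Parse one VCF data line. Real VCFs are tab-delimited; split() on any
    whitespace also handles the space-separated lines pasted in the thread."""
    fields = line.split()
    return Record(fields[0], int(fields[1]), fields[3], fields[4], fields[5])


def parse_records(lines):
    """Skip blank lines and header lines (starting with '#')."""
    return [parse_line(l) for l in lines if l.strip() and not l.startswith("#")]


def multibase_ref_nonvariant(records):
    """Return (record, end_pos) for records with ALT='.' and len(REF) > 1.
    end_pos is the last reference position covered by the REF string."""
    return [(r, r.pos + len(r.ref) - 1)
            for r in records if r.alt == "." and len(r.ref) > 1]


def duplicated_positions(records):
    """Return (chrom, pos) keys that appear on more than one line."""
    counts = Counter((r.chrom, r.pos) for r in records)
    return sorted(key for key, n in counts.items() if n > 1)


# The records posted in the question above.
example = [
    "ChrSy 198904 . C . 44.99 LowQual AN=2;DP=8;MQ=39.35;MQ0=0 GT:DP 0/0:8",
    "ChrSy 198904 . CGTCCGATATTTGCGAAATATCG . Infinity . DP=8;MQ=39.35;MQ0=0 GT ./.",
    "ChrSy 464065 . A . 113.99 . AN=2;DP=37;MQ=25.38;MQ0=0 GT:DP 0/0:37",
    "ChrSy 464066 . G . 113.99 . AN=2;DP=37;MQ=25.38;MQ0=0 GT:DP 0/0:37",
    "ChrSy 464066 . GTT . 60.03 . AN=2;DP=37;MQ=25.38;MQ0=0 GT:DP 0/0:28",
    "ChrSy 464069 . T . 113.99 . AN=2;DP=37;MQ=25.38;MQ0=0 GT:DP 0/0:37",
]

records = parse_records(example)
flagged = multibase_ref_nonvariant(records)
dupes = duplicated_positions(records)

for rec, end in flagged:
    print(rec.chrom, rec.pos, end, rec.qual)
print(dupes)
```

Whether the flagged records should be dropped, kept, or expanded per position depends on how the non-variant multi-base records are meant to be interpreted, which is the open question in this thread.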
