SNP genotyping problem, along with AD, DP

Hi,
I used an older version of GATK to call SNPs with HaplotypeCaller (HaplotypeCaller, Version=2.7-2-g6bda569). I have now gone back to the VCF file and noticed a genotype that I don't understand: 0/1:0,13:13:PASS:99:457,0,543. Where does the "0/1" come from?

Answers

  • tommycarstensen (United Kingdom, Member)
    edited September 2015

    @yg1 0/1 is your GT and means the sample is heterozygous at that record. Check out the VCF file format specifications.

    Oh, I just realised your question might be poorly phrased and you might be asking, how can I have a GT of 0/1, when my AD is 0,13? This post by @Geraldine_VdAuwera might explain things:
    http://gatkforums.broadinstitute.org/discussion/6005/allele-depth-ad-is-lower-than-expected

    I hope that answers your question.

    P.S. GATK 2.7 is approximately the same age as Stonehenge at this point in time...
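
To illustrate where the "0/1" comes from: the GT is the genotype with the best (lowest) PL. The sketch below assumes a biallelic diploid site, where the three PL values are ordered 0/0, 0/1, 1/1, and assumes GQ is the gap between the two lowest PLs, capped at 99. The function name `call_from_pl` is hypothetical and is not part of GATK.

```python
# Sketch: how GT and GQ follow from PL at a biallelic diploid site.
# Assumption: PL values are ordered 0/0, 0/1, 1/1 (one ALT allele).
# call_from_pl is a hypothetical helper, not a GATK function.

GENOTYPES = ("0/0", "0/1", "1/1")


def call_from_pl(pl_field, gq_cap=99):
    """Return (most likely genotype, genotype quality) from a PL string."""
    pls = [int(x) for x in pl_field.split(",")]
    if len(pls) != len(GENOTYPES):
        raise ValueError("expected three PL values for a biallelic site")
    # The called genotype is the one with the lowest PL (normalized to 0).
    best = min(range(len(pls)), key=pls.__getitem__)
    # GQ is the gap between the best and second-best PL, capped at gq_cap.
    second_best = sorted(pls)[1]
    gq = min(second_best - pls[best], gq_cap)
    return GENOTYPES[best], gq


gt, gq = call_from_pl("457,0,543")
print(gt, gq)  # 0/1 99
```

With PL = 457,0,543 the heterozygous genotype has PL 0, so the likelihood model prefers 0/1 regardless of what AD reports.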

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @yg1
    Hi,

    Please do upgrade to the latest version for the best results. https://www.broadinstitute.org/gatk/download/

    -Sheila

  • yg1 (Member)
    edited September 2015

    Pardon me if I didn't explain it clearly in my last post, but I was really frustrated. The reality is, I have read the "uninformative reads" post at that same link (http://gatkforums.broadinstitute.org/discussion/6005/allele-depth-ad-is-lower-than-expected), but my situation seems different. Below I got a 0/1 from an AD of 0,13, but the AD total and the DP are both 13, so there were NO uninformative reads: uninformative reads are counted towards DP, not AD, and here DP equals the sum of AD (i.e. 13 = 13). Is this clear enough?
    GT:AD:DP:GQ:PL: 0/1:0,13:13:PASS:99:457,0,543

    As I said, this was a GATK call set I made a LONG time ago using the latest version available at that time. I can't redo the analysis every time GATK is upgraded.
    In my opinion, the older and newer versions should be compatible, not mutually exclusive.
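
The DP versus AD argument in the post above can be checked mechanically. This is a minimal sketch, not a GATK utility. The quoted record has six values but only five FORMAT keys, so the code assumes the fourth value (PASS) is a per-sample filter field, written here as FT; that key is an assumption. The function names are hypothetical.

```python
# Sketch: check whether DP and AD leave room for uninformative reads,
# and whether every allele in GT has read support in AD.
# Assumption: the FORMAT string is GT:AD:DP:FT:GQ:PL (FT is guessed
# from the "PASS" value; the quoted header omits it).


def parse_sample(format_keys, sample_field):
    """Pair FORMAT keys with the sample's colon-separated values."""
    keys = format_keys.split(":")
    values = sample_field.split(":")
    if len(keys) != len(values):
        raise ValueError("FORMAT keys and sample values differ in length")
    return dict(zip(keys, values))


def uninformative_reads(record):
    """Reads counted in DP but not assigned to any allele in AD."""
    ad = [int(x) for x in record["AD"].split(",")]
    return int(record["DP"]) - sum(ad)


def gt_has_read_support(record):
    """True if every allele named in GT has at least one read in AD."""
    ad = [int(x) for x in record["AD"].split(",")]
    alleles = record["GT"].replace("|", "/").split("/")
    return all(ad[int(a)] > 0 for a in alleles if a != ".")


rec = parse_sample("GT:AD:DP:FT:GQ:PL", "0/1:0,13:13:PASS:99:457,0,543")
print(uninformative_reads(rec))   # 0
print(gt_has_read_support(rec))   # False
```

For this record the check reports zero uninformative reads and no read support for the reference allele, which is exactly the inconsistency being asked about.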

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @yg1
    Hi,

    I can understand your frustration. However, each release brings new features and improved tools. The developers work really hard to make the most accurate tools for variant calling. Since version 2.7, I am sure there have been many upgrades to HaplotypeCaller, perhaps even a few bug fixes or new features that would make your call more understandable.

    Can you try using the latest version to call that region again? If the call is still confusing, we can go from there.

    -Sheila

  • yg1 (Member)
    edited September 2015

    Could you address the question of why I got this record: GT:AD:DP:GQ:PL: 0/1:0,13:13:PASS:99:457,0,543?
    1) Is this really caused by uninformative reads?
    2) Is there any possible reason this could be associated with the older version?
    3) Have there been any similar reports? Have you noticed similar problems reported by other users before?

    If this problem was never identified in the older version, I highly doubt it was fixed in the newer version.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    @yg1

    This is not a compatibility issue. The point here is that we cannot possibly go back and justify every inconsistency in callsets made with much earlier versions. The reason we have new versions at all is that we have since improved the program and fixed bugs. It is very possible that the inconsistency you found in your old callset was due to a bug. If you run the latest version on this region, you can find out whether the inconsistency persists. Or, if you cannot rerun the analysis, you can write this off as a site where the data is inconclusive. This will happen with every program that performs technically complex analysis like this.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    As for previous reports of this problem, feel free to search the forum; that's what it's for.

  • yg1 (Member)
    edited September 2015

    I had of course spent a long time searching the forum for similar issues before I posted the question. I didn't find any similar reports, which is why I asked whether YOU, as administrator, have noticed similar problems. So I got upset with your answer: "As for previous reports of this problem, feel free to search the forum; that's what it's for."

    From the sound of it, this is most likely a bug of unknown cause dating back to the Stone Age (2014), and it is not clear whether it is associated with the older version. The only comment I want to leave is this: re-running the analysis with the updated version, especially a large-scale analysis, is not always the golden answer, especially with GATK, which updates its version so frequently. I really can't re-run my analysis EVERY TIME a newer version comes out as a way to fix unknown problems.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    Alright, I'm sorry I snapped at you. We just get a lot of people who don't bother searching first, which takes time away from writing documentation that helps everyone. I'm glad to hear you made the effort of looking first.

    When we see recurring problems we try to document them but sometimes the same symptoms have different causes depending on the dataset, and we can't always nail down why.

    For a case like this you could just run on the individual site as a smell test, that's what we mean -- not redo the entire analysis until there's reason to believe it's worth it.
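
As a rough illustration of the single-site smell test suggested above, the sketch below builds a HaplotypeCaller command restricted to a small window around one position. It assumes GATK 3.x-style arguments (`-T`, `-R`, `-I`, `-L`, `-o`); other versions use different syntax, so check the documentation for the version you install. The file names, coordinates, padding, and the function name `single_site_command` are all hypothetical.

```python
# Sketch: build a GATK 3.x-style HaplotypeCaller command for one site.
# Assumptions: GenomeAnalysisTK.jar is in the working directory, and
# all file names and coordinates below are placeholders.


def single_site_command(reference, bam, chrom, pos, out_vcf,
                        pad=100, jar="GenomeAnalysisTK.jar"):
    """Return the argument list for re-calling a window around pos."""
    # HaplotypeCaller reassembles reads locally, so give it some
    # context around the site instead of a single base.
    start = max(1, pos - pad)
    interval = f"{chrom}:{start}-{pos + pad}"
    return [
        "java", "-jar", jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-L", interval,
        "-o", out_vcf,
    ]


cmd = single_site_command("ref.fasta", "sample.bam", "chr1", 12345,
                          "site_recall.vcf")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Comparing the new record at that position with the old one shows whether the 0/1 call with AD 0,13 persists.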
