
VariantRecalibrator across multiple VCFs with identical positions but different annotations

tommycarstensen Posts: 61 Member
edited April 2013 in Ask the GATK team

I have a set of VCFs with identical positions in them:

VCF1: 1 10097 . T . 26 . AN=196;DP=1622;MQ=20.06;MQ0=456 GT:DP

VCF2: 1 10097 . T . 21.34 . AN=198;DP=2338;MQ=19.53;MQ0=633 GT:DP

VCF3: 1 10097 . T . 11.70 . AN=240;DP=3957;MQ=19.74;MQ0=1085 GT:DP

VCF4: 1 10097 . T . 15.56 . AN=134;DP=1348;MQ=18.22;MQ0=442 GT:DP

If I use all of them as input for VariantRecalibrator, which annotations will VariantRecalibrator use? Should I instead merge the VCFs with CombineVariants and run VariantAnnotator before running VariantRecalibrator?
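
To make the question concrete, here is a minimal Python sketch that parses the INFO column of the four records above and prints how each annotation differs between files at this one site. The names `records`, `parse_info` and `by_key` are illustrative only and are not part of GATK; the sketch says nothing about which values VariantRecalibrator itself would pick.

```python
# INFO strings copied from the four records at position 1:10097 shown above.
records = {
    "VCF1": "AN=196;DP=1622;MQ=20.06;MQ0=456",
    "VCF2": "AN=198;DP=2338;MQ=19.53;MQ0=633",
    "VCF3": "AN=240;DP=3957;MQ=19.74;MQ0=1085",
    "VCF4": "AN=134;DP=1348;MQ=18.22;MQ0=442",
}


def parse_info(info):
    """Split a VCF INFO string of key=value pairs into a dict of floats."""
    return {k: float(v) for k, v in (field.split("=") for field in info.split(";"))}


parsed = {name: parse_info(info) for name, info in records.items()}

# Regroup the values by annotation key so the per-file differences are visible.
by_key = {}
for name, info in parsed.items():
    for key, value in info.items():
        by_key.setdefault(key, {})[name] = value

for key, values in by_key.items():
    print(key, values, "min:", min(values.values()), "max:", max(values.values()))

# Plain arithmetic over the four files; not necessarily what a GATK tool reports.
total_dp = sum(info["DP"] for info in parsed.values())
print("sum of DP across files:", total_dp)
```

Every annotation takes a different value in each file, which is why the choice between separate inputs and a merged, re-annotated VCF matters.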

I'm not sure whether the forum is only for technical questions or whether asking about best practices is allowed as well. Feel free to delete my question if it doesn't belong here. Thank you.

Best Answer

  • Geraldine_VdAuwera Posts: 5,877 admin
    Answer ✓

    I see. Then it depends on how you want to proceed with your analysis. If you want the various sample calls for the same sites to be treated together, with the results output in a single VCF, then you have to use CombineVariants to merge them first. However, if you're happy having them processed as separate variants, with the outputs in separate VCFs, then you can pass in separate files.

    Geraldine Van der Auwera, PhD

Answers

  • Geraldine_VdAuwera Posts: 5,877 Administrator, GATK Developer

    No worries, your question is fine. We'll take pretty much anything that is related to GATK, and we're more than happy to clarify the Best Practices if it can help people use the tools correctly.

    To actually answer your question -- can you first tell me whether those variants derive from the same data (same sample) or from different ones?

    Geraldine Van der Auwera, PhD

  • tommycarstensen Posts: 61 Member
    edited April 2013

    I should have clarified. The samples in each of the 4 VCFs are unrelated; i.e. they are derived from different BAMs originating from different populations.

    All 4 VCFs contain calls at the same positions because I specified an interval list and used EMIT_ALL_SITES when calling with UnifiedGenotyper. I called the 4 populations separately, thinking that would be the best approach.

    I also checked the VariantRecalibrator.java source code briefly, but I couldn't quite find the answer to my question.

  • tommycarstensen Posts: 61 Member

    Thank you, Geraldine. I don't want the identical positions processed as separate sites. Hence, I am taking the route of using CombineVariants followed by VariantAnnotator. Thank you for confirming that this is the right approach in this case.
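
    The merge-then-annotate route described in this thread could be sketched roughly as follows. This is only an illustration: it builds and prints the two command lines without running them. All file names, the jar path and the helper names `combine_cmd` and `annotate_cmd` are placeholders invented for this example. The flags (`-T`, `-R`, `--variant`, `-I`, `-A`, `-o`) and the annotation names are the ones documented for GATK 2.x/3.x and should be checked against the installed version.

```python
import shlex

# Placeholder paths, assumed for illustration only.
GATK_JAR = "GenomeAnalysisTK.jar"
REFERENCE = "reference.fasta"


def combine_cmd(vcfs, out_vcf):
    """Build a CombineVariants command that merges several VCFs into one."""
    cmd = ["java", "-jar", GATK_JAR, "-T", "CombineVariants", "-R", REFERENCE]
    for vcf in vcfs:
        cmd += ["--variant", vcf]
    return cmd + ["-o", out_vcf]


def annotate_cmd(merged_vcf, bams, out_vcf, annotations):
    """Build a VariantAnnotator command that recomputes annotations from the BAMs."""
    cmd = ["java", "-jar", GATK_JAR, "-T", "VariantAnnotator", "-R", REFERENCE,
           "--variant", merged_vcf]
    for bam in bams:
        cmd += ["-I", bam]
    for name in annotations:
        cmd += ["-A", name]
    return cmd + ["-o", out_vcf]


# Hypothetical inputs: one VCF and one BAM per population.
vcfs = ["pop1.vcf", "pop2.vcf", "pop3.vcf", "pop4.vcf"]
bams = ["pop1.bam", "pop2.bam", "pop3.bam", "pop4.bam"]

combine = combine_cmd(vcfs, "merged.vcf")
annotate = annotate_cmd("merged.vcf", bams, "merged.annotated.vcf",
                        ["Coverage", "RMSMappingQuality", "MappingQualityZero"])

# Print shell-safe command lines for inspection instead of executing them.
for cmd in (combine, annotate):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

    The annotated output would then be the single input to VariantRecalibrator, whose resource and annotation arguments depend on the data set and are left out here.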
