VariantRecalibrator across multiple VCFs with identical positions but different annotations

tommycarstensen (United Kingdom), Member
edited April 2013 in Ask the GATK team

I have a set of VCFs with identical positions in them:

VCF1:
1 10097 . T . 26 . AN=196;DP=1622;MQ=20.06;MQ0=456 GT:DP

VCF2:
1 10097 . T . 21.34 . AN=198;DP=2338;MQ=19.53;MQ0=633 GT:DP

VCF3:
1 10097 . T . 11.70 . AN=240;DP=3957;MQ=19.74;MQ0=1085 GT:DP

VCF4:
1 10097 . T . 15.56 . AN=134;DP=1348;MQ=18.22;MQ0=442 GT:DP

If I use all of them as input for VariantRecalibrator, which annotations will VariantRecalibrator use? Should I instead merge the VCFs with CombineVariants and run VariantAnnotator, before I run VariantRecalibrator?
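To make the question concrete: the four records above describe the same site (chromosome 1, position 10097, REF T), but every site-level value differs between the files. The short Python sketch below parses the records exactly as quoted, with whitespace standing in for the VCF tab separators and the sample columns omitted, and lists which values disagree at each site. The helper names are hypothetical and are not part of GATK.

```python
from collections import defaultdict

# The four records quoted in the question (CHROM POS ID REF ALT QUAL FILTER INFO FORMAT);
# whitespace stands in for tabs and the per-sample columns are omitted.
RECORDS = {
    "VCF1": "1 10097 . T . 26 . AN=196;DP=1622;MQ=20.06;MQ0=456 GT:DP",
    "VCF2": "1 10097 . T . 21.34 . AN=198;DP=2338;MQ=19.53;MQ0=633 GT:DP",
    "VCF3": "1 10097 . T . 11.70 . AN=240;DP=3957;MQ=19.74;MQ0=1085 GT:DP",
    "VCF4": "1 10097 . T . 15.56 . AN=134;DP=1348;MQ=18.22;MQ0=442 GT:DP",
}


def parse_record(line):
    """Split a VCF data line into a site key, the QUAL value and an INFO dict."""
    chrom, pos, _id, ref, _alt, qual, _filter, info = line.split()[:8]
    annotations = {}
    for field in info.split(";"):
        key, _, value = field.partition("=")
        annotations[key] = value
    return (chrom, int(pos), ref), float(qual), annotations


def conflicting_annotations(records):
    """For each site, return the sorted names of values that differ between files."""
    by_site = defaultdict(dict)  # site key -> {file name: {annotation: value}}
    for name, line in records.items():
        site, qual, info = parse_record(line)
        by_site[site][name] = dict(info, QUAL=str(qual))
    conflicts = {}
    for site, per_file in by_site.items():
        all_keys = set().union(*per_file.values())
        differing = sorted(
            key for key in all_keys
            if len({ann.get(key) for ann in per_file.values()}) > 1
        )
        if differing:
            conflicts[site] = differing
    return conflicts


print(conflicting_annotations(RECORDS))
```

For the records in the question this reports AN, DP, MQ, MQ0 and QUAL as conflicting at 1:10097, i.e. each value describes only one population's samples.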

I'm not sure whether this forum is for technical questions only or whether questions about best practices are also welcome. Feel free to delete my question if it doesn't belong here. Thank you.

Answers

  • Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie

    No worries, your question is fine. We'll take pretty much anything that is related to GATK, and we're more than happy to clarify the Best Practices if it can help people use the tools correctly.

    To actually answer your question: can you first tell me whether those variants derive from the same data (the same sample) or from different samples?

  • tommycarstensen (United Kingdom), Member
    edited April 2013

    I should have clarified. The samples in each of the 4 VCFs are unrelated; i.e. they are derived from different BAMs originating from different populations.

    All 4 VCFs contain calls at the same positions because I called them with UnifiedGenotyper using an interval list and the EMIT_ALL_SITES output mode. I called the 4 populations separately, thinking that would be the best approach.
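Since all 4 VCFs were produced from the same interval list with EMIT_ALL_SITES, their site lists should match exactly. One quick way to confirm this before merging is to compare the sets of (CHROM, POS) pairs. The Python sketch below assumes plain-text (uncompressed) VCFs; the function names, and the file names in the usage comment, are hypothetical.

```python
import io


def site_keys(lines):
    """Collect (CHROM, POS) for every data line of a VCF, skipping header lines."""
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t", 2)[:2]
        keys.add((chrom, int(pos)))
    return keys


def sites_identical(*vcf_handles):
    """Return True if every VCF contains exactly the same set of positions."""
    site_sets = [site_keys(handle) for handle in vcf_handles]
    return all(sites == site_sets[0] for sites in site_sets[1:])


# Usage with real files (hypothetical names):
#   with open("pop1.vcf") as a, open("pop2.vcf") as b:
#       print(sites_identical(a, b))
pop1 = io.StringIO("##fileformat=VCFv4.1\n1\t10097\t.\tT\t.\n1\t10098\t.\tA\t.\n")
pop2 = io.StringIO("##fileformat=VCFv4.1\n1\t10098\t.\tA\t.\n1\t10097\t.\tT\t.\n")
print(sites_identical(pop1, pop2))
```

Comparing sets ignores record order, which is convenient here, but it also collapses repeated positions; counting records per file would be needed to detect duplicates.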

    I also checked the VariantRecalibrator.java source code briefly, but I couldn't quite find the answer to my question.

  • Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie
    Accepted Answer

    I see. Then it depends on how you want to proceed with your analysis. If you want the various sample calls for the same sites to be treated together, with the results output in a single VCF, then you have to use CombineVariants to merge them first. However, if you're happy to have them processed as separate variants, with the outputs in separate VCFs, then you can pass in the separate files.

  • tommycarstensen (United Kingdom), Member

    Thank you, Geraldine. I don't want the identical positions processed as separate sites, so I am taking the route of using CombineVariants followed by VariantAnnotator. Thank you for confirming that this is the right approach in this case.
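To illustrate why re-annotation after merging is useful, the Python sketch below combines the site-level INFO values for 1:10097 from the four records in the question. It is a hypothetical illustration, not how CombineVariants or VariantAnnotator are implemented. Assuming the four sample sets (and BAMs) are disjoint, as described in this thread, the counts AN, DP and MQ0 add up across files. MQ is a root-mean-square of read mapping qualities, so it is not a sum of the per-file values; the sketch leaves it unset, on the assumption that it is recomputed afterwards from the reads.

```python
# Hypothetical illustration of merging one site's INFO values from
# per-population VCFs whose sample sets are disjoint.
ADDITIVE = ("AN", "DP", "MQ0")  # counts that add up across disjoint samples (DP approximately)
NOT_ADDITIVE = ("MQ",)          # RMS over reads: not a sum of per-file values


def merge_site(per_file_info):
    """Combine per-file INFO dicts (values as strings) for a single site."""
    merged = {}
    for key in ADDITIVE:
        merged[key] = sum(int(info[key]) for info in per_file_info)
    for key in NOT_ADDITIVE:
        merged[key] = None  # left for re-annotation from the reads
    return merged


site_10097 = [
    {"AN": "196", "DP": "1622", "MQ": "20.06", "MQ0": "456"},
    {"AN": "198", "DP": "2338", "MQ": "19.53", "MQ0": "633"},
    {"AN": "240", "DP": "3957", "MQ": "19.74", "MQ0": "1085"},
    {"AN": "134", "DP": "1348", "MQ": "18.22", "MQ0": "442"},
]
print(merge_site(site_10097))
```

Here the merged counts are AN=768, DP=9265 and MQ0=2616, while MQ has to come from the underlying reads rather than from the four per-file values.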