VariantRecalibrator across multiple VCFs with identical positions but different annotations

tommycarstensen (United Kingdom, Member)
edited April 2013 in Ask the GATK team

I have a set of VCFs with identical positions in them:

VCF1:
1 10097 . T . 26 . AN=196;DP=1622;MQ=20.06;MQ0=456 GT:DP

VCF2:
1 10097 . T . 21.34 . AN=198;DP=2338;MQ=19.53;MQ0=633 GT:DP

VCF3:
1 10097 . T . 11.70 . AN=240;DP=3957;MQ=19.74;MQ0=1085 GT:DP

VCF4:
1 10097 . T . 15.56 . AN=134;DP=1348;MQ=18.22;MQ0=442 GT:DP

If I use all of them as input for VariantRecalibrator, which annotations will VariantRecalibrator use? Should I instead merge the VCFs with CombineVariants and run VariantAnnotator, before I run VariantRecalibrator?

I'm not sure whether the forum is only for technical questions or whether questions about best practices are allowed as well. Feel free to delete my question if it doesn't belong here. Thank you.
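
To make the situation concrete, here is a minimal Python sketch that parses the four records quoted above and groups them by site. It assumes the whitespace-separated columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO and FORMAT; the `parse_record` helper and the `VCF1`..`VCF4` labels are only illustrative and are not part of GATK.

```python
# The four records from the post, keyed by the (illustrative) file label.
RECORDS = {
    "VCF1": "1 10097 . T . 26 . AN=196;DP=1622;MQ=20.06;MQ0=456 GT:DP",
    "VCF2": "1 10097 . T . 21.34 . AN=198;DP=2338;MQ=19.53;MQ0=633 GT:DP",
    "VCF3": "1 10097 . T . 11.70 . AN=240;DP=3957;MQ=19.74;MQ0=1085 GT:DP",
    "VCF4": "1 10097 . T . 15.56 . AN=134;DP=1348;MQ=18.22;MQ0=442 GT:DP",
}


def parse_record(line):
    """Split one record into its site key, QUAL and INFO annotations.

    Assumes exactly nine whitespace-separated columns, as in the post.
    """
    chrom, pos, _id, _ref, _alt, qual, _filter, info, _format = line.split()
    annotations = dict(item.split("=", 1) for item in info.split(";"))
    return (chrom, int(pos)), float(qual), annotations


# Group the per-file values under the shared (CHROM, POS) key.
sites = {}
for name, line in RECORDS.items():
    site, qual, annotations = parse_record(line)
    sites.setdefault(site, {})[name] = {"QUAL": qual, **annotations}

for site, per_file in sites.items():
    print(site)
    for name, values in per_file.items():
        print("  ", name, values)
```

All four records fall under the single key `("1", 10097)`, yet each file carries its own QUAL, AN, DP, MQ and MQ0 values, which is the ambiguity the question is about.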

Answers

  • Geraldine_VdAuwera (Administrator, Dev)

    No worries, your question is fine. We'll take pretty much anything that is related to GATK, and we're more than happy to clarify the Best Practices if it can help people use the tools correctly.

    To actually answer your question -- can you first tell me whether those variants derive from the same data (same sample) or from different ones?

    Geraldine Van der Auwera, PhD

  • tommycarstensen (United Kingdom, Member)
    edited April 2013

    I should have clarified. The samples in each of the 4 VCFs are unrelated; i.e. they are derived from different BAMs originating from different populations.

    All 4 VCFs contain calls at the same positions because I specified an interval list and used EMIT_ALL_SITES when calling with UnifiedGenotyper. I called the 4 populations separately, thinking that would be the best approach.

    I also checked the VariantRecalibrator.java source code briefly, but I couldn't quite find the answer to my question.

  • Geraldine_VdAuwera (Administrator, Dev)
    Accepted Answer

    I see. Then it depends on how you want to proceed with your analysis. If you want the various sample calls for the same sites to be treated together and the results output in a single VCF, then you have to use CombineVariants to merge them first. However, if you're happy having them processed as separate variants with the outputs in separate VCFs, then you can pass in separate files.

    Geraldine Van der Auwera, PhD

  • tommycarstensen (United Kingdom, Member)

    Thank you Geraldine. I don't want the identical positions processed as separate sites. Hence I am taking the route of using CombineVariants followed by VariantAnnotator. Thank you for confirming my choice/approach to be the right one in this case.
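
For readers who want to see the shape of that route, here is a hedged Python sketch that only assembles GATK 3-style command lines for the merge and re-annotation steps; it does not run them. The jar path, reference, VCF and BAM file names are hypothetical placeholders, and the choice of `REQUIRE_UNIQUE` and of the three annotations is an assumption based on the samples being distinct and on the INFO fields shown in the question, not something confirmed in this thread.

```python
import shlex

GATK_JAR = "GenomeAnalysisTK.jar"  # hypothetical path to a GATK 3 jar
REFERENCE = "reference.fasta"      # hypothetical reference file


def combine_variants_cmd(vcfs, out_vcf):
    """Build a CombineVariants command that merges per-population VCFs."""
    cmd = ["java", "-jar", GATK_JAR, "-T", "CombineVariants", "-R", REFERENCE]
    for vcf in vcfs:
        cmd += ["--variant", vcf]
    # Assumption: sample names do not overlap between the input files.
    cmd += ["-genotypeMergeOptions", "REQUIRE_UNIQUE", "-o", out_vcf]
    return cmd


def variant_annotator_cmd(vcf, bams, out_vcf, annotations):
    """Build a VariantAnnotator command that recomputes site annotations."""
    cmd = ["java", "-jar", GATK_JAR, "-T", "VariantAnnotator",
           "-R", REFERENCE, "-V", vcf]
    for bam in bams:
        cmd += ["-I", bam]  # reads are needed to recompute DP, MQ, MQ0
    for name in annotations:
        cmd += ["-A", name]
    cmd += ["-o", out_vcf]
    return cmd


# Hypothetical inputs: one VCF and one BAM per population.
vcfs = ["pop1.vcf", "pop2.vcf", "pop3.vcf", "pop4.vcf"]
bams = ["pop1.bam", "pop2.bam", "pop3.bam", "pop4.bam"]

merge = combine_variants_cmd(vcfs, "merged.vcf")
annotate = variant_annotator_cmd(
    "merged.vcf", bams, "merged.annotated.vcf",
    ["Coverage", "RMSMappingQuality", "MappingQualityZero"],
)

for cmd in (merge, annotate):
    print(" ".join(shlex.quote(part) for part in cmd))
```

The annotated, merged VCF would then be the single input to VariantRecalibrator. That command is left out here because its training resources and annotation list depend on the project.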
