Regarding intersecting VCF files

sagipolani, Member
edited March 2013 in Ask the GATK team

Hi all,

I would appreciate your thoughts on the following pipeline:
I'm currently working on whole-genome sequencing (WGS) data from several non-human vertebrates. To maximize sensitivity, I call variants per chromosome with two callers (GATK's UnifiedGenotyper and samtools mpileup), ignoring all filters. Next, I would like to merge (not intersect) the two VCF files (GATK + samtools) for each chromosome, and then merge (not intersect) the per-chromosome VCF files to obtain a final VCF dataset per individual:

For merging the GATK and samtools:

$ java -Xmx10g -jar GenomeAnalysisTK.jar -T CombineVariants -R ref.fasta \
    --variant:GATK chr#.GATK.vcf --variant:samtools chr#.samtools.vcf \
    -o chr#.GATK_samtools.union.vcf \
    -genotypeMergeOptions PRIORITIZE -priority GATK,samtools --filteredrecordsmergetype KEEP_UNCONDITIONAL

For merging all chromosomes per individual:

$ java -Xmx10g -jar GenomeAnalysisTK.jar -T CombineVariants -R ref.fasta \
    --variant:chr1 chr1.GATK_samtools.union.vcf --variant:chr2 chr2.GATK_samtools.union.vcf --variant:chr3 chr3.GATK_samtools.union.vcf \
    -o Individual1.union.vcf \
    -genotypeMergeOptions PRIORITIZE -priority chr1,chr2,chr3 --filteredrecordsmergetype KEEP_UNCONDITIONAL
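With many chromosomes (or scaffolds), typing every `--variant:` tag and the matching `-priority` list by hand is error-prone. A minimal sketch, assuming the per-chromosome files follow the `<chr>.GATK_samtools.union.vcf` naming used above: the helper `build_chrom_merge_cmd` is a hypothetical name, and it only prints the command (a dry run) so it can be inspected before running.

```shell
# Hypothetical helper: print the CombineVariants command for a list of chromosomes.
# Dry run only; it echoes the command and executes nothing.
# Assumes input files are named <chr>.GATK_samtools.union.vcf, as in the post above.
build_chrom_merge_cmd() {
    local out="$1"; shift
    local variants="" priority="" chr
    for chr in "$@"; do
        # One tagged input per chromosome; the tag is reused in the priority list.
        variants="$variants --variant:$chr $chr.GATK_samtools.union.vcf"
        priority="${priority:+$priority,}$chr"
    done
    echo "java -Xmx10g -jar GenomeAnalysisTK.jar -T CombineVariants -R ref.fasta$variants -o $out -genotypeMergeOptions PRIORITIZE -priority $priority --filteredrecordsmergetype KEEP_UNCONDITIONAL"
}

build_chrom_merge_cmd Individual1.union.vcf chr1 chr2 chr3
```

Because the tags and the priority list are generated from the same arguments, they cannot drift apart through a typo.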

Finally, I would like to intersect the two individuals and keep only the variants that are common to both:

Merging the two individuals:

$ java -Xmx10g -jar GenomeAnalysisTK.jar -T CombineVariants -R ref.fasta \
    --variant:Individual1 Individual1.union.vcf --variant:Individual2 Individual2.union.vcf -o Individual1_2.union.vcf \
    -genotypeMergeOptions PRIORITIZE -priority Individual1,Individual2 --filteredrecordsmergetype KEEP_UNCONDITIONAL

Intersecting the two individuals in order to keep only common variants:

$ java -Xmx10g -jar GenomeAnalysisTK.jar -T SelectVariants -R ref.fasta \
    --variant Individual1_2.union.vcf -select 'set == "Intersection";' \
    -o Intersected.vcf

Am I doing this right? I'm afraid I may be losing variants or something else along this pipeline. Remember that I want to keep only the common variants while ignoring the filters in order to increase sensitivity as much as possible.
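One way to check whether variants are being lost is to count records at each stage and to tally the `set` annotation that CombineVariants writes into the INFO column. The sketch below uses only standard Unix tools; `count_records` and `tally_sets` are hypothetical helper names, and the file names are the ones from the commands above. Any `set` value other than `Intersection` in the tally marks records that the final SelectVariants step would drop.

```shell
# Count variant records (non-header lines) in a VCF.
count_records() {
    awk '!/^#/ { n++ } END { print n + 0 }' "$1"
}

# Tally the values of the "set" key in the INFO column (column 8).
tally_sets() {
    grep -v '^#' "$1" | cut -f8 | tr ';' '\n' | grep '^set=' | sort | uniq -c
}

# Report record counts for whichever pipeline outputs exist.
for vcf in Individual1.union.vcf Individual2.union.vcf \
           Individual1_2.union.vcf Intersected.vcf; do
    [ -f "$vcf" ] || continue
    echo "$vcf: $(count_records "$vcf") records"
done

# Show how the merged records are distributed across "set" categories.
if [ -f Individual1_2.union.vcf ]; then
    tally_sets Individual1_2.union.vcf
fi
```

If the `Intersection` count in the tally matches the record count of `Intersected.vcf`, the selection step behaved as expected.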

Thanks!

Sagi


Answers

  • bmartinez, EEUU, Member

    Hi everyone,

    I would like to merge two different VCF files: a GVCF generated with GATK HaplotypeCaller and a VCF generated with FreeBayes. My first question is whether I can merge a GVCF with a VCF. My second question is whether the commands that sagipolani used for merging the GATK and samtools files would also work for merging the GATK GVCF and the FreeBayes VCF.
    My genomes are in scaffolds, not in chromosomes.

    Thanks to all in advance!

    Begoña Martinez

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @bmartinez
    Hi Begoña Martinez,

    You really should not be merging a GVCF and a VCF. A GVCF is an intermediate file that is not to be used in final analysis. You should first run GenotypeGVCFs to obtain the final VCF, then you can use CombineVariants to merge the two VCFs.

    -Sheila
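The order Sheila describes (GenotypeGVCFs first, then CombineVariants) might look roughly like the sketch below in GATK 3 syntax. The file names (`sample1.g.vcf`, `sample1.freebayes.vcf`, `ref.fasta`) and the helper name `print_gvcf_then_merge` are placeholders, and the merge options are simply copied from sagipolani's commands rather than being a recommendation. The function only prints the two commands so they can be reviewed before being run.

```shell
# Dry run: print the two GATK 3 commands instead of executing them.
# All file names are hypothetical placeholders derived from the sample name.
print_gvcf_then_merge() {
    local sample="$1"
    # Step 1: turn the intermediate GVCF into a final VCF.
    echo "java -Xmx10g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R ref.fasta --variant ${sample}.g.vcf -o ${sample}.GATK.vcf"
    # Step 2: merge the genotyped GATK VCF with the FreeBayes VCF.
    echo "java -Xmx10g -jar GenomeAnalysisTK.jar -T CombineVariants -R ref.fasta --variant:GATK ${sample}.GATK.vcf --variant:freebayes ${sample}.freebayes.vcf -o ${sample}.GATK_freebayes.union.vcf -genotypeMergeOptions PRIORITIZE -priority GATK,freebayes --filteredrecordsmergetype KEEP_UNCONDITIONAL"
}

print_gvcf_then_merge sample1
```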

  • bmartinez, EEUU, Member

    Hi Sheila,

    Thanks a lot for your answer.

    Best,

    Begoña
