
VCF contigs don't match reference genome

I am working with fungal RNA-seq SNPs that I called using the GATK Best Practices pipeline. I have 12 VCF files of SNPs that I called against REFERENCE1.fa. I also have a VCF file from a collaborator with SNPs that were called against the same reference genome (REFERENCE1.fa). My goal is to combine all of my VCF files with my collaborator's for phylogenetic tree analysis. I was able to combine my 12 VCF files, but when I try to combine the result with my collaborator's file, I get an error saying that the contigs don't match. Looking more closely at both VCF files, I noticed two contigs in my file that aren't present in theirs. These turn out to be the unmapped scaffolds and the mitochondrial genome. I thought things might work if I removed these two contigs from my VCF file, so I used SelectVariants to do this and tried the merge again. Now I get an error saying that my VCF contigs do not match my reference genome. Is there a way to remove these contigs from my reference genome as well, so that I won't get this error?
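For anyone hitting the same error, it can help to see exactly which contigs differ before editing any files. The following is a minimal Python sketch, not a GATK feature: it reads the `##contig` lines from a VCF header and the contig names and lengths from a FASTA index (`.fai`, as written by `samtools faidx`), then reports the differences. The function names and the file names in the usage comment are illustrative assumptions. Note that GATK checks against the reference sequence dictionary, where contig order can also matter, and this sketch does not check order.

```python
import re

# Patterns for VCF header lines such as: ##contig=<ID=chr1,length=12345>
CONTIG_RE = re.compile(r"^##contig=<.*?\bID=([^,>]+)")
LENGTH_RE = re.compile(r"\blength=(\d+)")


def vcf_header_contigs(lines):
    """Return {contig name: length or None} from the ##contig lines of a VCF header."""
    contigs = {}
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM column header
        match = CONTIG_RE.match(line)
        if match:
            length = LENGTH_RE.search(line)
            contigs[match.group(1)] = int(length.group(1)) if length else None
    return contigs


def fai_contigs(lines):
    """Return {contig name: length} from the lines of a FASTA .fai index."""
    contigs = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            contigs[fields[0]] = int(fields[1])
    return contigs


def compare_contigs(vcf_contigs, ref_contigs):
    """Return (only in VCF header, only in reference, present in both but lengths differ)."""
    only_in_vcf = [c for c in vcf_contigs if c not in ref_contigs]
    only_in_ref = [c for c in ref_contigs if c not in vcf_contigs]
    length_mismatch = [
        c for c, n in vcf_contigs.items()
        if c in ref_contigs and n is not None and n != ref_contigs[c]
    ]
    return only_in_vcf, only_in_ref, length_mismatch


# Usage with hypothetical file names:
# with open("my_merged.vcf") as vcf, open("REFERENCE1.fa.fai") as fai:
#     print(compare_contigs(vcf_header_contigs(vcf), fai_contigs(fai)))
```

If all three lists come back empty, the header and the reference agree on contig names and lengths, and the mismatch is more likely to come from contig order or from the file being checked against a different reference.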


  • lfall Member

    I just wanted to add one thing: I have also found the individual chromosome files for the genome (REFERENCE_chr1.fa, REFERENCE_chr2.fa, ...) available for download. If I can't remove the unwanted contigs from the reference, would it be possible to use these individual files instead? Ideally, I would like to avoid starting over from the very beginning of the pipeline, but I'm not sure if that's possible. Thank you!

  • lfall Member

    I realized I made a mistake in what I said before. The contigs listed in the header were the same for both VCFs, but the collaborator's file only had variants on 4 of the contigs, while my file had variants on all six. My file only had about 15 SNPs in total on the other two contigs, but I guess this was enough to mess things up? I then tried vcftools, and for some reason I am able to merge the files with vcftools but not with GATK. I'm worried that something is wrong with my files and vcftools just doesn't pick up the error. Is it normal for there to be problems merging if one VCF file doesn't have SNPs on all of the possible contigs?
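A quick way to confirm which contigs actually carry variant records in each file (as opposed to which are only declared in the header) is to count records per contig. This is a minimal Python sketch for uncompressed VCFs; the function name and the file names in the usage comment are hypothetical.

```python
from collections import Counter


def records_per_contig(lines):
    """Count VCF data records per contig (first column), skipping header and blank lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        counts[line.split("\t", 1)[0]] += 1
    return counts


# Usage with hypothetical file names:
# for path in ("my_merged.vcf", "collaborator.vcf"):
#     with open(path) as handle:
#         print(path, dict(records_per_contig(handle)))
```

A contig that appears in the header but has a count of zero is simply one with no called variants in that file.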

  • Sheila Broad Institute Member, Broadie admin


    "Is it normal for there to be problems merging if one vcf file doesn't have SNPs on all of the possible contigs?"

    No, this should not happen. Can you try validating your VCFs with ValidateVariants?
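As a rough sketch of how that check could be scripted over several files, the Python below builds a ValidateVariants command for each VCF and records whether it exited cleanly. It assumes the GATK4-style `gatk` launcher is on the PATH (with GATK3 the equivalent call is `java -jar GenomeAnalysisTK.jar -T ValidateVariants -R ... -V ...`) and that a non-zero exit code signals a failed validation. The helper names and the VCF file names are hypothetical.

```python
import subprocess


def validate_variants_cmd(vcf_path, reference_path, gatk="gatk"):
    """Build the argument list for a GATK4-style ValidateVariants call."""
    return [gatk, "ValidateVariants", "-R", str(reference_path), "-V", str(vcf_path)]


def validate_all(vcf_paths, reference_path, run=subprocess.run):
    """Run ValidateVariants on each VCF and map each path to True if the exit code was 0."""
    results = {}
    for vcf in vcf_paths:
        completed = run(validate_variants_cmd(vcf, reference_path))
        results[vcf] = completed.returncode == 0
    return results


# Usage with hypothetical VCF names and the reference from the question:
# print(validate_all(["my_merged.vcf", "collaborator.vcf"], "REFERENCE1.fa"))
```

The `run` parameter is only there so the call can be swapped out when testing without GATK installed; GATK's own error output remains the place to look for what failed.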

