VCF contigs don't match reference genome

Hello,
I am working with fungal RNA-seq SNPs that I called using the GATK Best Practices pipeline. I have 12 VCF files of SNPs that I called against REFERENCE1.fa. I also have a VCF file from a collaborator with SNPs that were called against the same reference genome (REFERENCE1.fa). My goal is to combine all of my VCF files with my collaborator's for phylogenetic tree analysis. I was able to combine my 12 VCF files, but when I try to combine the result with my collaborator's file, I get an error saying that the contigs don't match. I looked more closely at both of our VCF files and noticed two contigs present in my file that aren't present in theirs. These contigs are for unmapped scaffolds and mitochondria. I thought things might work if I removed these two contigs from my VCF file, so I used SelectVariants to do this and tried the merge again. Now I am getting an error saying that my VCF contigs do not match my reference genome. Is there a way to remove these contigs from my reference genome as well, so that I won't get this error?
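
One way to see exactly which contig names differ is to compare the `##contig` lines in the VCF header against the sequence names in the reference FASTA. The following is a minimal sketch in plain Python, not a GATK tool: the function names are made up for illustration, it assumes uncompressed text input, and it compares names only (GATK also consults the reference's .fai index and .dict sequence dictionary, which this sketch does not read).

```python
import re

# Matches the ID field of a VCF header line such as ##contig=<ID=chr1,length=100>
CONTIG_RE = re.compile(r"##contig=<ID=([^,>]+)")


def vcf_header_contigs(vcf_lines):
    """Return contig IDs declared in ##contig header lines, in file order."""
    ids = []
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        match = CONTIG_RE.match(line)
        if match:
            ids.append(match.group(1))
    return ids


def fasta_contigs(fasta_lines):
    """Return sequence names (first word after '>') from FASTA lines."""
    names = []
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            names.append(line[1:].split()[0])
    return names


def compare_contigs(vcf_ids, ref_ids):
    """Report names unique to each side and whether the order is identical."""
    return {
        "only_in_vcf": [c for c in vcf_ids if c not in ref_ids],
        "only_in_reference": [c for c in ref_ids if c not in vcf_ids],
        "same_order": vcf_ids == ref_ids,
    }
```

Both parsers accept any iterable of lines, so an open file handle works, for example `vcf_header_contigs(open("my_calls.vcf"))` and `fasta_contigs(open("REFERENCE1.fa"))`, where `my_calls.vcf` is a placeholder file name.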

Answers

  • lfall Member

    I just wanted to add one thing: I have also found the individual chromosome files for the genome (REFERENCE_chr1.fa, REFERENCE_chr2.fa, ...) available for download. If I can't remove the unwanted contigs from the reference, would it be possible to use these individual files instead? Ideally, I would like to avoid starting over from the very beginning of the pipeline, but I'm not sure whether that's possible. Thank you!

  • lfall Member

    Hello,
    I realized I made a mistake in what I said before. The contigs listed in the header were the same for both VCFs, but the collaborator's file only had SNPs on 4 of the contigs, while my file had SNPs on all 6. My file only had about 15 SNPs in total on the other two contigs, but I guess this was enough to mess things up? I tried switching to vcftools, and for some reason I am able to merge the files with vcftools but not with GATK. I'm worried that something is wrong with my files and vcftools just doesn't pick up the error. Is it normal for there to be problems merging if one VCF file doesn't have SNPs on all of the possible contigs?
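
The distinction drawn here, between contigs declared in the header and contigs that actually carry records, can be checked directly. Below is a minimal sketch in plain Python under stated assumptions: `contig_summary` is a made-up helper name, the input is an uncompressed VCF, and the result says nothing about why GATK rejects the merge.

```python
import re
from collections import Counter

# Matches the ID field of a VCF header line such as ##contig=<ID=chr1,length=100>
CONTIG_RE = re.compile(r"##contig=<ID=([^,>]+)")


def contig_summary(vcf_lines):
    """Count variant records per declared contig in a single pass.

    Returns a dict with:
      "declared":   {contig ID from the header: number of records on it}
      "undeclared": sorted contig names that have records but no header line
    """
    declared = []
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("##"):
            match = CONTIG_RE.match(line)
            if match:
                declared.append(match.group(1))
        elif line.startswith("#") or not line.strip():
            continue  # skip the #CHROM column line and blank lines
        else:
            # The first tab-separated column of a record is the contig name.
            counts[line.split("\t", 1)[0]] += 1
    return {
        "declared": {contig: counts[contig] for contig in declared},
        "undeclared": sorted(set(counts) - set(declared)),
    }
```

Running this on both files, for example `contig_summary(open("collaborator.vcf"))` with a placeholder file name, would show declared contigs with a count of zero, which is the case described above.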

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @lfall
    Hi,

    Is it normal for there to be problems merging if one vcf file doesn't have SNPs on all of the possible contigs?

    No, this should not happen. Can you try validating your VCFs with ValidateVariants?

    Thanks,
    Sheila
