Identifying de novo variants in a mutant without a good reference genome

Teatime Member
edited December 2018 in Zoo & Garden
Hi all tea party experts, :D

I have a mutant of a non-human haploid organism with a conspicuous phenotype. I sequenced the parental and mutant genomes, but the only available reference genome is from another strain.

Q1. Since I don't have known-variants data, should I skip BQSR? Or do I need to bootstrap BQSR? I'm not even looking for SNPs in this species. All I want to know is the difference between the parental strain and the spontaneous mutant, which should be very few mutations, if not just one.

Q2. Since the reference genome is from another strain, there are ~30,000 SNPs between our parental strain and the available reference genome. Should I first make a new consensus reference genome from our parental strain (FastaAlternateReferenceMaker with the reference genome and the VCF file generated by HaplotypeCaller) and then map the mutant reads to this new parental reference? Or should I just stick with the original reference genome from the other strain and map all the parental and mutant reads to it?

Any help would be really appreciated.


Best Answer


  • Teatime Member
    Since no one is in this rose garden, I will report my update. I went ahead and ran HaplotypeCaller in GVCF mode -> CombineGVCFs -> GenotypeGVCFs -> VariantFiltration, without BQSR/VQSR. I used the reference genome from the other strain. Now I have to find the mutation that is responsible for the phenotype. What tool should I use to simply identify variants that are present only in the mutants and not in the parental strain?
  • davidben Boston Member, Broadie, Dev ✭✭✭

    @Teatime As far as Mutect2 is concerned, I think mapping the parental and mutant genomes to the original reference will work. Any SNPs that arise from reference mismatch will get filtered as "germline" events.

  • Teatime Member
    Thank you for the input, @shlee and @davidben.

    I was hesitant to use Mutect2 because, as I understand it, it was developed for diploid genomes with a complex mixture of mutants and WT. But yes, I should definitely try it.

    The joint genotype calling was very effective. After importing the resulting VCF into R, I was able to simply select the genotyped variants that were present in the mutants and absent in the parental genome. This reduced the candidates from 30,000 to 26 variants. The first one on the list, ranked by QUAL score, was located in an exon of a gene. All the other variants were either intergenic or in introns. Thank you all for developing these useful tools.

    Have a wonderful tea party at the Zoo and the Rose Garden.
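
For readers who want to try the selection step described above (keep variants present in the mutant and absent in the parent) without R, here is a minimal Python sketch. It is not the R code used in the thread. It assumes a joint-genotyped, multi-sample VCF with a `GT` field in each record; the sample names `parent` and `mutant`, the function name `mutant_only_variants`, and the tiny in-memory VCF are all hypothetical and only there for illustration.

```python
def _alleles(sample_field, gt_index):
    """Return the set of allele indices in a sample's GT (works for haploid '1' or diploid '0/1')."""
    gt = sample_field.split(":")[gt_index]
    return set(gt.replace("|", "/").split("/"))


def mutant_only_variants(vcf_lines, parent, mutant):
    """Return (chrom, pos, ref, alt, qual) tuples, highest QUAL first, for sites where
    the mutant carries a non-reference allele that the parent does not carry."""
    hits = []
    p_idx = m_idx = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            p_idx = 9 + samples.index(parent)
            m_idx = 9 + samples.index(mutant)
            continue
        if p_idx is None:
            raise ValueError("no #CHROM header line found before the first record")
        chrom, pos, _id, ref, alt, qual, flt = fields[:7]
        if flt not in ("PASS", "."):
            continue  # drop records flagged by a filtering step
        gt_index = fields[8].split(":").index("GT")
        p_alleles = _alleles(fields[p_idx], gt_index)
        m_alleles = _alleles(fields[m_idx], gt_index)
        if "." in p_alleles or "." in m_alleles:
            continue  # require a genotype call in both samples
        # Non-reference alleles seen in the mutant but not in the parent.
        new_alleles = {a for a in m_alleles if a != "0"} - p_alleles
        if new_alleles:
            score = float(qual) if qual != "." else 0.0
            hits.append((chrom, int(pos), ref, alt, score))
    return sorted(hits, key=lambda h: h[4], reverse=True)


# Hypothetical five-record VCF: one shared SNP, one no-call, one filtered site,
# and two candidate de novo variants.
_ROWS = [
    ("##fileformat=VCFv4.2",),
    ("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "parent", "mutant"),
    ("chr1", "100", ".", "A", "G", "900.0", "PASS", ".", "GT:DP", "1:30", "1:28"),
    ("chr1", "200", ".", "C", "T", "450.5", "PASS", ".", "GT:DP", "0:31", "1:29"),
    ("chr2", "300", ".", "G", "A", "50.0", "PASS", ".", "GT:DP", ".:0", "1:5"),
    ("chr2", "400", ".", "T", "C", "700.0", "LowQD", ".", "GT:DP", "0:30", "1:30"),
    ("chr3", "500", ".", "G", "C", "800.0", "PASS", ".", "GT:DP", "0:30", "1:33"),
]
EXAMPLE_VCF = "\n".join("\t".join(row) for row in _ROWS)

if __name__ == "__main__":
    for hit in mutant_only_variants(EXAMPLE_VCF.splitlines(), "parent", "mutant"):
        print(hit)
```

SNPs shared by parent and mutant (the reference-mismatch sites) drop out because the mutant carries no allele the parent lacks. Requiring a call in both samples avoids reporting sites where the parent simply had no coverage. For a real file, one could pass an open file handle instead of the in-memory lines.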
