Generating neutral genome for two breeds using FastaAlternateReferenceMaker

THK Davis, CA Member
edited March 2016 in Ask the GATK team

Hello,

I am trying to generate a neutral genome between two chicken breeds to reduce alignment bias when I calculate the SNP count ratio between breeds.

I want to replace every SNP position in the reference genome with N or an IUPAC ambiguity code if the SNP differs between the two breeds
(for example, use R if Breed1 has A and Breed2 has G at a given position).
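
To make the intended substitution concrete, here is a minimal Python sketch of the two-base IUPAC lookup described above. The function name `neutral_base` is hypothetical and is not part of GATK; the fallback to N for anything that is not a pair of plain bases is an assumption for illustration.

```python
# Standard IUPAC ambiguity codes for unordered pairs of DNA bases.
IUPAC_PAIRS = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def neutral_base(breed1_allele, breed2_allele):
    """Return the base to write at a site (hypothetical helper).

    If both breeds carry the same allele it is returned unchanged;
    if they differ, the matching IUPAC code is returned, falling back
    to N for anything that is not a pair of plain A/C/G/T bases.
    """
    a = breed1_allele.upper()
    b = breed2_allele.upper()
    if a == b:
        return a
    return IUPAC_PAIRS.get(frozenset((a, b)), "N")


print(neutral_base("A", "G"))  # Breed1 has A, Breed2 has G
```

The lookup is keyed on an unordered pair, so it does not matter which breed carries which allele.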

I tried two different ways, but neither seems to work.
First, I used the bam files from both breeds as input for HaplotypeCaller (or merged the bam files and then ran HaplotypeCaller), filtered the SNPs, and ran FastaAlternateReferenceMaker with the --use_IUPAC_sample argument.
-> --use_IUPAC_sample takes only one of the samples (Breed1 or Breed2)

java -jar /software/gatk/3.5/static/GenomeAnalysisTK.jar \
-T FastaAlternateReferenceMaker \
-R /share/zhoulab/Referencegenome/Wholegenomefasta/genome.fa \
--use_IUPAC_sample breed1 \
-o breed1_iupac.fa \
-V filtered.snp.parent.vcf

Second, I ran HaplotypeCaller on each bam file separately, filtered the SNPs, and ran FastaAlternateReferenceMaker in two rounds (both with the --use_IUPAC_sample argument): the first round converts reference.fa using Breed1.vcf, and the second uses the resulting output.fa to add the Breed2 vcf info.
-> This gave the error "Input files variant and reference have incompatible contigs." so I could not run the second round.

So is there any way to do this? Or is it even possible?

Thank you in advance!!

Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Hi there,

    The problem here is that once you've created a new reference, all the files you had previously derived from the original reference are incompatible with the new reference.

    You need to first do the comparison between your callsets (to identify which variants are specific to each breed vs. which are in common), then use the result of that to make a new reference. Then you'll have to realign all the data and re-call variants from scratch against that reference.
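
As a rough illustration of the "compare the callsets, then build one new reference" step, the Python sketch below works on a single contig held in memory. It is not a GATK feature. The function `build_neutral_contig` and its inputs are hypothetical, and it assumes each callset has already been reduced to a mapping from 1-based position to one single-base allele per breed, with a missing position meaning the breed carries the reference base.

```python
# Standard IUPAC ambiguity codes for unordered pairs of DNA bases.
IUPAC_PAIRS = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def build_neutral_contig(ref_seq, breed1_calls, breed2_calls):
    """Mask sites where the two breeds differ (hypothetical helper).

    ref_seq: reference sequence of one contig, as a string.
    breed1_calls / breed2_calls: dict of 1-based position -> single-base
    allele. Assumption: a position missing from a dict means that breed
    carries the reference base there.
    """
    seq = list(ref_seq)
    # Visit every site called in at least one breed.
    for pos in set(breed1_calls) | set(breed2_calls):
        ref_base = seq[pos - 1].upper()
        a = breed1_calls.get(pos, ref_base).upper()
        b = breed2_calls.get(pos, ref_base).upper()
        # Sites where both breeds agree are left as the reference base;
        # only breed-specific differences are replaced.
        if a != b:
            seq[pos - 1] = IUPAC_PAIRS.get(frozenset((a, b)), "N")
    return "".join(seq)


# Toy example: the breeds differ at positions 2 and 5.
print(build_neutral_contig("ACGTAC", {2: "T"}, {2: "C", 5: "G"}))
```

Whatever tool produces the masked FASTA, the reads still have to be realigned and the variants re-called against it, because files derived from the original reference are no longer compatible with the new one.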
