No heterozygous variants after converting vcf file to fasta

Hello everyone,
I'm struggling with a problem: after converting a VCF file to FASTA, I found no heterozygous variants in the FASTA file, which was unexpected. Here are the steps I ran:
1. java -Xmx2g -jar GenomeAnalysisTK.jar -T SelectVariants -R Melc_scaffolds.fasta -V variant.vcf -o MD.vcf -sn MD
2. java -Xmx4g -jar GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker -R Melc_scaffolds.fasta -V MD.vcf -o MD.fasta
These two steps follow this question: https://gatkforums.broadinstitute.org/gatk/discussion/8035/vcf-to-fasta
What I don't understand is which command I should use "-trimAlternates" with.
Above all, could anyone tell me what I am doing wrong, and what the right way is to get a correct FASTA file?
Caiyc

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @Yacheng_Cai
    Hi Caiyc,

    FastaAlternateReferenceMaker will simply substitute the alternate allele from the sample into the output reference. I am not sure I understand what you are asking for when you say "I found there was not any heterozygous variants in the fasta file which was unexpected to see." Can you post some examples of what you are getting and would like to get?

    -Sheila
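The substitution described above can be pictured with a toy sketch. This is only an illustration of "put the alternate allele into the reference" for SNPs, not GATK's implementation; the function name `apply_snps` and the tuple layout are hypothetical.

```python
def apply_snps(reference, snps):
    """Return a copy of `reference` with each SNP's ALT base substituted.

    `snps` is a list of (pos, ref_base, alt_base) tuples, with 1-based
    positions as in a VCF record. Only single-base substitutions are handled.
    """
    seq = list(reference)
    for pos, ref_base, alt_base in snps:
        # Sanity check: the VCF REF base should match the reference sequence.
        if seq[pos - 1] != ref_base:
            raise ValueError(f"REF mismatch at position {pos}")
        seq[pos - 1] = alt_base
    return "".join(seq)


print(apply_snps("ACGTG", [(3, "G", "A")]))  # ACATG
```

Note that a plain substitution like this has only two possible outcomes per site (reference base or alternate base), so it cannot by itself represent a heterozygous call.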

  • Hi, @Sheila
    For example, when you check the VCF file, you will find:
    Contig0 3576 . G A 10323.71 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=0.575;ClippingRankSum=0.00;DP=228;ExcessHet=0.0033;FS=1.751;InbreedingCoeff=0.6947;MQ=60.00;MQRankSum=0.00;QD=24.35;ReadPosRankSum=0.393;SOR=0.789;set=variant GT:AD:DP:GQ:PGT:PID:PL 0/1:102,126:228:99:0|1:3540_G_A:5101,0,4021
    This site appears to be heterozygous, so in the FASTA file it should be R, which means A/G. However, in the FASTA file I got, it was a G, the same as the reference.
    Moreover, in the whole FASTA file, no site was heterozygous: there were no R, Y, W, or other characters that denote a heterozygous genotype.
    Does that make it clearer?
    Caiyc
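To make the expected encoding concrete, here is a minimal sketch of how a diploid SNP genotype maps to a single base or an IUPAC ambiguity code. It only illustrates the encoding described in the post above and says nothing about what any GATK tool does; the function name `genotype_to_base` is hypothetical, and it assumes a called diploid genotype at a SNP site.

```python
# Standard IUPAC ambiguity codes for two-base combinations.
IUPAC = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def genotype_to_base(ref, alts, gt):
    """Map a called diploid SNP genotype (e.g. '0/1') to one character.

    `ref` is the REF base, `alts` the list of ALT bases, and `gt` the GT
    field. Homozygous calls return the base itself; heterozygous calls
    return the IUPAC code for the two bases.
    """
    alleles = [ref] + alts
    indices = [int(i) for i in gt.replace("|", "/").split("/")]
    called = {alleles[i] for i in indices}
    if len(called) == 1:
        return called.pop()
    return IUPAC[frozenset(called)]


# The record quoted above: REF=G, ALT=A, GT=0/1
print(genotype_to_base("G", ["A"], "0/1"))  # R
```

For the Contig0:3576 record, the two called alleles are G and A, which is why R is the expected character at that position.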
