Testdrive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a userfriendly interface!) without having to install anything.
Problems generating phased haplotypes with HaplotypeCaller
Hello
I am trying to use the HaplotypeCaller to phase alleles for nuclear genes in a diploid plant species. I have reads from one species mapped to the reference genome from another closely related species. What I want to end up with are contiguous sequences for separate alleles that I can use for phylogenetic analysis.
I have generated a gvcf file using the emitRefConfidence BP_RESOLUTION option. In the output, some of the bases appear phased, but I don't really understand some of the results. For example, for some positions the phasing seems to refer to the same base. Here is an example:
scaffold11251 3338 . A G, 1549.77 . DP=39;MLEAC=2,0;MLEAF=1.00,0.00;MQ=42.18 GT:AD:DP:GQ:PGT:PID:PL:SB 1/1:0,39,0:39:99:01:3338_A_G:1578,117,0,1578,117,1578:0,0,16,23
Also, in this case, the genotype is 1/1 so I don't understand why this position appears to be phased.
I have also looked at the bam output format. I have tried setting options so that I only get a maximum of 2 haplotypes (I read this post http://gatkforums.broadinstitute.org/discussion/4527/caniretrievethehaplotypesassembledfromhaplotypecaller) using this option:
maxNumHaplotypesInPopulation 2
but for most active regions I still get more than 2 haplotypes (most often 3 or 4, but sometimes 5).
I've read through the HaplotypeCaller documentation + the description of the gvcf format but I still don't fully understand how to interpret the results or successfully generate the output I need, i.e. contiguous sequences of separate phased alleles.
I would be very grateful of any advice!
Laura
Best Answer

Geraldine_VdAuwera Cambridge, MA admin
Hi Laura,
I would like to extract contiguous sequences, rather than just the variants. I thought that this might be possible by physically phasing variants with HaplotypeCaller. Are you saying that it is not possible to do this?
Yes and no. This is not part of the primary purpose of the tool, so there is no straightforward way to make it output just the haplotype sequences, if that's what you want. But you could use the artificial haplotypes that are written to the bamout file, with some additional manipulation. You would have to identify and extract from the bam file the haplotypes that were chosen as the most likely and used for generating the output VCF.
Also, I thought that is would be possible to select the most likely haplotypes for each active region from the bam output. However, as I mentioned in my original post, when I try to set the options to only select the two most likely haplotypes for each active region I still end up with additional possible haplotypes for each active region (e.g. 5 or more for some regions).
I'm not sure why you're getting more than 2 haplotypes output when you use those settings; I think it's because this makes the tool only use 2 for the final calculations, but it still considers other possibilities in the steps leading up to that internally.
These outputs were originally developed only for debugging purposes, which is why they're so awkward to use for producing haplotypes as an end product of the analysis. If lots of people ask for this capability we may be able to add that as a feature request to the development roadmap, but so far it seems like it's more of a minority use case, so it's hard to justify putting a lot of effort into it.
Answers
@Laura_Kelly
Hi Laura,
Can you please post the surrounding variant records? Specifically, the PID field is the phase identifier that all the same phased variants will have. If you can post all the variant records with the same PID, it would be really helpful. Can you also post the bamout file at those sites?
Thanks,
Sheila
Hi Shelia
Thanks for your reply. I've uploaded the vcf and bam files  is this what you need?
Laura
Hi Sheila
I have tried a couple of times to add a comment with the vcf and bam info in the body of the text in case there is a problem with the files I uploaded yesterday, but my comment is not showing up. The bam info was a link to an image file I uploaded, so maybe that is causing a problem?
Please let me know if need me to provide more info!
Many thanks
Laura
Hi Sheila
I am still having problems with physical phasing of haplotypes in HaplotypeCaller. It seems that most of the sites that are being phased in the genes I am analysing are 1/1 variants. Here is another example, these are all the lines for PID '3929_C_T':
scaffold11251 3923 . C . . . GT:AD:DP:GQ:PL 0/0:29,0:29:75:0,75,1125
scaffold11251 3924 . T . . . GT:AD:DP:GQ:PL 0/0:29,0:29:75:0,75,1125
scaffold11251 3925 . C . . . GT:AD:DP:GQ:PL 0/0:29,0:29:75:0,75,1125
scaffold11251 3926 . A . . . GT:AD:DP:GQ:PL 0/0:29,0:29:69:0,69,1035
scaffold11251 3927 . G . . . GT:AD:DP:GQ:PL 0/0:27,0:27:69:0,69,1035
scaffold11251 3928 . A . . . GT:AD:DP:GQ:PL 0/0:27,0:27:69:0,69,1035
scaffold11251 3929 . C T, 951.77 . DP=26;MLEAC=2,0;MLEAF=1.00,0.00;MQ=43.13 GT:AD:DP:GQ:PGT:PID:PL:SB 1/1:0,26,0:26:78:01:3929_C_T:980,78,0,980,78,980:0,0,12,14
scaffold11251 3930 . A . . . GT:AD:DP:GQ:PL 0/0:27,0:27:69:0,69,1035
scaffold11251 3931 . T . . . GT:AD:DP:GQ:PL 0/0:26,0:26:69:0,69,1035
scaffold11251 3932 . T . . . GT:AD:DP:GQ:PL 0/0:27,0:27:72:0,72,1080
scaffold11251 3933 . G . . . GT:AD:DP:GQ:PL 0/0:27,0:27:72:0,72,1080
scaffold11251 3934 . A . . . GT:AD:DP:GQ:PL 0/0:26,0:26:72:0,72,1080
scaffold11251 3935 . A . . . GT:AD:DP:GQ:PL 0/0:26,0:26:72:0,72,1080
scaffold11251 3936 . T . . . GT:AD:DP:GQ:PL 0/0:27,0:27:75:0,75,1125
scaffold11251 3937 . A . . . GT:AD:DP:GQ:PL 0/0:28,0:28:78:0,78,1170
scaffold11251 3938 . C . . . GT:AD:DP:GQ:PL 0/0:28,0:28:78:0,78,1170
scaffold11251 3939 . A . . . GT:AD:DP:GQ:PL 0/0:29,0:29:81:0,81,1215
scaffold11251 3940 . T . . . GT:AD:DP:GQ:PL 0/0:28,0:28:81:0,81,1215
scaffold11251 3941 . T . . . GT:AD:DP:GQ:PL 0/0:28,0:28:81:0,81,1215
scaffold11251 3942 . T . . . GT:AD:DP:GQ:PL 0/0:25,1:26:60:0,60,939
scaffold11251 3943 . T . . . GT:AD:DP:GQ:PL 0/0:27,0:27:78:0,78,1170
scaffold11251 3944 . T . . . GT:AD:DP:GQ:PL 0/0:28,0:28:81:0,81,1215
scaffold11251 3945 . G . . . GT:AD:DP:GQ:PL 0/0:27,1:28:42:0,42,987
scaffold11251 3946 . A . . . GT:AD:DP:GQ:PL 0/0:28,0:28:78:0,78,1170
scaffold11251 3947 . A . . . GT:AD:DP:GQ:PL 0/0:28,0:28:78:0,78,1170
scaffold11251 3948 . G . . . GT:AD:DP:GQ:PL 0/0:28,0:28:78:0,78,1170
scaffold11251 3949 . T . . . GT:AD:DP:GQ:PL 0/0:29,0:29:81:0,81,1215
scaffold11251 3950 . A . . . GT:AD:DP:GQ:PL 0/0:29,0:29:78:0,78,1170
scaffold11251 3951 . G . . . GT:AD:DP:GQ:PL 0/0:29,0:29:78:0,78,1170
scaffold11251 3952 . T . . . GT:AD:DP:GQ:PL 0/0:28,0:28:78:0,78,1170
scaffold11251 3953 . A . . . GT:AD:DP:GQ:PL 0/0:28,0:28:78:0,78,1170
scaffold11251 3954 . T . . . GT:AD:DP:GQ:PL 0/0:28,0:28:72:0,72,1080
scaffold11251 3955 . T . . . GT:AD:DP:GQ:PL 0/0:28,0:28:72:0,72,1080
scaffold11251 3956 . C . . . GT:AD:DP:GQ:PL 0/0:27,0:27:72:0,72,1080
scaffold11251 3957 . A . . . GT:AD:DP:GQ:PL 0/0:27,0:27:69:0,69,1035
scaffold11251 3958 . T . . . GT:AD:DP:GQ:PL 0/0:27,0:27:69:0,69,1035
scaffold11251 3959 . G . . . GT:AD:DP:GQ:PL 0/0:27,0:27:69:0,69,1035
scaffold11251 3960 . A . . . GT:AD:DP:GQ:PL 0/0:26,0:26:69:0,69,1035
scaffold11251 3961 . A . . . GT:AD:DP:GQ:PL 0/0:27,0:27:72:0,72,1080
scaffold11251 3962 . A . . . GT:AD:DP:GQ:PL 0/0:28,0:28:75:0,75,1125
scaffold11251 3963 . G . . . GT:AD:DP:GQ:PL 0/0:28,0:28:75:0,75,1125
scaffold11251 3964 . G . . . GT:AD:DP:GQ:PL 0/0:27,0:27:75:0,75,1125
scaffold11251 3965 . A . . . GT:AD:DP:GQ:PL 0/0:28,0:28:75:0,75,1125
scaffold11251 3966 . G . . . GT:AD:DP:GQ:PL 0/0:28,0:28:75:0,75,1125
scaffold11251 3967 . A . . . GT:AD:DP:GQ:PL 0/0:29,0:29:78:0,78,1170
scaffold11251 3968 . T . . . GT:AD:DP:GQ:PL 0/0:29,0:29:78:0,78,1170
scaffold11251 3969 . G . . . GT:AD:DP:GQ:PL 0/0:29,0:29:78:0,78,1170
scaffold11251 3970 . A . . . GT:AD:DP:GQ:PL 0/0:28,0:28:78:0,78,1170
scaffold11251 3971 . A . . . GT:AD:DP:GQ:PL 0/0:28,0:28:78:0,78,1170
scaffold11251 3972 . A . . . GT:AD:DP:GQ:PL 0/0:28,0:28:78:0,78,1170
scaffold11251 3973 . T . . . GT:AD:DP:GQ:PL 0/0:28,1:29:69:0,70,1040
scaffold11251 3974 . A . . . GT:AD:DP:GQ:PL 0/0:28,1:29:69:0,70,1030
scaffold11251 3975 . C . . . GT:AD:DP:GQ:PL 0/0:29,0:29:87:0,87,1137
scaffold11251 3976 . T . . . GT:AD:DP:GQ:PL 0/0:29,0:29:86:0,87,1107
scaffold11251 3977 . C . . . GT:AD:DP:GQ:PL 0/0:29,0:29:86:0,87,1076
scaffold11251 3978 . T . . . GT:AD:DP:GQ:PL 0/0:29,0:29:84:0,84,1260
scaffold11251 3979 . T . . . GT:AD:DP:GQ:PL 0/0:29,0:29:84:0,84,1260
scaffold11251 3980 . C . . . GT:AD:DP:GQ:PL 0/0:29,0:29:84:0,84,1260
scaffold11251 3981 . T . . . GT:AD:DP:GQ:PL 0/0:28,1:29:74:0,75,1031
scaffold11251 3982 . T . . . GT:AD:DP:GQ:PL 0/0:29,0:29:81:0,81,1215
scaffold11251 3983 . T . . . GT:AD:DP:GQ:PL 0/0:28,1:29:74:0,75,1096
scaffold11251 3984 . A . . . GT:AD:DP:GQ:PL 0/0:26,1:27:68:0,69,974
scaffold11251 3985 . G . . . GT:AD:DP:GQ:PL 0/0:29,0:29:81:0,81,1215
scaffold11251 3986 . G A, 470.77 . BaseQRankSum=0.268;ClippingRankSum=0.708;DP=27;MLEAC=1,0;MLEAF=0.500,0.00;MQ=43.37;MQRankSum=1.195;ReadPosRankSum=0.024 GT:AD:DP:GQ:PGT:PID:PL:SB 0/1:12,15,0:27:99:01:3929_C_T:499,0,367,535,412,947:6,6,11,4
scaffold11251 3987 . G . . . GT:AD:DP:GQ:PL 0/0:30,0:30:81:0,81,1215
scaffold11251 3988 . T . . . GT:AD:DP:GQ:PL 0/0:31,0:31:78:0,78,1170
scaffold11251 3989 . A . . . GT:AD:DP:GQ:PL 0/0:30,0:30:78:0,78,1170
scaffold11251 3990 . T . . . GT:AD:DP:GQ:PL 0/0:29,1:30:72:0,73,1117
scaffold11251 3991 . G . . . GT:AD:DP:GQ:PL 0/0:30,0:30:78:0,78,1170
scaffold11251 3992 . A . . . GT:AD:DP:GQ:PL 0/0:29,0:29:78:0,78,1170
scaffold11251 3993 . T . . . GT:AD:DP:GQ:PL 0/0:30,0:30:81:0,81,1215
scaffold11251 3994 . T . . . GT:AD:DP:GQ:PL 0/0:32,0:32:87:0,87,1305
scaffold11251 3995 . T . . . GT:AD:DP:GQ:PL 0/0:32,0:32:87:0,87,1305
There are only two variants for this haplotype block and only one of them is heterozygous.
I would be very grateful for any further advice you could give me on using HaplotypeCaller for physical phasing.
Many thanks
Laura
Hi Laura, the phasing that HC emits in the GVCF is not meant to be interpreted directly, it's only an intermediate notation that will be finalized appropriately when you run GenotypeGVCFs.
Hi Geraldine
Thanks for your reply. I actually only have one sample that I have mapped to my reference, so I didn't plan to run the GenotypeGVCFs tool. What I am trying to do is map reads from one species to the reference of another and then extract phased sequences for separate alleles. I would like to extract contiguous sequences, rather than just the variants. I thought that this might be possible by physically phasing variants with HaplotypeCaller. Are you saying that it is not possible to do this?
Also, I thought that is would be possible to select the most likely haplotypes for each active region from the bam output. However, as I mentioned in my original post, when I try to set the options to only select the two most likely haplotypes for each active region I still end up with additional possible haplotypes for each active region (e.g. 5 or more for some regions).
Many thanks
Laura
@Laura_Kelly
Hi Laura,
The intermediate BP_RESOLUTION GVCF is not meant to be used as the final analysis file. Even if you have one sample, you need to run GenotypeGVCFs to get the final VCF. Once you get the final VCF, can you please post the records for positions 3986 and 3929? Please also post IGV screenshots of the region from the original bam file and bamout file. I cannot view the bam file you posted because I do not have the reference.
Thanks,
Sheila
Dear Sheila and Geraldine
Thanks for your help. I have run the GentoypeGVCFs tool and here are the records for positions 3929 & 3986 (+ surrounding lines):
scaffold11251 3922 . C . . . AN=2;DP=31 GT:AD:DP:RGQ 0/0:31:31:78
scaffold11251 3923 . C . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:75
scaffold11251 3924 . T . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:75
scaffold11251 3925 . C . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:75
scaffold11251 3926 . A . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:69
scaffold11251 3927 . G . . . AN=2;DP=27 GT:AD:DP:RGQ 0/0:27:27:69
scaffold11251 3928 . A . . . AN=2;DP=27 GT:AD:DP:RGQ 0/0:27:27:69
scaffold11251 3929 . C T 951.77 . AC=2;AF=1.00;AN=2;DP=26;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=43.13;QD=29.09;SOR=0.846 GT:AD:DP:GQ:PGT:PID:PL 1/1:0,26:26:78:11:3929_C_T:980,78,0
scaffold11251 3930 . A . . . AN=2;DP=27 GT:AD:DP:RGQ 0/0:27:27:69
scaffold11251 3931 . T . . . AN=2;DP=26 GT:AD:DP:RGQ 0/0:26:26:69
scaffold11251 3932 . T . . . AN=2;DP=27 GT:AD:DP:RGQ 0/0:27:27:72
scaffold11251 3933 . G . . . AN=2;DP=27 GT:AD:DP:RGQ 0/0:27:27:72
scaffold11251 3934 . A . . . AN=2;DP=26 GT:AD:DP:RGQ 0/0:26:26:72
scaffold11251 3935 . A . . . AN=2;DP=26 GT:AD:DP:RGQ 0/0:26:26:72
scaffold11251 3936 . T . . . AN=2;DP=27 GT:AD:DP:RGQ 0/0:27:27:75
scaffold11251 3937 . A . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:78
scaffold11251 3938 . C . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:78
scaffold11251 3939 . A . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:81
scaffold11251 3940 . T . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:81
scaffold11251 3941 . T . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:81
scaffold11251 3942 . T . . . AN=2;DP=26 GT:AD:DP:RGQ 0/0:25:26:60
scaffold11251 3943 . T . . . AN=2;DP=27 GT:AD:DP:RGQ 0/0:27:27:78
scaffold11251 3944 . T . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:81
scaffold11251 3945 . G . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:27:28:42
scaffold11251 3946 . A . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:78
scaffold11251 3947 . A . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:78
scaffold11251 3948 . G . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:78
scaffold11251 3949 . T . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:81
scaffold11251 3950 . A . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:78
scaffold11251 3951 . G . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:78
scaffold11251 3952 . T . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:78
scaffold11251 3953 . A . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:78
scaffold11251 3954 . T . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:72
scaffold11251 3955 . T . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:72
scaffold11251 3956 . C . . . AN=2;DP=27 GT:AD:DP:RGQ 0/0:27:27:72
scaffold11251 3957 . A . . . AN=2;DP=27 GT:AD:DP:RGQ 0/0:27:27:69
scaffold11251 3958 . T . . . AN=2;DP=27 GT:AD:DP:RGQ 0/0:27:27:69
scaffold11251 3959 . G . . . AN=2;DP=27 GT:AD:DP:RGQ 0/0:27:27:69
scaffold11251 3960 . A . . . AN=2;DP=26 GT:AD:DP:RGQ 0/0:26:26:69
scaffold11251 3961 . A . . . AN=2;DP=27 GT:AD:DP:RGQ 0/0:27:27:72
scaffold11251 3962 . A . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:75
scaffold11251 3963 . G . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:75
scaffold11251 3964 . G . . . AN=2;DP=27 GT:AD:DP:RGQ 0/0:27:27:75
scaffold11251 3965 . A . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:75
scaffold11251 3966 . G . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:75
scaffold11251 3967 . A . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:78
scaffold11251 3968 . T . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:78
scaffold11251 3969 . G . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:78
scaffold11251 3970 . A . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:78
scaffold11251 3971 . A . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:78
scaffold11251 3972 . A . . . AN=2;DP=28 GT:AD:DP:RGQ 0/0:28:28:78
scaffold11251 3973 . T . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:28:29:69
scaffold11251 3974 . A . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:28:29:69
scaffold11251 3975 . C . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:87
scaffold11251 3976 . T . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:86
scaffold11251 3977 . C . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:86
scaffold11251 3978 . T . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:84
scaffold11251 3979 . T . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:84
scaffold11251 3980 . C . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:84
scaffold11251 3981 . T . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:28:29:74
scaffold11251 3982 . T . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:81
scaffold11251 3983 . T . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:28:29:74
scaffold11251 3984 . A . . . AN=2;DP=27 GT:AD:DP:RGQ 0/0:26:27:68
scaffold11251 3985 . G . . . AN=2;DP=29 GT:AD:DP:RGQ 0/0:29:29:81
scaffold11251 3986 . G A 470.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.268;ClippingRankSum=7.080e01;DP=27;FS=5.907;MLEAC=1;MLEAF=0.500;MQ=43.37;MQRankSum=1.20;QD=17.44;ReadPosRankSum=0.024;SOR=1.911 GT:AD:DP:GQ:PGT:PID:PL 0/1:12,15:27:99:01:3929_C_T:499,0,367
scaffold11251 3987 . G . . . AN=2;DP=30 GT:AD:DP:RGQ 0/0:30:30:81
scaffold11251 3988 . T . . . AN=2;DP=31 GT:AD:DP:RGQ 0/0:31:31:78
scaffold11251 3989 . A . . . AN=2;DP=30 GT:AD:DP:RGQ 0/0:30:30:78
scaffold11251 3990 . T . . . AN=2;DP=30 GT:AD:DP:RGQ 0/0:29:30:72
scaffold11251 3991 . G . . . AN=2;DP=30 GT:AD:DP:RGQ 0/0:30:30:78
I had a problem getting IGV to run, but I have uploaded screen shots from Tablet for the variants in the original input bam file and the bamout file with the selected haplotypes from HC. I used the options "bamWriterType CALLED_HAPLOTYPES" and "maxNumHaplotypesInPopulation 2" when generating the bam output file.
Best wishes
Laura
Issue · Github
by Sheila
Hi Laura,
Yes and no. This is not part of the primary purpose of the tool, so there is no straightforward way to make it output just the haplotype sequences, if that's what you want. But you could use the artificial haplotypes that are written to the bamout file, with some additional manipulation. You would have to identify and extract from the bam file the haplotypes that were chosen as the most likely and used for generating the output VCF.
I'm not sure why you're getting more than 2 haplotypes output when you use those settings; I think it's because this makes the tool only use 2 for the final calculations, but it still considers other possibilities in the steps leading up to that internally.
These outputs were originally developed only for debugging purposes, which is why they're so awkward to use for producing haplotypes as an end product of the analysis. If lots of people ask for this capability we may be able to add that as a feature request to the development roadmap, but so far it seems like it's more of a minority use case, so it's hard to justify putting a lot of effort into it.
Hi @Geraldine_VdAuwera and @Sheila,
Can I belatedly follow up on this and ask why @Laura_Kelly was getting phasing for her homozygous nonreference site, please? I've been getting similar results. I also have a number of sites which, overall, across 200 samples, are called as SNPs but which, for individual samples, are recorded as 0/0 or ./. yet those samples' phasing information still record the site as 01. I hope that makes sense!
Many thanks,
Ian
@ianw
Hi Ian,
Can you please post an example? I'm not sure I understand your question.
Thanks,
Sheila
Hi @Sheila
Having looked again, I think I've solved the problem, so don't worry about it Thanks for getting back to me though!
Cheers,
Ian