I have a VCF file that is missing the GT field. Can I just add 0/1 for each variant, and let GATK's ReadBackedPhasing take care of resolving the actual phased genotypes?
I don't think so. Why does your VCF not have genotypes? Where/how did you produce it?
The VCF was produced from a BED-like file of variants that I had. The file had these five columns: chr, start, stop, ref, and alt.
It will take some time and effort to go back to the original output of the variant callers, so if you know of any other way that GATK can resolve this issue, let me know.
I have no idea whether your approach will work, and we have not tested it. But you can try it out and let us know how it goes. If it does not work, you will need to generate a proper VCF (preferably with GATK tools) as input to ReadBackedPhasing.
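For anyone who wants to try the placeholder idea, here is a minimal Python sketch of how a sites-only VCF could be given a 0/1 genotype column. It is untested with ReadBackedPhasing, and it assumes a tab-separated file with the eight standard columns and no existing FORMAT or sample columns. The function name add_placeholder_gt and the sample name "SAMPLE" are made up for illustration.

```python
# Sketch only: append a placeholder GT to every record of a sites-only VCF.
# Assumes tab-separated input with exactly the 8 standard columns and no
# existing ##FORMAT=<ID=GT,...> header line.

GT_HEADER = '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'


def add_placeholder_gt(vcf_lines, sample="SAMPLE", gt="0/1"):
    """Return new VCF lines with a FORMAT column and one sample column."""
    out = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines are passed through unchanged.
            out.append(line)
        elif line.startswith("#CHROM"):
            # Declare GT just before the column header, then extend the header.
            out.append(GT_HEADER)
            fields = line.split("\t")[:8]
            out.append("\t".join(fields + ["FORMAT", sample]))
        elif line:
            # Data record: keep the 8 site columns, add the placeholder genotype.
            fields = line.split("\t")[:8]
            out.append("\t".join(fields + ["GT", gt]))
    return out


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr9\t133730278\t.\tA\tG\t.\t.\t.",
    ]
    print("\n".join(add_placeholder_gt(example)))
```

Note that a 0/1 placeholder asserts heterozygosity at every site, which may not be true for the sample, so any phasing result built on it should be treated with caution.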
I ran this:
java -Xmx50g -jar .../GenomeAnalysisTK-3.3.0/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -L regions.bed -T HaplotypeCaller -nct 8 -I recal.bam -o g.vcf --genotyping_mode DISCOVERY -stand_emit_conf 30 -stand_call_conf 30 -ERC BP_RESOLUTION -variant_index_type LINEAR -variant_index_parameter 128000 >& hp.log
Using -ERC GVCF gave the same result.
All lines look like this:
chr9 133730278 . A . . . GT:AD:DP:GQ:PL 0/0:374,6:380:99:0,120,1800
Why don't I see 0|1 or 1|1? Thanks!
chr9 133730274 . A . . . GT:AD:DP:GQ:PL 0/0:349,10:359:99:0,120,1800
The tool will only output | in the genotype when the site is phased with another site. Can you confirm the sites you posted are in phase with other sites? Please post some IGV screenshots of the sites that are in phase. Also, I hope this blog will help.
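A quick way to check whether any genotype in the output is phased is to count the GT values that use | versus /. The following is a rough Python sketch, assuming a tab-separated VCF with a FORMAT column and at least one sample column. The function name count_phased is hypothetical and not part of GATK.

```python
# Sketch only: count phased ("|") and unphased ("/") genotypes in VCF lines.
# Assumes tab-separated records with FORMAT in column 9 and samples after it.


def count_phased(vcf_lines):
    """Return (phased, unphased) genotype counts across all samples."""
    phased = 0
    unphased = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # sites-only record, no genotypes to inspect
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        gt_idx = keys.index("GT")
        for sample in fields[9:]:
            values = sample.split(":")
            gt = values[gt_idx] if gt_idx < len(values) else ""
            if "|" in gt:
                phased += 1
            elif "/" in gt:
                unphased += 1
    return phased, unphased


if __name__ == "__main__":
    records = [
        "chr9\t133730278\t.\tA\t.\t.\t.\t.\tGT:AD:DP:GQ:PL\t0/0:374,6:380:99:0,120,1800",
        "chr9\t133730274\t.\tA\t.\t.\t.\t.\tGT:AD:DP:GQ:PL\t0/0:349,10:359:99:0,120,1800",
    ]
    print(count_phased(records))
```

If the phased count is zero, it is worth checking in IGV whether reads actually span two or more heterozygous sites before expecting any | genotypes.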