Can VariantsToTable output the raw genotype call (e.g., 0/1) rather than the actual base call (e.g., A/T)?

I'm interested in getting simple "heterozygous" or "homozygous" designations for every sample/SNP in my multisample VCF file. In the past, I have used the -GF GT option in VariantsToTable and then annotated the base calls in Excel as either heterozygous or homozygous. This takes forever, since Excel isn't really built for data this large. Is there a simple way to output the SNP genotypes as 0/1, 0/0, 0/1, or 1/1 instead of C/A, A/A, G/T, C/C?
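If the het/hom designation is all that's needed, the classification can be done directly on the VCF, with no Excel step. Below is a minimal sketch of a hypothetical bash helper, `vcf_zygosity` (not a GATK tool), built on awk. It assumes an uncompressed, tab-delimited VCF, looks up GT by name in the FORMAT column, and labels each call HOM_REF, HET, HOM_ALT or MISSING. Treating any call containing a `.` allele as missing is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a GATK tool): print CHROM, POS and one zygosity label
# per sample: HOM_REF (e.g. 0/0), HET (alleles differ, e.g. 0/1 or 1/2),
# HOM_ALT (same non-reference allele, e.g. 1/1) or MISSING (any '.' allele).
vcf_zygosity() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^##/ { next }                          # skip meta-information lines
    /^#CHROM/ {                             # header: keep CHROM, POS and sample names
      line = $1 OFS $2
      for (i = 10; i <= NF; i++) line = line OFS $i
      print line; next
    }
    {
      # Locate GT by name among the colon-separated FORMAT keys (0 = absent).
      n = split($9, keys, ":"); gt = 0
      for (k = 1; k <= n; k++) if (keys[k] == "GT") gt = k
      line = $1 OFS $2
      for (i = 10; i <= NF; i++) {
        split($i, vals, ":")
        call = gt ? vals[gt] : "."
        m = split(call, al, "[/|]")          # unphased (/) and phased (|) calls
        het = 0; alt = 0; missing = (m == 0)
        for (j = 1; j <= m; j++) {
          if (al[j] == "." || al[j] == "") missing = 1
          if (al[j] != al[1]) het = 1
          if (al[j] != "0") alt = 1
        }
        label = missing ? "MISSING" : (het ? "HET" : (alt ? "HOM_ALT" : "HOM_REF"))
        line = line OFS label
      }
      print line
    }' "$1"
}
```

For example, `vcf_zygosity input.vcf > zygosity.tsv`. Multiallelic calls such as 1/2 count as HET here and haploid calls such as 1 count as HOM_ALT; adjust the labels if you need different conventions.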

    Thanks, not sure why I missed that thread. I also found a workaround.

    Here is what I ran:

    grep -v '^##' input.vcf | cut -f1,2,10- | sed 's/:[^[:space:]]*//g' > output.txt

    It reported exactly what I needed. However, I'm not sure how far I should trust this result, given the Broad's insistence on using a dedicated tool to parse VCF files rather than doing it manually. It seems to have worked as expected so far.
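The one-liner relies on GT being the first FORMAT key, so that everything after the first colon of each sample field can be discarded. The VCF specification requires GT to come first whenever it is present, but a record can omit GT entirely, in which case the stripped value is some other field (for example, a depth). A minimal guard, written as a hypothetical `check_gt_first` function, lists such records and exits non-zero if it finds any:

```shell
#!/usr/bin/env bash
# Hypothetical guard for the grep | cut | sed one-liner: list data records whose
# FORMAT column does not start with GT (the sed step keeps only the text before
# the first ':' of each sample field, which is the genotype only when GT is first).
check_gt_first() {
  awk -F '\t' '
    /^#/ { next }                               # skip meta-information and header lines
    $9 != "GT" && $9 !~ /^GT:/ {                 # FORMAT lacks a leading GT key
      printf "%s:%s FORMAT=%s\n", $1, $2, $9
      bad++
    }
    END { exit (bad > 0 ? 1 : 0) }              # non-zero exit if anything was found
  ' "$1"
}
```

Running `check_gt_first input.vcf` before the one-liner is a cheap way to catch this case.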


    @mmats010 The main reason we push VariantsToTable over manual parsing is that we've seen many people lose time and effort to unexpected or silent failures with manual parsing solutions, particularly when they're trying to do something a little complicated that is vulnerable to weird VCF idiosyncrasies. What you're doing is simple enough that I wouldn't worry about it.

  • Is there still no tool or method that addresses this question? My genotype fields are very complex, so manual parsing with sed isn't safe.

    No, the behavior has not changed in GATK4, and I'm not aware of any plans to change it in the near future.
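Outside GATK, one dedicated tool that prints the raw GT codes is bcftools (swapped in here; it is not part of GATK). A minimal sketch, assuming bcftools is installed and on PATH, wrapped in a hypothetical `gt_table` function: `bcftools query` with a per-sample `[...]` loop in the format string emits CHROM, POS and each sample's GT, and `-H` adds a header line.

```shell
#!/usr/bin/env bash
# Assumes bcftools is installed; gt_table is a hypothetical wrapper name.
# Prints CHROM, POS and the raw GT code (e.g. 0/1, 1|1, ./.) for every sample.
gt_table() {
  # -H: print a header line; [ ... ] repeats its contents once per sample.
  bcftools query -H -f '%CHROM\t%POS[\t%GT]\n' "$1"
}
```

For example, `gt_table input.vcf > genotypes.tsv`. Because bcftools parses each record rather than matching text, complex FORMAT fields are handled by its VCF parser instead of by regular expressions.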


