Why does GATK LeftAlignAndTrimVariants set a missing genotype to 0/0?

Totti (Japan), Member
edited October 2016 in Ask the GATK team

Hi, and thanks in advance for your help.

I have one VCF file (a.vcf) containing a single tri-allelic variant, shown below. Some samples have missing genotypes ("./.") because their depth is zero (DP=0).

"a.vcf"

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample131 sample138 sample908
chr12 104350956 . G T,A 147880 PASS . GT:AD:DP:GQ ./.:0,0:0 0/1:25,22,0:47:99 0/0:36,0,0:36:99 ./.:0,0:0
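To make the missing calls concrete, here is a minimal Python sketch that parses the data line above and lists the samples whose GT is fully missing. The helper name `missing_samples` is hypothetical and not part of GATK; it assumes whitespace-separated columns exactly as pasted here, not a full tab-delimited VCF with ## meta lines.

```python
# Hypothetical helper (not part of GATK): parse one whitespace-separated
# VCF data line and report which samples have a fully missing genotype.

def missing_samples(header: str, record: str) -> list[str]:
    cols = header.split()
    fields = record.split()
    fmt_keys = fields[8].split(":")          # e.g. ["GT", "AD", "DP", "GQ"]
    missing = []
    for name, value in zip(cols[9:], fields[9:]):
        # Trailing FORMAT fields may be omitted (as in "./.:0,0:0"); zip stops early.
        sample = dict(zip(fmt_keys, value.split(":")))
        alleles = sample.get("GT", ".").replace("|", "/").split("/")
        if all(a == "." for a in alleles):   # every allele is "." -> missing call
            missing.append(name)
    return missing

HEADER = "CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample131 sample138 sample908"
RECORD = ("chr12 104350956 . G T,A 147880 PASS . GT:AD:DP:GQ "
          "./.:0,0:0 0/1:25,22,0:47:99 0/0:36,0,0:36:99 ./.:0,0:0")

print(missing_samples(HEADER, RECORD))  # ['sample1', 'sample908']
```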

I want to split the tri-allelic site into bi-allelic records, so I ran the following GATK command:

java -jar GenomeAnalysisTK.jar \
-T LeftAlignAndTrimVariants \
-R ${ref_path} \
--variant a.vcf \
-o b.vcf \
--splitMultiallelics \
--reference_window_stop 900

Running this produced b.vcf:

"b.vcf"

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample131 sample138 sample908
chr12 104350956 . G T 147880 PASS AC=1;AF=0.250;AN=4 GT:AD:DP:GQ 0/0:.:0 0/1:25,22:47:99 0/0:36,0:36:99 0/0:.:0
chr12 104350956 . G A 147880 PASS AC=0;AF=0.00;AN=4 GT:AD:DP:GQ 0/0:.:0 0/0:25,0:47:99 0/0:36,0:36:99 0/0:.:0

In b.vcf, the site was split into two bi-allelic records as intended, but the missing genotypes ("./.") of sample1 and sample908 were set to "0/0". I want missing genotypes to stay missing after GATK processing.
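For comparison, here is a minimal Python sketch of the behaviour I am expecting: a split in which "./." stays "./.". This is not GATK's implementation. The recoding rule (the selected ALT becomes 1, REF and every other ALT become 0) is an assumption inferred from the non-missing genotypes in b.vcf above, and the function names `split_gt` and `split_record` are hypothetical.

```python
# Hypothetical sketch, not GATK code: split a multi-allelic site into one
# bi-allelic record per ALT while keeping missing alleles missing.

def split_gt(gt: str, alt_index: int) -> str:
    """Recode one genotype for the record of ALT number alt_index (1-based).

    Assumed rule: the selected ALT -> "1", REF and other ALTs -> "0",
    and a missing allele "." stays ".".
    """
    sep = "|" if "|" in gt else "/"
    out = []
    for allele in gt.split(sep):
        if allele == ".":
            out.append(".")              # keep missing calls missing
        elif int(allele) == alt_index:
            out.append("1")
        else:
            out.append("0")
    return sep.join(out)


def split_record(ref: str, alts: list[str], gts: list[str]) -> list[tuple[str, str, list[str]]]:
    """Return one (REF, ALT, genotypes) tuple per ALT allele."""
    return [(ref, alt, [split_gt(g, i) for g in gts])
            for i, alt in enumerate(alts, start=1)]


for ref, alt, gts in split_record("G", ["T", "A"], ["./.", "0/1", "0/0", "./."]):
    print(ref, alt, gts)
# G T ['./.', '0/1', '0/0', './.']
# G A ['./.', '0/0', '0/0', './.']
```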

How should I process the file?

I am using GATK version 3.6.
