VariantAnnotator after CombineVariants

I am calling variants on multiple RNAseq datasets using GATK/3.4.46. Because of limited computational resources, I ran HaplotypeCaller on each dataset separately. Then I ran CombineVariants to merge all the output VCF files using this command:

java -Xmx10g -jar GenomeAnalysisTK.jar \
-T CombineVariants \
-R $gatk_ref \
--variant set1.vcf \
--variant set2.vcf \
--variant set3.vcf \
-o combine_output.vcf \
-genotypeMergeOptions UNIQUIFY

Then I tried to run VariantFiltration using this command:

java -Xmx2g -jar $GATK/GenomeAnalysisTK.jar \
-T VariantFiltration \
-R $gatk_ref \
-V combine_output.vcf \
-window 35 -cluster 3 \
-filterName FS -filter "FS > 30.0" \
-filterName QD -filter "QD < 2.0" \
-o $output

Several thousand variants, though, triggered warnings about missing FS and QD annotations.
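
For reference, a rough way to count the affected records is to look for the annotation key in the INFO column. This is only a sketch: the helper name `count_missing` is made up for illustration and is not part of GATK, and it assumes a plain uncompressed, tab-delimited VCF.

```shell
# Hypothetical helper (not a GATK tool): count data records whose INFO
# column (field 8) has no "<KEY>=" entry.
count_missing() {
  # $1 = annotation key, e.g. FS or QD; $2 = path to an uncompressed VCF
  awk -F'\t' -v key="$1" '
    /^#/ { next }                          # skip header lines
    {
      n = split($8, info, ";")             # INFO entries are ";"-separated
      found = 0
      for (i = 1; i <= n; i++)
        if (index(info[i], key "=") == 1)  # entry starts with "KEY="
          found = 1
      if (!found) missing++
    }
    END { print missing + 0 }              # prints 0 when nothing is missing
  ' "$2"
}
```

Calling `count_missing FS combine_output.vcf` and `count_missing QD combine_output.vcf` gives the number of records without each annotation.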
Following @Sheila's advice in http://gatkforums.broadinstitute.org/discussion/2334/undefined-variable-variantfiltration, I ran VariantAnnotator to add these annotations using this command:

java -Xmx45g -jar GenomeAnalysisTK.jar \
-R $gatk_ref \
-T VariantAnnotator \
-I input1.bam \
-I input2.bam \
.
.
-I input57.bam \
-V combine_output.vcf \
-A Coverage \
-A FisherStrand \
-A QualByDepth \
-nt 7 \
-o combine_output_ann.vcf

Then I repeated the VariantFiltration but I have 2 problems:
1) About 2000 variants are still not annotated for FS. All of them are indels, and many of them are not homozygous for the ALT allele. Also, ~40 variants are still not annotated for QD. All of them have multiple ALT alleles.
2) The combined VCF record has the QUAL of the first VCF record with a non-MISSING QUAL value. According to my manual calculations, I think VariantAnnotator calculates the QD value by dividing this QUAL value by the AD of the samples with a non-hom-ref genotype call. This causes many variants to fail the QD filter.
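
The manual calculation in point 2 can be sketched as follows. This only reproduces the hypothesis stated above (QUAL divided by the AD values summed over samples whose genotype is neither hom-ref nor a no-call); it is not GATK's own code, and the function name `manual_qd` is made up for illustration. It assumes an uncompressed VCF with GT and AD in the FORMAT column.

```shell
# Hypothetical helper (not a GATK tool): recompute QD per record as
# QUAL / (sum of AD over samples that are not hom-ref and not no-calls).
manual_qd() {
  # $1 = path to an uncompressed VCF
  # prints: CHROM  POS  QUAL  summed_depth  QD
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    $6 == "." { next }                     # no QUAL to divide
    {
      nf = split($9, fmt, ":")             # locate GT and AD in FORMAT
      gt_i = 0; ad_i = 0
      for (i = 1; i <= nf; i++) {
        if (fmt[i] == "GT") gt_i = i
        if (fmt[i] == "AD") ad_i = i
      }
      if (!gt_i || !ad_i) next
      depth = 0
      for (s = 10; s <= NF; s++) {         # sample columns start at 10
        split($s, f, ":")
        gt = f[gt_i]
        if (gt == "0/0" || gt == "0|0" || gt == "./." || gt == ".|." || gt == "." || gt == "")
          continue                         # hom-ref or no-call: not counted
        na = split(f[ad_i], ad, ",")
        for (a = 1; a <= na; a++)
          if (ad[a] != ".") depth += ad[a] # sum REF and ALT read counts
      }
      if (depth > 0)
        printf "%s\t%s\t%s\t%d\t%.2f\n", $1, $2, $6, depth, $6 / depth
    }
  ' "$1"
}
```

Comparing the last column of `manual_qd combine_output_ann.vcf` with the QD values written by VariantAnnotator shows whether the hypothesis holds for a given record.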

Thank you
