(Possibly) Problem in tabulation in vcf files

Monete (Brazil, Member)
edited February 1 in Ask the GATK team
Hi all,

I need some help with a possible problem that appeared in my data.

I ran the genotype refinement pipeline (without a .ped file) after following all of the GATK Best Practices. The steps were: CalculateGenotypePosteriors --> VariantFiltration (tagging each sample genotype with GQ < 20 as "lowGQ") --> VariantAnnotator. The commands for all genotype refinement steps are below:

**CalculateGenotypePosteriors**

```
java -jar $gatkPath -T CalculateGenotypePosteriors \
-R $refs/ucsc.hg19.fasta --variant ${inputName}_recalibrated.filtered.vcf \
--supporting $refs/1000G_phase3_v4_20130502.sites.lifted_over_fromb37_to_hg19.vcf.gz \
--out ${inputName}_posteriors.vcf.gz
```

**VariantFiltration**

```
java -jar $gatkPath -T VariantFiltration \
-R $refs/ucsc.hg19.fasta --variant ${inputName}_posteriors.vcf.gz \
--genotypeFilterExpression "GQ < 20.0" --genotypeFilterName lowGQ \
--out ${inputName}_postCGP.Gfiltered.vcf.gz
```
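
For readers following along, the effect of the genotype filter above can be sketched in a few lines of Python. This is only an illustration of the tagging rule described in this thread (GQ < 20 gets "lowGQ", otherwise "PASS"), not GATK's actual implementation. The function name `genotype_filter_tag` is hypothetical, and how GATK treats a missing GQ is not covered here.

```python
def genotype_filter_tag(gq, threshold=20.0, name="lowGQ"):
    """Return the per-sample filter tag for one genotype.

    Assumption: a genotype failing the expression "GQ < threshold" is
    tagged with the filter name, and any other genotype is tagged "PASS".
    """
    return name if gq < threshold else "PASS"


# Hypothetical GQ values for three samples at one site.
for gq in (18, 20, 99):
    print(gq, genotype_filter_tag(gq))
```

The tag is written to the FT key of the sample column. It is a separate annotation derived from GQ, so the GQ value itself is left unchanged.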

**VariantAnnotator**

```
java -jar $gatkPath -T VariantAnnotator \
-R $refs/ucsc.hg19.fasta --variant ${inputName}_postCGP.Gfiltered.vcf.gz \
--group StandardAnnotation -A BaseCounts -A GCContent -A GenotypeSummaries -A LowMQ -A MappingQualityZero \
-A NBaseCount -A SampleList -A VariantType -A AlleleCountBySample -A MappingQualityZeroBySample \
--out ${inputName}_postCGP.Gfiltered.ANNOTATED.vcf.gz
```

After this, I wanted to compare the information added at each step of the genotype refinement pipeline in my VCF files, so I compared two of them: (1) the output of CalculateGenotypePosteriors and (2) the output of VariantAnnotator.

I noticed something (possibly) wrong with the column alignment in the FORMAT field (image below). In file (1), the fourth entry of the FORMAT field is the FT annotation. But in each sample's column, the fourth entry holds the filter tag for GQ, while GQ itself is the fifth entry of the FORMAT field.
In file (2), the GQ information appears to be positioned correctly.

So, did I miss something? Am I looking in the right place?

P.S.: For some reason I don't understand, markdown formatting is not working for me. Sorry!

Thanks for your help.

Best,

Monete

Best Answer

  • bhanuGandham (admin)
    Accepted Answer

    Hi @Monete

    In file (1), the FT information is in the 4th position and the GQ information is in the 5th position. In file (2), the GQ information is in the 4th position.

    There is no requirement that different VCF files have the same number or ordering of keys in the FORMAT field.
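
To illustrate the point, here is a minimal Python sketch that reads a sample column by FORMAT key rather than by position. It is not taken from GATK. The helper name `genotype_fields` and both example records are hypothetical, made up only to show two files with different FORMAT layouts.

```python
def genotype_fields(format_col, sample_col):
    """Pair each FORMAT key with the sample value at the same position."""
    keys = format_col.split(":")
    values = sample_col.split(":")
    # zip() stops at the shorter list, so a sample column with fewer
    # values than FORMAT keys simply yields fewer entries.
    return dict(zip(keys, values))


# Hypothetical records: the same genotype written with two FORMAT layouts.
file1 = genotype_fields("GT:AD:DP:FT:GQ:PL", "0/1:12,9:21:lowGQ:18:18,0,250")
file2 = genotype_fields("GT:AD:DP:GQ:FT:PL", "0/1:12,9:21:18:lowGQ:18,0,250")

# Looking values up by key gives the same answer for both layouts.
print(file1["GQ"], file2["GQ"])
print(file1["FT"], file2["FT"])
```

Because the FORMAT column of each record names its own keys, a comparison between two files is safer when done by key (GQ, FT) than by counting colon-separated positions.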

Answers

  • Monete (Brazil, Member)
    edited February 7
    Hi @bhanuGandham

    Thanks for your reply.

    Now I understand.

    I had thought the "lowGQ"/"PASS" tags were updated GQ values.

    Thank you very much for your help and patience.