Extracting MQ and QUAL values for invariant sites in VCF files

elcortegano (University of Edinburgh, Member)
I'm having problems getting mapping quality (MQ) values and Phred-scaled site quality scores (QUAL) for invariant sites in the VCF files generated by GATK, even when I specify that all sites should be emitted.

First, I cannot see an MQ value for invariant sites. Is it possible to obtain this value for these sites?

The QUAL value is generated for some invariant sites, but for the majority of them an 'Infinity' value is reported instead. After inspecting these sites in IGV, it is not clear to me whether this is related to high- or low-quality regions, as they appear at sites with both good and low coverage.

The options used with GATK are the following.

First, for each sample I'm working with, I'm using GATK HaplotypeCaller as follows:

GATK HaplotypeCaller \
-I file.bam \
-O file.g.vcf \
-R reference.fa.gz \
-ploidy 1 \
-ERC BP_RESOLUTION \
-stand-call-conf 10.0

After that, I combine the generated GVCF files and call variants as follows:

GATK CombineGVCFs \
-R reference.fa.gz \
-O combined.g.vcf \
--variant file1.g.vcf \
--variant file2.g.vcf \
...

GATK GenotypeGVCFs \
-R reference.fa.gz \
-V combined.g.vcf \
-O file.vcf \
-ploidy 1 \
-all-sites
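
To make the pattern concrete, here is a minimal Python sketch of how the invariant sites in the resulting VCF could be tabulated by QUAL type, depth, and MQ. It assumes that invariant sites are written with `.` in the ALT column and that DP and MQ, when present, appear in the INFO column. The helper names (`parse_info`, `classify_qual`, `invariant_sites`) and the example records are made up for illustration and are not GATK output.

```python
import math

def parse_info(info):
    """Turn a VCF INFO string such as 'DP=12;MQ=60.00' into a dict."""
    fields = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flag-style entries keep an empty value
    return fields

def classify_qual(qual):
    """Label a QUAL column value as 'missing', 'infinite' or 'numeric'."""
    if qual == ".":
        return "missing"
    # float() accepts the string 'Infinity' and returns inf
    return "infinite" if math.isinf(float(qual)) else "numeric"

def invariant_sites(lines):
    """Yield (chrom, pos, qual_label, DP, MQ) for records whose ALT is '.'."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, alt, qual, info = cols[0], int(cols[1]), cols[4], cols[5], cols[7]
        if alt != ".":
            continue  # variant site, not of interest here
        fields = parse_info(info)
        # DP and MQ are returned as strings, or None when absent from INFO
        yield chrom, pos, classify_qual(qual), fields.get("DP"), fields.get("MQ")

# Hypothetical records, for illustration only
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\t.\tInfinity\t.\tDP=35\n",
    "chr1\t101\t.\tC\t.\t42.5\t.\tDP=4\n",
    "chr1\t102\t.\tG\tT\t812.3\t.\tDP=30;MQ=60.00\n",
]

for site in invariant_sites(example):
    print(site)
```

On an uncompressed VCF the same generator could be fed an open file handle, e.g. `invariant_sites(open("file.vcf"))`, to compare the depth of 'Infinity' sites with that of numeric ones.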

Is there a way to get numeric MQ and QUAL values for all invariant sites?

Thank you