ambiguous explanation of -invMv for SelectVariants

timflutre, Montpellier, France (Member)

Hello,
on the SelectVariants doc page (here), it is written that the following command:

java -jar GenomeAnalysisTK.jar \
   -T SelectVariants \
   -R reference.fasta \
   -V input.vcf \
   -ped family.ped \
   -mv -mvq 50 -invMv \
   -o violations.vcf

corresponds to: "Generating a VCF of all the variants that are not mendelian violations. The optional argument '-mvq' together with '-invMv' restricts the selection to sites that have a QUAL score of 50 or less".

Moreover, below, the options are described and it is written:

  • -mv means "Output mendelian violation sites only";
  • -invMv means "Output non-mendelian violation sites only";
  • -mvq means "Minimum GQ score for each trio member to accept a site as a violation".

I hence conclude the following:

  • -mv alone means "generate a VCF of only the variants that are mendelian violations";
  • -invMv means "generate a VCF of only the variants that are not mendelian violations";
  • -mv -invMv is not unambiguously defined: does -invMv take precedence over -mv?;
  • -mv -mvq 50 means "generate a VCF of only the variants that have QUAL > 50 and that are mendelian violations";
  • -invMv -mvq 50 means "generate a VCF of only the variants that have QUAL > 50 and that are not mendelian violations";
  • -mv -mvq 50 -invMv (as in the example above) is ambiguous.
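
Part of the confusion is that the example text talks about the QUAL score (one value per site), while the option description for -mvq talks about GQ (one value per sample). To make the GQ reading concrete, here is a minimal Python sketch of a trio check. It is only an illustration of the option description quoted above, not GATK's implementation: the function names, the genotype encoding as allele pairs, and the use of ">=" for the threshold are all my assumptions.

```python
from itertools import product

def is_consistent(mother, father, child):
    """Return True if the child's diploid genotype can be formed by
    taking one allele from the mother and one from the father."""
    a, b = child
    return any(
        (m == a and f == b) or (m == b and f == a)
        for m, f in product(mother, father)
    )

def is_violation(mother, father, child, gqs, min_gq=0):
    """Hypothetical reading of -mvq: a site is accepted as a violation
    only if every trio member has GQ >= min_gq (assumed inclusive)."""
    if any(gq < min_gq for gq in gqs):
        return False  # at least one genotype is too uncertain to call a violation
    return not is_consistent(mother, father, child)

# Both parents hom-ref, child het: not explainable by inheritance
print(is_violation((0, 0), (0, 0), (0, 1), gqs=(60, 60, 60), min_gq=50))
# Same genotypes, but the father's GQ is below the threshold
print(is_violation((0, 0), (0, 0), (0, 1), gqs=(60, 40, 60), min_gq=50))
```

Under this reading, a site where one trio member has a low GQ is simply never counted as a violation, which is different from filtering on the site's QUAL.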

Am I right? If not, can you explain to me the meanings of these various options?

Best,
TF
