ambiguous explanation of -invMv for SelectVariants

timflutre (Montpellier, France; Member)

Hello,
on the SelectVariants doc page (here), it is written that the following command:

java -jar GenomeAnalysisTK.jar \
   -T SelectVariants \
   -R reference.fasta \
   -V input.vcf \
   -ped family.ped \
   -mv -mvq 50 -invMv \
   -o violations.vcf

corresponds to: "Generating a VCF of all the variants that are not mendelian violations. The optional argument '-mvq' together with '-invMv' restricts the selection to sites that have a QUAL score of 50 or less".

Moreover, below, the options are described and it is written:

  • -mv means "Output mendelian violation sites only";
  • -invMv means "Output non-mendelian violation sites only";
  • -mvq means "Minimum GQ score for each trio member to accept a site as a violation".

I hence conclude the following:

  • -mv alone means "generate a VCF of only the variants that are mendelian violations";
  • -invMv means "generate a VCF of only the variants that are not mendelian violations";
  • -mv -invMv is ambiguous: does -invMv take precedence over -mv?;
  • -mv -mvq 50 means "generate a VCF of only the variants that have QUAL > 50 and that are mendelian violations";
  • -invMv -mvq 50 means "generate a VCF of only the variants that have QUAL > 50 and that are not mendelian violations";
  • -mv -mvq 50 -invMv (as in the example above) is ambiguous.
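
To make the cases concrete, here is a rough sketch of the combinations I would expect to be unambiguous, written as separate commands. The shared arguments are copied from the doc example; the helper function `select_variants_cmd` and the output file names are my own placeholders, not taken from the doc page, and the commands are only printed so they can be compared side by side. Whether GATK actually interprets each one as I describe is part of what I am asking.

```shell
# Hypothetical helper: prints a SelectVariants command line made of the shared
# arguments from the doc example, followed by the flags under comparison.
select_variants_cmd() {
    echo "java -jar GenomeAnalysisTK.jar -T SelectVariants -R reference.fasta -V input.vcf -ped family.ped $*"
}

# Case 1: -mv alone (my reading: mendelian violation sites only).
select_variants_cmd -mv -o violations_only.vcf

# Case 2: -invMv alone (my reading: non-violation sites only).
select_variants_cmd -invMv -o non_violations_only.vcf

# Case 3: -mv with -mvq 50 (my reading: violations passing the quality threshold).
select_variants_cmd -mv -mvq 50 -o violations_mvq50.vcf
```

The doc example differs from case 3 only by the extra -invMv, which is where the ambiguity comes in.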

Am I right? If not, can you explain to me the meanings of these various options?

Best,
TF
