In SelectVariants, are -conc and -disc complementary in a set-theoretic sense?

yfarjoun Broad Institute Posts: 15 GATK Developer mod

I'm looking to find all the entries that change between two calls to UG on the same data. I would like to find all the entries where the call in the variant track is different from the call in the comparison track. So in effect I want those entries that would not result from using -conc in SelectVariants. From the documentation it is unclear whether the -disc option does this:

A site is considered discordant if there exists some sample in the variant track that has a non-reference genotype and either the site isn't present in this track, the sample isn't present in this track, or the sample is called reference in this track.

What if the comp is HOM_VAR and the variant track is HET? Or if they are both HET but disagree on the specific allele?
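
Read literally, the quoted definition can be sketched as a predicate. This is only an illustration of the documentation's wording, not the SelectVariants implementation: the genotype labels and the `is_discordant` helper are hypothetical, and a missing site or sample is modelled as `None`.

```python
# Hypothetical genotype labels; None means the site or sample is absent from a track.
HOM_REF, HET, HOM_VAR = "HOM_REF", "HET", "HOM_VAR"

def is_discordant(variant_gts, comp_gts):
    """Literal reading of the quoted definition, for a single site.

    variant_gts / comp_gts map sample name -> genotype label.
    comp_gts is None when the site is absent from the comparison track.
    """
    for sample, gt in variant_gts.items():
        if gt in (None, HOM_REF):
            continue  # only non-reference genotypes in the variant track count
        if comp_gts is None:
            return True  # site missing from the comparison track
        comp_gt = comp_gts.get(sample)
        if comp_gt is None or comp_gt == HOM_REF:
            return True  # sample missing, or called reference, in the comparison track
    return False

# Under this reading, HET versus HOM_VAR is not discordant.
print(is_discordant({"s1": HET}, {"s1": HOM_VAR}))  # False
print(is_discordant({"s1": HET}, {"s1": HOM_REF}))  # True
print(is_discordant({"s1": HET}, None))             # True
```

Under this literal reading, a HET call against a HOM_VAR call, or two HET calls that differ in the specific allele, would not be flagged as discordant, which is exactly the gap the question is about. Whether the tool itself behaves this way is not settled by the quoted text.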


Best Answer

  • Carneiro Posts: 275 admin
    edited November 2012 Answer ✓

    There are many options that go with -disc and -conc. It depends on whether you're interested in the genotypes or just in whether or not a call was made. The simple use case is complementary, but if you run more complex queries you will get into two distinct scenarios. These simple examples are in the GATKDocs, and I think they will help you decide which one you need to answer your particular question:

    Select all calls missed in my vcf, but present in HapMap (useful to take a look at why these variants weren't called by this dataset):

        java -Xmx2g -jar GenomeAnalysisTK.jar -R ref.fasta -T SelectVariants --variant hapmap.vcf --discordance myCalls.vcf -o output.vcf -sn mySample

    Select all calls made by both myCalls and hisCalls (useful to take a look at what is consistent between the two callers):

        java -Xmx2g -jar GenomeAnalysisTK.jar -R ref.fasta -T SelectVariants --variant myCalls.vcf --concordance hisCalls.vcf -o output.vcf -sn mySample
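
The simple, complementary case can be pictured with plain sets. This is a rough sketch that assumes only site presence matters (no genotype comparison); the site tuples are made-up stand-ins for VCF records, not output from either command.

```python
# Sites are hypothetical (chromosome, position) pairs standing in for VCF records.
variant_sites = {("chr1", 100), ("chr1", 250), ("chr2", 40)}
comp_sites = {("chr1", 250), ("chr2", 40), ("chr3", 7)}

concordant = variant_sites & comp_sites   # called in both tracks
discordant = variant_sites - comp_sites   # called in the --variant track only

# Within the --variant track, the two selections partition the sites.
assert concordant | discordant == variant_sites
assert not (concordant & discordant)

print(sorted(discordant))  # [('chr1', 100)]
```

Note that the complement here is relative to the --variant track: a site present only in the comparison track (`("chr3", 7)` above) appears in neither selection. Once genotypes are compared, this clean partition is no longer guaranteed.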
