

In SelectVariants, are -conc and -disc complementary in a set-theoretic sense?

yfarjoun (Broad Institute; Dev)

I'm looking for all the entries that change between two runs of UnifiedGenotyper (UG) on the same data, that is, all the entries where the call in the variant track differs from the call in the comparison track. In effect, I want the entries that would not be returned by -conc in SelectVariants. From the documentation it is unclear whether the -disc option does this:

A site is considered discordant if there exists some sample in the variant track that has a non-reference genotype and either the site isn't present in this track, the sample isn't present in this track, or the sample is called reference in this track.

What if the comp is HOM_VAR and the variant track is HET? Or if they are both HET but disagree on the specific allele?
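Read literally, the quoted definition only asks whether a sample is non-reference in the variant track and whether that sample is missing or called reference in the comparison track. The sketch below encodes that literal reading as a Python predicate for a single site. The genotype representation (a pair of allele strings) and the function names are hypothetical stand-ins, not GATK's API. Under this reading, neither case asked about (HOM_VAR vs HET, or two HETs with different alternate alleles) would count as discordant, because the sample is non-reference in both tracks.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation (not GATK's API): a genotype is a pair of
# allele strings, and a track maps sample name -> genotype for one site.
# A comparison track of None means the site is absent from that track.
Genotype = Tuple[str, str]
SiteCalls = Optional[Dict[str, Genotype]]


def is_non_ref(gt: Genotype, ref: str) -> bool:
    """True if any allele in the genotype differs from the reference allele."""
    return any(allele != ref for allele in gt)


def is_discordant(variant: Dict[str, Genotype], comp: SiteCalls, ref: str) -> bool:
    """Literal reading of the quoted -disc definition for a single site.

    Discordant if some sample that is non-reference in the variant track is
    missing from the comparison track (site or sample absent) or is called
    reference there.
    """
    for sample, gt in variant.items():
        if not is_non_ref(gt, ref):
            continue  # only non-reference samples can make a site discordant
        if comp is None or sample not in comp:
            return True
        if not is_non_ref(comp[sample], ref):
            return True
    return False


ref = "A"
# Variant track HET, comparison track HOM_VAR: not discordant under this reading.
print(is_discordant({"s1": ("A", "G")}, {"s1": ("G", "G")}, ref))  # False
# Both HET but with different alternate alleles: also not discordant.
print(is_discordant({"s1": ("A", "G")}, {"s1": ("A", "T")}, ref))  # False
# Comparison track calls the sample reference: discordant.
print(is_discordant({"s1": ("A", "G")}, {"s1": ("A", "A")}, ref))  # True
```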

Thanks.

Best Answer

  • Carneiro (Charlestown, MA; Member)
    edited November 2012; Accepted Answer

    There are many options that go with -disc and -conc, and which one you want depends on whether you're interested in the genotypes or only in whether a call was made. In the simple use case the two are complementary, but with more complex queries you will run into two distinct scenarios. These simple examples from the GATKDocs should help you decide which one answers your particular question:

    Select all calls present in HapMap but missed in my VCF (useful for looking at why these variants weren't called in this dataset):
    ```
    java -Xmx2g -jar GenomeAnalysisTK.jar \
       -R ref.fasta \
       -T SelectVariants \
       --variant hapmap.vcf \
       --discordance myCalls.vcf \
       -o output.vcf \
       -sn mySample
    ```

    Select all calls made by both myCalls and hisCalls (useful for seeing what is consistent between the two callers):
    ```
    java -Xmx2g -jar GenomeAnalysisTK.jar \
       -R ref.fasta \
       -T SelectVariants \
       --variant myCalls.vcf \
       --concordance hisCalls.vcf \
       -o output.vcf \
       -sn mySample
    ```
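In the simple use case, where only the presence of a call at a site matters, the two selections behave like set difference and set intersection over the sites of the --variant track, so together they partition it. The sketch below illustrates that site-level view, assuming each VCF has been reduced to a set of (chromosome, position) keys and using one pair of tracks for both selections. The site values are made up for illustration, and this is an analogy for the behaviour described above, not SelectVariants' implementation.

```python
# Hypothetical site keys: (chromosome, position). Values are illustrative only.
hapmap = {("1", 100), ("1", 200), ("2", 50)}
my_calls = {("1", 100), ("2", 75)}

# Site-level analogue of --variant hapmap.vcf --discordance myCalls.vcf:
# HapMap sites with no call in myCalls.
discordant = hapmap - my_calls
# Site-level analogue of --variant hapmap.vcf --concordance myCalls.vcf:
# HapMap sites that are also called in myCalls.
concordant = hapmap & my_calls

# At this level the two selections are disjoint and together cover --variant,
# i.e. they are complements within the --variant track.
assert discordant.isdisjoint(concordant)
assert discordant | concordant == hapmap
print(sorted(discordant))  # [('1', 200), ('2', 50)]
print(sorted(concordant))  # [('1', 100)]
```

Once genotypes rather than call presence are compared, this clean partition no longer holds automatically, which is where the more complex scenarios come in.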

Answers

  • ebanks (Broad Institute; Member, Broadie, Dev)

    Have you tried running the tool on such a site and looking at the output?

  • yfarjoun (Broad Institute; Dev)

    I have. I found them to be complementary. However, since this didn't seem to agree exactly with the documentation, I wanted to make sure that I understood my results correctly.

  • yfarjoun (Broad Institute; Dev)

    Let me qualify that. They are complementary, but I haven't found a way to distinguish between a HET and a HOM_VAR call when the variant allele is the same...
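If the goal is to detect whether the genotype itself changed between the two UnifiedGenotyper runs, rather than whether each track has a non-reference call, each sample's full genotype has to be compared. The sketch below does that for one sample at one site, separating matching non-reference genotypes from changed ones, which is exactly the HET vs HOM_VAR distinction that a presence-based rule does not make. The tuple genotype representation and the `classify_sample` helper are hypothetical; this illustrates the comparison logic and is not a description of any SelectVariants option.

```python
from typing import Optional, Tuple

# Hypothetical representation: an unordered pair of allele strings,
# or None when the sample has no call at this site.
Genotype = Tuple[str, str]


def classify_sample(var_gt: Optional[Genotype], comp_gt: Optional[Genotype], ref: str) -> str:
    """Compare one sample's genotype between two callsets at one site."""

    def non_ref(gt: Optional[Genotype]) -> bool:
        return gt is not None and any(allele != ref for allele in gt)

    var_alt, comp_alt = non_ref(var_gt), non_ref(comp_gt)
    if not var_alt and not comp_alt:
        return "both_ref_or_missing"
    if var_alt != comp_alt:
        return "call_changed"  # non-reference in one track only
    # Both non-reference: compare full genotypes, ignoring allele order.
    if sorted(var_gt) == sorted(comp_gt):
        return "same_genotype"
    return "genotype_changed"  # e.g. HET vs HOM_VAR, or different alt alleles


ref = "A"
print(classify_sample(("A", "G"), ("G", "G"), ref))  # genotype_changed (HET vs HOM_VAR)
print(classify_sample(("G", "A"), ("A", "G"), ref))  # same_genotype
print(classify_sample(("A", "G"), ("A", "A"), ref))  # call_changed
```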
