Genotype Concordance Question

Arti Tandon (Harvard University, Member)

I am using the GenotypeConcordance tool to compare the same sample processed in two different ways. The sample has a different ID in each of the two VCF files, and when I run the comparison I get the following output:

    :GATKTable:GenotypeConcordance_Summary:Per-sample summary statistics: NRS and NRD
    Sample  Non-Reference Sensitivity  Non-Reference Discrepancy
    ALL     0.000                      1.000

    :GATKTable:6:1:%d:%d:%d:%d:%d:%d:;
    :GATKTable:SiteConcordance_Summary:Site-level summary statistics
    ALLELES_MATCH  EVAL_SUPERSET_TRUTH  EVAL_SUBSET_TRUTH  ALLELES_DO_NOT_MATCH  EVAL_ONLY  TRUTH_ONLY
    51763          14                   23                 26                    5207       34837512

Is this because the IDs do not match? Is there an option to set up a sample-name mapping? And why am I not getting a value for Overall_Genotype_Concordance?

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @Arti Tandon

    Hi,

    Are you not getting a column at all for "Overall_Genotype_Concordance"?

    Issues do arise when the two sample IDs are not the same, but you can easily edit your VCF files so that they use the same sample name.

    -Sheila

  • Arti Tandon (Harvard University, Member)

    Hi Sheila,

    No, I am not getting that column at all. I am unable to edit the VCF files because they are so large, and I would like to keep them in zipped format. Is there any workaround?

    Thanks,
    Arti

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @Arti Tandon

    Hi Arti,

    Unfortunately, I do not know of any way to edit VCF files while they are zipped.
    Can you please post the exact command you ran and tell us which version of GATK you are using?

    Thanks,
    Sheila
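
Because only the sample ID on the `#CHROM` header line has to change, a large compressed VCF can in principle be renamed as a stream, without decompressing it to disk first. The following is a minimal sketch under that assumption, using Python's standard `gzip` module; `rename_samples` and `write_renamed` are hypothetical helper names, not part of GATK. Python's `gzip` writes plain gzip rather than the BGZF format that indexing tools expect, so the sketch emits uncompressed text to a stream, and recompressing it with htslib's `bgzip` (and indexing with `tabix` if needed) is assumed.

```python
import gzip
import sys
from typing import Dict, Iterable, Iterator, Optional, TextIO


def rename_samples(lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield VCF lines unchanged, except that sample IDs on the #CHROM header
    line are replaced according to `mapping` (old ID -> new ID)."""
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\r\n").split("\t")
            # The first nine columns are fixed (CHROM ... FORMAT); sample IDs follow.
            fixed, samples = fields[:9], fields[9:]
            renamed = [mapping.get(name, name) for name in samples]
            yield "\t".join(fixed + renamed) + "\n"
        else:
            # "##" meta-information lines and data records pass through as-is.
            yield line


def write_renamed(src_path: str, mapping: Dict[str, str],
                  out: Optional[TextIO] = None) -> None:
    """Stream a gzip/BGZF-compressed VCF and write the renamed VCF,
    uncompressed, to `out` (standard output by default)."""
    out = out if out is not None else sys.stdout
    # gzip.open can read BGZF files because BGZF is valid multi-member gzip.
    with gzip.open(src_path, "rt") as vcf:
        out.writelines(rename_samples(vcf, mapping))
```

A hypothetical call such as `write_renamed("S2.22.vcf.gz", {"OLD_ID": "NEW_ID"})`, run from a script whose standard output is piped through `bgzip -c > S2.22.renamed.vcf.gz`, leaves the original file untouched; `OLD_ID` and `NEW_ID` are placeholders for the real sample IDs. If bcftools is available, `bcftools reheader -s` with a file of new sample names does the same job without custom code.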

  • Arti Tandon (Harvard University, Member)

    Hi Sheila,

    OK, here is the exact command. The GATK version is GenomeAnalysisTK-2.5-2-gf57256b:

    java -jar GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar -T GenotypeConcordance -eval S1.22.vcf.gz -comp S2.22.vcf.gz -R hs37d5.fa -o S1_S2.out

    Thanks,
    Arti

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @Arti Tandon

    Hi Arti,

    Unfortunately, we cannot help you unless you are using the latest GATK version. Please upgrade and let us know if this issue still persists. https://www.broadinstitute.org/gatk/download

    Thanks,
    Sheila

  • Arti Tandon (Harvard University, Member)

    Hi Sheila,

    I am now using GATK version 3.2-2, and it seems to be working. When I run GenotypeConcordance on a single chromosome, I get the following output:

        :GATKTable:4:2:%s:%.3f:%.3f:%.3f:;
        :GATKTable:GenotypeConcordance_Summary:Per-sample summary statistics: NRS, NRD, and OGC
        Sample             Non-Reference Sensitivity  Non-Reference Discrepancy  Overall_Genotype_Concordance
        ALL                0.855                      0.138                      1.000
        LP6005442-DNA_A12  0.855                      0.138                      1.000

    However, I would like to run GenotypeConcordance across all chromosomes and calculate the overall concordance rate for the two samples. What is the quickest and most efficient way to do that, and how do I combine these numbers across chromosomes?

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @Arti Tandon

    Hi Arti,

    So, you have already run GenotypeConcordance on one sample, and you would like to run it on two samples? I see only one sample, LP6005442-DNA_A12, in the output you posted. Is the other sample in a different VCF?

    If the other sample is in another VCF, you can use CombineVariants to combine the two VCFs into one VCF containing both samples: https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_variantutils_CombineVariants.php
    You can then run GenotypeConcordance on that VCF to get what you want.

    I hope I answered your question!

    -Sheila

  • Arti Tandon (Harvard University, Member)

    Hi Sheila,

    Thanks for your help; however, I may not have been clear in my question.

    This is the command I ran on a single sample for chromosome 22:
    java -jar /opt/gatk-3.2-2/gatk/GenomeAnalysisTK.jar -T GenotypeConcordance -eval S1.22.vcf.gz -comp S2.22.vcf.gz -R hs37d5.fa

    It produced the output shown in my previous comment. I would like to calculate the concordance rate for this sample genome-wide, across all chromosomes. Should I combine the per-chromosome VCF files using CombineVariants and then run GenotypeConcordance, or is there a better way?

    Thanks,
    Arti

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @Arti Tandon

    Ah, okay. My answer above did not address your question. Thanks for clarifying.

    You can use CatVariants to concatenate all the per-chromosome files into a single VCF:
    https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_CatVariants.php

    Then, you can run GenotypeConcordance on the output vcf.

    -Sheila
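
Concatenation suits per-chromosome files because the inputs do not overlap. The following minimal in-memory sketch shows what concatenating VCFs involves, assuming every input shares the same header and sample columns and is already in chromosome order. `split_vcf` and `concat_vcfs` are hypothetical names; this is not CatVariants' implementation, and since it holds everything in memory it is only an illustration, not something to run on whole-genome files.

```python
from typing import List, Sequence, Tuple


def split_vcf(lines: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split VCF lines into (header lines, data records)."""
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if not line.startswith("#")]
    return header, records


def concat_vcfs(per_chrom: Sequence[Sequence[str]]) -> List[str]:
    """Concatenate per-chromosome VCFs (each given as a list of lines).

    Assumes all inputs carry the same samples in the same order and are
    already in chromosome order. Records are appended as-is; nothing is
    reconciled or re-sorted, which is what makes concatenation simpler
    than merging files that share sites.
    """
    if not per_chrom:
        return []
    header, _ = split_vcf(per_chrom[0])
    if not header:
        raise ValueError("first input has no header")
    column_line = header[-1]  # the "#CHROM ... FORMAT <samples>" line
    result = list(header)  # only the first file's header is kept
    for lines in per_chrom:
        this_header, records = split_vcf(lines)
        if not this_header or this_header[-1] != column_line:
            raise ValueError("inputs do not share the same sample columns")
        result.extend(records)
    return result
```

By contrast, CombineVariants has to merge records that fall at the same positions in different inputs, which is more work than appending records chromosome by chromosome.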

  • Arti Tandon (Harvard University, Member)

    Thanks, Sheila! I am running CatVariants, but it seems to be hanging because the files are quite big. Would it be more efficient to run GenotypeConcordance separately on each chromosome and then combine the numbers into a genome-wide figure?

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @Arti Tandon

    Yes, it is fine to run it on each chromosome separately and then combine the results for the whole genome.
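
When combining per-chromosome results, keep in mind that NRS, NRD and OGC are ratios. Averaging the per-chromosome values would give a small chromosome the same weight as a large one. A safer approach is to sum the underlying genotype counts across chromosomes and compute each ratio once. The site-level columns (ALLELES_MATCH, EVAL_ONLY, TRUTH_ONLY and so on) are plain counts, so they can be added directly. The sketch below is a minimal illustration under several assumptions. It assumes each chromosome's per-sample genotype count table has already been read from the GATK report into a dictionary keyed by (eval state, comp state); that parsing step is not shown. The helper names (`pool`, `nrs`, `nrd`, `ogc`) are hypothetical, and the formulas are simplified definitions over genotypes called in both files. GATK's own definitions may treat no-calls and other genotype states differently, so the absolute values may not match the report.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple

# Simplified genotype states: only genotypes called in both files are counted.
# Other states (no-calls, mixed genotypes, ...) are ignored in this sketch.
HOM_REF, HET, HOM_VAR = "HOM_REF", "HET", "HOM_VAR"
STATES = (HOM_REF, HET, HOM_VAR)
NON_REF = (HET, HOM_VAR)

# counts[(eval_state, comp_state)] -> number of genotypes in that cell
Counts = Mapping[Tuple[str, str], int]


def pool(per_chrom: Iterable[Counts]) -> Counter:
    """Add per-chromosome count tables together, cell by cell."""
    total: Counter = Counter()
    for counts in per_chrom:
        total.update(counts)  # Counter.update adds counts rather than replacing
    return total


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def nrs(c: Counts) -> float:
    """Non-reference sensitivity: share of comp non-ref genotypes called non-ref in eval."""
    hit = sum(c.get((e, k), 0) for e in NON_REF for k in NON_REF)
    comp_non_ref = sum(c.get((e, k), 0) for e in STATES for k in NON_REF)
    return _ratio(hit, comp_non_ref)


def nrd(c: Counts) -> float:
    """Non-reference discrepancy: mismatches among all cells except HOM_REF/HOM_REF."""
    total = sum(c.get((e, k), 0) for e in STATES for k in STATES)
    considered = total - c.get((HOM_REF, HOM_REF), 0)
    mismatched = sum(c.get((e, k), 0) for e in STATES for k in STATES if e != k)
    return _ratio(mismatched, considered)


def ogc(c: Counts) -> float:
    """Overall genotype concordance: matching genotypes over all compared genotypes."""
    total = sum(c.get((e, k), 0) for e in STATES for k in STATES)
    matching = sum(c.get((s, s), 0) for s in STATES)
    return _ratio(matching, total)
```

For instance, take made-up counts in which chromosome A has 90 of its 100 comp non-reference genotypes called non-reference in eval and chromosome B has 5 of 10. The per-chromosome NRS values are then 0.9 and 0.5. Averaging them gives 0.7, whereas pooling the counts gives 95/110 ≈ 0.864, which is the genome-wide figure.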
