Results for genotype concordance

lalir Canada Member

Hello,
I just ran GenotypeConcordance to determine how similar two samples are. However, in the output every count shows up as zero, the NRD is 1, and the overall genotype concordance is also 1. Has anyone encountered this before? The two inputs I am using for --eval and --comp are actually the same sample, sequenced using two different methodologies (HiSeq and Ion Proton), so it is odd that I am getting this output. Any help is appreciated.

Thanks,
Ricky

Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Hi there,

    Can you please post your command line, a few records from each file, and the output that you get?

  • lalir Canada Member

    Hi @Geraldine_VdAuwera, thanks for your reply. I have posted what you requested below:

    java -jar GenomeAnalysisTK.jar \
        -T GenotypeConcordance \
        -R ucsc.hg19.fasta \
        --eval input1.vcf \
        --comp input2.vcf \
        -o results.vcf

    These are from input1.vcf (sequenced on Ion Proton)

    chr1 871334 . G T 627.69 PASS AF=1;AO=66;DP=66;FAO=66;FDP=66;FR=.;FRO=0;FSAF=43;FSAR=23;FSRF=0;FSRR=0;FWDB=0.0884998;FXX=0;HRUN=2;LEN=1;MLLD=61.0335;QD=38.0418;RBI=0.0914006;REFB=0;REVB=0.022844;RO=0;SAF=43;SAR=23;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=0;STB=0.5;STBP=1;TYPE=snp;VARB=0.00113677;OID=.;OPOS=871334;OREF=G;OALT=T;OMAPALT=T GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 1/1:30:66:66:0:0:66:66:1:23:43:0:0:23:43:0:0
    chr1 876499 . A G 406.82 PASS AF=0.976744;AO=38;DP=46;FAO=42;FDP=43;FR=.,HEALED;FRO=0;FSAF=25;FSAR=17;FSRF=0;FSRR=0;FWDB=0.047727;FXX=0.0850883;HRUN=1;LEN=1;MLLD=255.227;QD=37.844;RBI=0.0646258;REFB=0;REVB=0.0435732;RO=0;SAF=24;SAR=14;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=-3.76123e-08;STB=0.5;STBP=1;TYPE=snp;VARB=0.00734494;OID=.;OPOS=876499;OREF=A;OALT=G;OMAPALT=G GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 1/1:6:46:43:0:0:38:42:0.976744:14:24:0:0:17:25:0:0
    chr1 880238 . A G 351.11 PASS AF=1;AO=37;DP=37;FAO=37;FDP=37;FR=.;FRO=0;FSAF=17;FSAR=20;FSRF=0;FSRR=0;FWDB=0.011607;FXX=0;HRUN=1;LEN=1;MLLD=65.0544;QD=37.9581;RBI=0.0273758;REFB=0;REVB=0.0247933;RO=0;SAF=17;SAR=20;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=3.71161e-08;STB=0.5;STBP=1;TYPE=snp;VARB=0.000600901;OID=.;OPOS=880238;OREF=A;OALT=G;OMAPALT=G GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 1/1:17:37:37:0:0:37:37:1:20:17:0:0:20:17:0:0

    These are from the input2.vcf (HiSeq)

    chr1 15906 . A G 8536.12 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=6.913;DP=14;Dels=0.00;FS=1202.837;HRun=1;HaplotypeScore=5.9838;InbreedingCoeff=-0.0881;MQ=33.27;MQ0=1267;MQRankSum=-18.086;NEGATIVE_TRAIN_SITE;QD=3.05;ReadPosRankSum=14.180;VQSLOD=-1.092e+01;culprit=FS;set=variant2 GT:AD:DP:GQ:PL 0/1:10,2:14:40:40,0,153
    chr1 663097 rs142727405 G C 269791.53 PASS AC=2;AF=1.00;AN=2;BaseQRankSum=-26.603;DB;DP=172;Dels=0.00;FS=21.281;HRun=0;HaplotypeScore=0.3640;InbreedingCoeff=0.5483;MQ=18.59;MQ0=1849;MQRankSum=2.089;QD=4.73;ReadPosRankSum=1.643;VQSLOD=-8.230e+00;culprit=MQ;set=variant2 GT:AD:DP:GQ:PL 1/1:0,172:172:99:1701,135,0
    chr1 663506 . T C 2855.83 PASS AC=2;AF=1.00;AN=2;BaseQRankSum=1.247;DP=2;Dels=0.00;FS=10.115;HRun=1;HaplotypeScore=0.1708;InbreedingCoeff=0.1338;MQ=13.09;MQ0=93;MQRankSum=-1.021;QD=3.31;ReadPosRankSum=5.644;VQSLOD=-9.043e+00;culprit=MQ;set=variant2 GT:AD:DP:GQ:PL 1/1:0,2:2:3:25,3,0
    chr1 761732 rs2286139 C T 278674.22 PASS AC=2;AF=1.00;AN=2;BaseQRankSum=-19.280;DB;DP=38;Dels=0.00;FS=1008.586;HRun=0;HaplotypeScore=0.1641;InbreedingCoeff=0.2472;MQ=47.93;MQ0=591;MQRankSum=-36.693;NEGATIVE_TRAIN_SITE;POSITIVE_TRAIN_SITE;QD=17.44;ReadPosRankSum=14.176;VQSLOD=-3.992e+00;culprit=FS;set=variant2 GT:AD:DP:GQ:PL 1/1:0,38:38:84:1035,84,0

    This is the output file

    Sample NO_CALL_HOM_REF NO_CALL_HET NO_CALL_HOM_VAR HOM_REF_HOM_REF HOM_REF_HET HOM_REF_HOM_VAR HET_HOM_REF HET_HET HET_HOM_VAR HOM_VAR_HOM_REF HOM_VAR_HET HOM_VAR_HOM_VAR UNAVAILABLE_HOM_REF UNAVAILABLE_HET UNAVAILABLE_HOM_VAR MIXED_HOM_REF MIXED_HET MIXED_HOM_VAR Mismatching_Alleles
    ALL 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

    :GATKTable:38:1:%s:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:%d:;

    :GATKTable:GenotypeConcordance_Counts:Per-sample concordance tables: comparison counts

    Sample NO_CALL_NO_CALL NO_CALL_HOM_REF NO_CALL_HET NO_CALL_HOM_VAR NO_CALL_UNAVAILABLE NO_CALL_MIXED HOM_REF_NO_CALL HOM_REF_HOM_REF HOM_REF_HET HOM_REF_HOM_VAR HOM_REF_UNAVAILABLE HOM_REF_MIXED HET_NO_CALL HET_HOM_REF HET_HET HET_HOM_VAR HET_UNAVAILABLE HET_MIXED HOM_VAR_NO_CALL HOM_VAR_HOM_REF HOM_VAR_HET HOM_VAR_HOM_VAR HOM_VAR_UNAVAILABLE HOM_VAR_MIXED UNAVAILABLE_NO_CALL UNAVAILABLE_HOM_REF UNAVAILABLE_HET UNAVAILABLE_HOM_VAR UNAVAILABLE_UNAVAILABLE UNAVAILABLE_MIXED MIXED_NO_CALL MIXED_HOM_REF MIXED_HET MIXED_HOM_VAR MIXED_UNAVAILABLE MIXED_MIXED Mismatching_Alleles
    ALL 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

    :GATKTable:20:1:%s:%.3f:%.3f:%.3f:%.3f:%.3f:%.3f:%.3f:%.3f:%.3f:%.3f:%.3f:%.3f:%.3f:%.3f:%.3f:%.3f:%.3f:%.3f:%.3f:;

    :GATKTable:GenotypeConcordance_EvalProportions:Per-sample concordance tables: proportions of genotypes called in eval

    Sample HOM_REF_NO_CALL HOM_REF_HOM_REF HOM_REF_HET HOM_REF_HOM_VAR HOM_REF_UNAVAILABLE HOM_REF_MIXED HET_NO_CALL HET_HOM_REF HET_HET HET_HOM_VAR HET_UNAVAILABLE HET_MIXED HOM_VAR_NO_CALL HOM_VAR_HOM_REF HOM_VAR_HET HOM_VAR_HOM_VAR HOM_VAR_UNAVAILABLE HOM_VAR_MIXED Mismatching_Alleles
    ALL 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

    :GATKTable:4:1:%s:%.3f:%.3f:%.3f:;

    :GATKTable:GenotypeConcordance_Summary:Per-sample summary statistics: NRS, NRD, and OGC

    Sample Non-Reference Sensitivity Non-Reference Discrepancy Overall_Genotype_Concordance
    ALL 0.000 1.000 1.000

    :GATKTable:6:1:%d:%d:%d:%d:%d:%d:;

    :GATKTable:SiteConcordance_Summary:Site-level summary statistics

    ALLELES_MATCH EVAL_SUPERSET_TRUTH EVAL_SUBSET_TRUTH ALLELES_DO_NOT_MATCH EVAL_ONLY TRUTH_ONLY
    12144 34 4 95 38059 38210
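
The site-level summary in the output above already gives a rough answer to how much the two call sets overlap. The following is a minimal sketch of that arithmetic. It assumes the first four columns count sites present in both files, while EVAL_ONLY and TRUTH_ONLY count sites found in only one of them. The function name is hypothetical.

```python
# Site-level counts copied from the SiteConcordance_Summary table above.
site_counts = {
    "ALLELES_MATCH": 12144,
    "EVAL_SUPERSET_TRUTH": 34,
    "EVAL_SUBSET_TRUTH": 4,
    "ALLELES_DO_NOT_MATCH": 95,
    "EVAL_ONLY": 38059,
    "TRUTH_ONLY": 38210,
}


def shared_site_summary(counts):
    """Return (shared sites, fraction of eval sites, fraction of comp sites).

    Assumption: every column except EVAL_ONLY and TRUTH_ONLY describes
    a site that appears in both the eval and the comp file.
    """
    one_sided = ("EVAL_ONLY", "TRUTH_ONLY")
    shared = sum(n for name, n in counts.items() if name not in one_sided)
    eval_total = shared + counts["EVAL_ONLY"]
    comp_total = shared + counts["TRUTH_ONLY"]
    return shared, shared / eval_total, shared / comp_total


shared, eval_fraction, comp_fraction = shared_site_summary(site_counts)
print(shared, round(eval_fraction, 3), round(comp_fraction, 3))
```

With the numbers posted above, this gives 12277 shared sites, roughly 24% of each call set.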

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Hi Ricky,

    Sorry for the late response. The sites you posted are from very different regions in the chromosome. Have you checked how many sites actually overlap between your two files?
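
One way to check the overlap directly is to count the positions the two VCFs have in common. The following is a minimal sketch, not a GATK feature. It compares sites by chromosome and position only, ignoring alleles and genotypes, and the function names are hypothetical.

```python
import gzip


def sites_from_lines(lines):
    """Collect (chrom, pos) pairs from VCF record lines, skipping headers."""
    sites = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        # CHROM and POS are the first two whitespace-separated fields.
        chrom, pos = line.split(None, 2)[:2]
        sites.add((chrom, int(pos)))
    return sites


def read_sites(path):
    """Read sites from a plain or gzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sites_from_lines(handle)


def overlap_counts(eval_sites, comp_sites):
    """Count sites shared by both call sets and sites unique to each."""
    return {
        "shared": len(eval_sites & comp_sites),
        "eval_only": len(eval_sites - comp_sites),
        "comp_only": len(comp_sites - eval_sites),
    }
```

For the files in this thread, the call would be `overlap_counts(read_sites("input1.vcf"), read_sites("input2.vcf"))`.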

  • lalir Canada Member

    Hi @Geraldine_VdAuwera,
    I realized this, so I ran CombineVariants and then used SelectVariants to select the calls that intersect between the two files. Should I use this file in GenotypeConcordance?

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Yes, please try that and let me know what the results are.

  • lalir Canada Member

    Hi @Geraldine_VdAuwera, I just need some clarification: should I use the intersect file as the comp or the eval with GenotypeConcordance? Also, I realized that the reference sequence I was using in the -R argument was different from the reference sequence that my samples were aligned to. Could this also contribute to skewed results with GenotypeConcordance and other tools?

    Thanks again for your assistance.

    Ricky

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Sorry, I didn't express myself very clearly. I wanted you to check that there were in fact sites that are present in both call sets to make sure that the program is getting the data it needs to work properly. If there are such intersecting sites, then your original command is applicable and the problem lies somewhere else.

    What do you mean when you say the reference sequence was not the same? Using the same reference consistently is very important for GATK analysis.
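
A quick way to spot a reference mismatch is to compare the contigs declared in a VCF header with those in the reference's .fai index. The sketch below assumes the VCF header contains ##contig lines with ID and length fields (not every VCF has them) and that the index follows the samtools faidx layout, with name and length in the first two tab-separated columns. All function names are hypothetical.

```python
import re

ID_RE = re.compile(r"[<,]ID=([^,>]+)")
LENGTH_RE = re.compile(r"[<,]length=(\d+)")


def contigs_from_vcf_header(lines):
    """Map contig name -> length (or None) from ##contig header lines."""
    contigs = {}
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        if not line.startswith("##contig="):
            continue
        name = ID_RE.search(line)
        length = LENGTH_RE.search(line)
        if name:
            contigs[name.group(1)] = int(length.group(1)) if length else None
    return contigs


def contigs_from_fai(lines):
    """Map contig name -> length from the lines of a .fai index."""
    contigs = {}
    for line in lines:
        if line.strip():
            name, length = line.rstrip("\n").split("\t")[:2]
            contigs[name] = int(length)
    return contigs


def reference_mismatches(vcf_contigs, ref_contigs):
    """List VCF contigs that are absent from, or differ in length in, the reference."""
    problems = []
    for name, length in vcf_contigs.items():
        if name not in ref_contigs:
            problems.append(f"{name}: not in reference")
        elif length is not None and length != ref_contigs[name]:
            problems.append(f"{name}: length {length} != {ref_contigs[name]}")
    return problems
```

An empty list from `reference_mismatches` only means the contig names and lengths agree. It does not prove that the sequences themselves are identical.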
