REF ALT conversion - multiple sample vcf > single sample vcf + trimAlternates

Dear GATK Team,

I extracted the first sample from the following multi-sample VCF record:
chr1 3007281 . CCTT CT,C 3654.92 . AC=21,2;AF=0.157,0.015;AN=134;BaseQRankSum=1.47;ClippingRankSum=0.668;DP=10953;FS=7.789;GQ_MEAN=29.39;GQ_STDDEV=27.74;InbreedingCoeff=0.7055;MLEAC=22,2;MLEAF=0.164,0.015;MQ=60.93;MQ0=0;MQRankSum=-3.800e-01;NCC=0;QD=16.03;ReadPosRankSum=0.760;SOR=0.522;set=Intersection GT:AD:DP:GQ:PL 0/1:10,4,0:14:38:38,0,312,68,324,392 0/0:21,0,0:21:9:0,9,135,9,135,135

with

java -jar GenomeAnalysisTK.jar -nt 2 -T SelectVariants -trimAlternates -V $INPUT -R $R -o $OUTPUT -sn 1

and I get this:
chr1 3007281 . CCT C 3654.92 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.47;ClippingRankSum=0.668;DP=14;FS=7.789;GQ_MEAN=29.39;GQ_STDDEV=27.74;InbreedingCoeff=0.7055;MQ=60.93;MQ0=0;MQRankSum=-3.800e-01;NCC=0;QD=16.03;ReadPosRankSum=0.760;SOR=0.522;set=Intersection GT:AD:DP:GQ:PL 0/1:10,4:14:38:38,0,312

My question is: why do I get C as the ALT allele when using -trimAlternates?

Thank you in anticipation

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @kullrich
    Hi,

    Can you post the next record after that site?

    Thanks,
    Sheila

  • kullrich, Germany (Member)

    chr1 3007282 . CT C 647.51 . AC=8;AF=0.060;AN=134;BaseQRankSum=0.322;ClippingRankSum=0.694;DP=10836;FS=10.462;GQ_MEAN=39.30;GQ_STDDEV=26.38;InbreedingCoeff=0.0873;MLEAC=8;MLEAF=0.060;MQ=60.47;MQ0=0;MQRankSum=0.720;NCC=0;QD=8.30;ReadPosRankSum=1.38;SOR=1.293;set=Intersection GT:AD:DP:GQ:PL 0/0:14,0:14:42:0,42,386 0/1:4,9:13:32:124,0,32
    chr1 3007283 . T C 315.50 VQSRTrancheSNP90.00to99.00 AC=6;AF=0.051;AN=118;BaseQRankSum=1.51;ClippingRankSum=0.067;DP=10395;FS=10.076;GQ_MEAN=22.46;GQ_STDDEV=32.74;InbreedingCoeff=0.2761;MLEAC=9;MLEAF=0.076;MQ=60.00;MQ0=0;MQRankSum=-7.620e-01;NCC=8;POSITIVE_TRAIN_SITE;QD=6.19;ReadPosRankSum=0.762;SOR=0.390;VQSLOD=0.661;culprit=QD;set=FilteredInAll GT:AD:DP:GQ:PL 0/0:13,0:13:39:0,39,320 0/1:9,2:11:8:8,0,129

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)
    That appears to be incorrect. What version are you using? Can you please test whether this happens in the latest version (3.7, released today)?
  • kullrich, Germany (Member)

    Hi,
    It seems to make no difference which version I use for the single-sample extraction:

    INPUT:

    chr1 3007280 . CCCT C 4944.56 . AC=18;AF=0.148;AN=122;BaseQRankSum=-4.630e-01;ClippingRankSum=-1.733e+00;DP=11196;FS=1.272;GQ_MEAN=60.30;GQ_STDDEV=102.23;InbreedingCoeff=0.4734;MLEAC=19;MLEAF=0.156;MQ=60.40;MQ0=0;MQRankSum=-4.950e-01;NCC=6;QD=19.54;ReadPosRankSum=1.18;SOR=0.759;set=Intersection GT:AD:DP:GQ:PL 0/0:22,0:22:27:0,27,405 0/0:21,0:21:9:0,9,135
    chr1 3007281 . CCTT CT,C 3654.92 . AC=21,2;AF=0.157,0.015;AN=134;BaseQRankSum=1.47;ClippingRankSum=0.668;DP=10953;FS=7.789;GQ_MEAN=29.39;GQ_STDDEV=27.74;InbreedingCoeff=0.7055;MLEAC=22,2;MLEAF=0.164,0.015;MQ=60.93;MQ0=0;MQRankSum=-3.800e-01;NCC=0;QD=16.03;ReadPosRankSum=0.760;SOR=0.522;set=Intersection GT:AD:DP:GQ:PL 0/1:10,4,0:14:38:38,0,312,68,324,392 0/0:21,0,0:21:9:0,9,135,9,135,135
    chr1 3007282 . CT C 647.51 . AC=8;AF=0.060;AN=134;BaseQRankSum=0.322;ClippingRankSum=0.694;DP=10836;FS=10.462;GQ_MEAN=39.30;GQ_STDDEV=26.38;InbreedingCoeff=0.0873;MLEAC=8;MLEAF=0.060;MQ=60.47;MQ0=0;MQRankSum=0.720;NCC=0;QD=8.30;ReadPosRankSum=1.38;SOR=1.293;set=Intersection GT:AD:DP:GQ:PL 0/0:14,0:14:42:0,42,386 0/1:4,9:13:32:124,0,32
    chr1 3007283 . T C 315.50 VQSRTrancheSNP90.00to99.00 AC=6;AF=0.051;AN=118;BaseQRankSum=1.51;ClippingRankSum=0.067;DP=10395;FS=10.076;GQ_MEAN=22.46;GQ_STDDEV=32.74;InbreedingCoeff=0.2761;MLEAC=9;MLEAF=0.076;MQ=60.00;MQ0=0;MQRankSum=-7.620e-01;NCC=8;POSITIVE_TRAIN_SITE;QD=6.19;ReadPosRankSum=0.762;SOR=0.390;VQSLOD=0.661;culprit=QD;set=FilteredInAll GT:AD:DP:GQ:PL 0/0:13,0:13:39:0,39,320 0/1:9,2:11:8:8,0,129

    GATK 3.7 OUTPUT:

    chr1 3007280 . CCCT . 4944.56 . AN=2;BaseQRankSum=-4.630e-01;ClippingRankSum=-1.733e+00;DP=22;FS=1.272;GQ_MEAN=60.30;GQ_STDDEV=102.23;InbreedingCoeff=0.4734;MQ=60.40;MQ0=0;MQRankSum=-4.950e-01;NCC=6;QD=19.54;ReadPosRankSum=1.18;SOR=0.759;set=Intersection GT:AD:DP:GQ:PL 0/0:22:22:27:0
    chr1 3007281 . CCT C 3654.92 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.47;ClippingRankSum=0.668;DP=14;FS=7.789;GQ_MEAN=29.39;GQ_STDDEV=27.74;InbreedingCoeff=0.7055;MQ=60.93;MQ0=0;MQRankSum=-3.800e-01;NCC=0;QD=16.03;ReadPosRankSum=0.760;SOR=0.522;set=Intersection GT:AD:DP:GQ:PL 0/1:10,4:14:38:38,0,312
    chr1 3007282 . CT . 647.51 . AN=2;BaseQRankSum=0.322;ClippingRankSum=0.694;DP=14;FS=10.462;GQ_MEAN=39.30;GQ_STDDEV=26.38;InbreedingCoeff=0.0873;MQ=60.47;MQ0=0;MQRankSum=0.720;NCC=0;QD=8.30;ReadPosRankSum=1.38;SOR=1.293;set=Intersection GT:AD:DP:GQ:PL 0/0:14:14:42:0

    GATK 3.6 OUTPUT:

    chr1 3007280 . CCCT . 4944.56 . AN=2;BaseQRankSum=-4.630e-01;ClippingRankSum=-1.733e+00;DP=22;FS=1.272;GQ_MEAN=60.30;GQ_STDDEV=102.23;InbreedingCoeff=0.4734;MQ=60.40;MQ0=0;MQRankSum=-4.950e-01;NCC=6;QD=19.54;ReadPosRankSum=1.18;SOR=0.759;set=Intersection GT:AD:DP:GQ:PL 0/0:22:22:27:0
    chr1 3007281 . CCT C 3654.92 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.47;ClippingRankSum=0.668;DP=14;FS=7.789;GQ_MEAN=29.39;GQ_STDDEV=27.74;InbreedingCoeff=0.7055;MQ=60.93;MQ0=0;MQRankSum=-3.800e-01;NCC=0;QD=16.03;ReadPosRankSum=0.760;SOR=0.522;set=Intersection GT:AD:DP:GQ:PL 0/1:10,4:14:38:38,0,312
    chr1 3007282 . CT . 647.51 . AN=2;BaseQRankSum=0.322;ClippingRankSum=0.694;DP=14;FS=10.462;GQ_MEAN=39.30;GQ_STDDEV=26.38;InbreedingCoeff=0.0873;MQ=60.47;MQ0=0;MQRankSum=0.720;NCC=0;QD=8.30;ReadPosRankSum=1.38;SOR=1.293;set=Intersection GT:AD:DP:GQ:PL 0/0:14:14:42:0

  • shlee, Cambridge (Member, Broadie)

    @kullrich,

    In this case, could you please submit a snippet of your data so that we can reproduce the observations? Instructions for providing the files are at https://software.broadinstitute.org/gatk/documentation/article?id=1894. Thanks.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)
    Oh hang on, I think I misread the original post. This actually looks correct. For the sample of interest, the original genotype is CCTT/CT. When you extract that with the trimming enabled, you're asking the program to get rid of any bases that are common to all alleles on the right side (because we left-align everything by default). So the T goes away and you're left with CCT/C.

    Does that make sense?
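
To make the explanation above concrete, here is a minimal Python sketch of the two steps it implies for this record: dropping ALT alleles that the selected sample does not carry, then trimming bases shared by all remaining alleles on the right side. The function names `drop_unused_alts` and `trim_common_suffix` are hypothetical and this is not GATK's actual implementation; it only assumes that the selected sample's genotype is 0/1 and that at least one base is kept per allele.

```python
def drop_unused_alts(alts, genotype_indices):
    """Keep only ALT alleles referenced by the genotype (index 0 is REF)."""
    used = sorted({i for i in genotype_indices if i > 0})
    return [alts[i - 1] for i in used]


def trim_common_suffix(ref, alts):
    """Strip trailing bases shared by REF and every ALT, keeping >= 1 base each."""
    alleles = [ref, *alts]
    while (
        min(len(a) for a in alleles) > 1          # never trim an allele to empty
        and len({a[-1] for a in alleles}) == 1    # last base identical everywhere
    ):
        alleles = [a[:-1] for a in alleles]
    return alleles[0], alleles[1:]


# Record at chr1:3007281 from the thread: REF=CCTT, ALT=CT,C, first sample GT=0/1
ref, alts = "CCTT", ["CT", "C"]
kept = drop_unused_alts(alts, [0, 1])      # the second ALT (C) is not in 0/1
print(kept)                                # ['CT']
print(trim_common_suffix(ref, kept))       # ('CCT', ['C'])
```

With both ALT alleles still present (CCTT, CT, C), nothing can be trimmed because the shortest allele is already a single base; only after the unused C is removed does the shared trailing T become trimmable, which gives the CCT/C seen in the output.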