
VariantsToTable issue

Dear GATK team,

I used VariantsToTable to extract specific fields with the following command:

gatk VariantsToTable -V Annotated_NG.hg19_multianno.vcf -F CHROM -F AF -F MAF -F DP -GF GT -GF AD -GF DP -GF GQ -GF PL -F gnomAD_genome_ALL -F gnomAD_genome_AFR -F gnomAD_genome_AMR -F gnomAD_genome_ASJ -F gnomAD_genome_EAS -F gnomAD_genome_FIN -F gnomAD_genome_NFE -F gnomAD_genome_OTH -O Saudi_NG_gnomAD.txt

And I got the following error:

htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 544: unparsable vcf record with allele *TTCT, for input source: Annotated_NG.hg19_multianno.vcf

Could anyone help me with this issue?


Answers

  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator)

    Hi @Sakhaa

    It looks like there is an issue with the input VCF file. How was this file generated? Please run the ValidateVariants tool on Annotated_NG.hg19_multianno.vcf to find the cause of the error.

  • Sakhaa (Member)

    Hi @bhanuGandham

    I ran the validation using the following command:
    gatk ValidateVariants -R $REF -V Annotated_NG.hg19_multianno.vcf --dbsnp $dbSN

    and I got the following error:
    A USER ERROR has occurred: Input Annotated_NG.hg19_multianno.vcf fails strict validation: one or more of the ALT allele(s) for the record at position 1:10409 are not observed at all in the sample genotypes of type:

    Here is the record at position 1:10409:

    1   10409   .   ACCCTAACCCTAACCCTAACCCTAACCCTAAC    A,* 1478.73 .   AC=11,0;AF=0.0514019,0;AN=214;BaseQRankSum=0.524;ClippingRankSum=0;DP=3050;ExcessHet=9.4611;FS=30.723;InbreedingCoeff=-0.1038;MLEAC=10,18;MLEAF=0.083,0.15;MQ=24.2;MQRankSum=0.21;QD=20.39;ReadPosRankSum=-0.524;SF=0,1;SOR=3.228;NS=107;MAF=0.0514019,0;AC_Het=9,0;AC_Hom=2,0;AC_Hemi=0,0;HWE=0.236917,1;ExcHet=0.979421,1;ANNOVAR_DATE=2018-04-16;gnomAD_genome_ALL=0.0535;gnomAD_genome_AFR=0.0217;gnomAD_genome_AMR=0.0317;gnomAD_genome_ASJ=0.0625;gnomAD_genome_EAS=0.0130;gnomAD_genome_FIN=0.0443;gnomAD_genome_NFE=0.0784;gnomAD_genome_OTH=0.0556;ALLELE_END;ANNOVAR_DATE=2018-04-16;gnomAD_genome_ALL=.;gnomAD_genome_AFR=.;gnomAD_genome_AMR=.;gnomAD_genome_ASJ=.;gnomAD_genome_EAS=.;gnomAD_genome_FIN=.;gnomAD_genome_NFE=.;gnomAD_genome_OTH=.;ALLELE_END   GT:PID:PGT:GQ:DP:PL:AD  0/0:.:.:0:28:0,0,484,0,484,484:28,0,0   0/1:10409_ACCCTAACCCTAACCCTAACCCTAACCCTAAC_A:0|1:78:4:78,0,90,84,96,181:2,2,0   0/0:10403_ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC_A:1|1:23:7:302,302,302,23,23,0:0,0,7   0/0:.:.:6:2:0,6,54,6,54,54:2,0,0    0/1:.:.:75:5:121,0,75,127,85,213:2,3,0  0/0:.:.:36:3:81,84,126,0,42,36:1,0,2    0/0:.:.:23:25:0,23,584,23,584,584:25,0,0    0/0:.:.:5:28:0,5,606,5,606,606:28,0,0   0/0:.:.:0:48:0,0,909,0,909,909:48,0,0   0/0:.:.:0:64:0,0,1161,0,1161,1161:64,0,0    0/0:.:.:0:67:0,0,1183,0,1183,1183:67,0,0    0/0:.:.:0:74:0,0,1457,0,1457,1457:74,0,0    0/0:.:.:21:9:0,22,631,21,630,629:9,0,0  0/0:.:.:42:50:0,42,630,42,630,630:50,0,0    0/0:.:.:0:79:0,0,1070,0,1070,1070:79,0,0    0/0:.:.:89:8:89,104,276,0,172,162:5,0,3 0/0:.:.:0:45:0,0,798,0,798,798:45,0,0   0/0:.:.:99:10:187,202,387,0,185,168:5,0,5   0/0:.:.:0:76:0,0,1573,0,1573,1573:76,0,0    0/0:.:.:29:2:29,32,74,0,42,39:1,0,1 0/0:.:.:0:58:0,0,1000,0,1000,1000:58,0,0    0/0:.:.:0:43:0,0,778,0,778,778:43,0,0   0/0:.:.:0:53:0,0,1011,0,1011,1011:53,0,0    0/0:10403_ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC_A:0|1:99:9:109,126,532,0,406,397:6,0,3 
0/0:10403_ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC_A:0|1:53:11:53,79,362,0,283,276:9,0,2  0/0:.:.:0:26:0,0,483,0,483,483:26,0,0   0/0:10403_ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC_A:0|1:14:6:0,16,276,14,274,273:5,0,1   0/0:.:.:0:43:0,0,448,0,448,448:43,0,0   0/0:.:.:0:25:0,0,406,0,406,406:25,0,0   0/0:10403_ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC_A:0|1:8:3:0,8,95,8,95,95:3,0,0 0/1:10409_ACCCTAACCCTAACCCTAACCCTAACCCTAAC_A:0|1:23:4:23,0,89,32,92,124:3,1,0   0/0:.:.:0:22:0,0,349,0,349,349:22,0,0   0/0:.:.:0:28:0,0,354,0,354,354:28,0,0   0/0:10403_ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC_A:0|1:99:7:115,126,301,0,175,166:4,0,3 0/0:10403_ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC_A:0|1:39:2:39,42,84,0,42,39:1,0,1  0/0:.:.:18:6:0,18,304,18,304,303:6,0,0  0/0:.:.:36:41:0,36,540,36,540,540:41,0,0    0/0:10403_ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC_A:0|1:69:7:204,210,294,0,84,69:2,0,5   0/0:.:.:36:52:0,36,540,36,540,540:52,0,0    0/0:.:.:0:37:0,0,633,0,633,633:37,0,0   0/0:10403_ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC_A:0|1:53:7:53,71,458,0,387,380:6,0,1   0/0:.:.:0:29:0,0,236,0,236,236:29,0,0   0/0:.:.:0:57:0,0,961,0,961,961:57,0,0   0/0:.:.:0:62:0,0,1053,0,1053,1053:62,0,0    0/0:.:.:0:32:0,0,463,0,463,463:32,0,0   0/1:.:.:24:4:81,0,24,86,33,119:2,2,0    0/0:10403_ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC_A:0|1:7:3:0,7,233,7,233,233:3,0,0  0/0:.:.:99:6:117,126,243,0,117,108:3,0,3    0/0:.:.:70:4:117,120,199,0,79,70:1,0,3  0/0:.:.:36:38:0,36,540,36,540,540:38,0,0    0/1:.:.:99:8:145,0,146,157,158,316:4,4,0    0/0:.:.:0:27:0,0,565,0,565,565:27,0,0   0/0:.:.:24:35:0,24,360,24,360,360:35,0,0    0/1:.:.:14:6:14,0,115,30,118,148:5,1,0  0/0:.:.:0:36:0,0,623,0,623,623:36,0,0   0/0:.:.:0:42:0,0,548,0,548,548:42,0,0   0/1:.:.:99:5:109,0,133,115,142,257:2,3,0    0/0:.:.:0:36:0,0,641,0,641,641:36,0,0   0/0:.:.:16:5:0,16,192,16,192,192:5,0,0  1/1:.:.:3:1:44,3,0,45,4,46:0,1,0    0/0:10343_CCCTAACCCTA_C:0|1:12:4:0,12,163,12,163,163:4,0,0  0/0:.:.:0:24:0,0,473,0,473,473:24,0,0   
0/0:.:.:21:31:0,21,315,21,315,315:31,0,0    0/0:.:.:0:31:0,0,505,0,505,505:31,0,0   0/0:.:.:9:3:96,97,98,9,10,0:0,0,3   0/0:.:.:9:25:0,9,135,9,135,135:25,0,0   0/0:.:.:0:25:0,0,225,0,225,225:25,0,0   0/0:.:.:0:22:0,0,540,0,540,540:22,0,0   0/0:.:.:0:26:0,0,341,0,341,341:26,0,0   0/0:.:.:0:22:0,0,353,0,353,353:22,0,0   0/0:.:.:75:3:90,96,180,0,84,75:2,0,1    0/0:.:.:33:38:0,33,495,33,495,495:38,0,0    0/1:.:.:26:3:26,0,75,32,78,110:2,1,0    0/0:.:.:15:42:0,15,955,15,955,955:42,0,0    0/0:.:.:0:24:0,0,381,0,381,381:24,0,0   0/0:.:.:0:33:0,0,418,0,418,418:33,0,0   0/0:.:.:92:5:155,158,263,0,105,92:1,0,4 0/0:.:.:3:40:0,3,783,3,783,783:40,0,0   0/0:.:.:84:5:84,90,428,0,337,328:2,0,3  0/0:.:.:0:24:0,0,364,0,364,364:24,0,0   0/0:.:.:0:26:0,0,483,0,483,483:26,0,0   0/1:10403_ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC_A:1|0:54:4:78,0,54,84,60,144:2,2,0 0/0:.:.:18:34:0,18,270,18,270,270:34,0,0    0/0:.:.:0:35:0,0,655,0,655,655:35,0,0   0/0:.:.:15:29:0,15,225,15,225,225:29,0,0    0/0:.:.:12:28:0,12,180,12,180,180:28,0,0    0/0:.:.:0:32:0,0,527,0,527,527:32,0,0   0/0:.:.:9:25:0,9,135,9,135,135:25,0,0   0/0:.:.:0:16:0,0,223,0,223,223:16,0,0   0/0:.:.:39:2:39,42,109,0,67,64:1,0,1    0/0:.:.:1:4:0,1,129,9,132,140:3,1,0 0/0:.:.:0:40:0,0,743,0,743,743:40,0,0   0/0:.:.:0:27:0,0,534,0,534,534:27,0,0   0/0:.:.:0:35:0,0,497,0,497,497:35,0,0   0/0:10403_ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC_A:1|1:1:2:42,43,46,1,3,0:1,0,1 0/0:.:.:0:26:0,0,292,0,292,292:26,0,0   0/0:.:.:0:23:0,0,225,0,225,225:23,0,0   0/0:.:.:0:34:0,0,651,0,651,651:34,0,0   0/0:.:.:15:24:0,15,225,15,225,225:24,0,0    0/0:.:.:99:7:142,150,294,0,144,131:3,0,4    0/0:.:.:0:26:0,0,321,0,321,321:26,0,0   0/0:.:.:0:33:0,0,468,0,468,468:33,0,0   0/0:.:.:99:5:131,134,268,0,134,121:1,0,4    0/0:.:.:12:24:0,12,180,12,180,180:24,0,0    0/0:.:.:21:24:0,21,315,21,315,315:24,0,0    0/0:.:.:0:21:0,0,322,0,322,322:21,0,0   0/0:.:.:0:30:0,0,572,0,572,572:30,0,0
    
  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator)

    Hi @Sakhaa

    The error message suggests that your VCF file is malformed. How was this VCF generated? Unfortunately, there is not much we can do for you if the formatting issues were introduced by a third-party program.
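
The first error in this thread points at an allele written as *TTCT. In the VCF specification, * on its own in the ALT column marks a deletion that spans the site, so an allele that mixes * with bases cannot be parsed. The sketch below is one way to list such records before rerunning GATK. It is not a GATK tool, and the function name find_bad_star_alleles is made up for illustration. It assumes an uncompressed, tab-delimited VCF and skips symbolic alleles such as <*>.

```shell
# Sketch only: report ALT alleles that contain "*" mixed with other characters.
# find_bad_star_alleles is a hypothetical helper, not part of GATK or htsjdk.
find_bad_star_alleles() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines (NR still counts them)
    {
      n = split($5, alt, ",")          # column 5 is ALT; alleles are comma-separated
      for (i = 1; i <= n; i++) {
        a = alt[i]
        # A bare "*" is valid; symbolic alleles like <*> are left alone.
        if (index(a, "*") > 0 && a != "*" && substr(a, 1, 1) != "<") {
          printf "line %d\t%s:%s\tALT=%s\n", NR, $1, $2, a
        }
      }
    }
  ' "$1"
}

# Example call (file name taken from the thread):
# find_bad_star_alleles Annotated_NG.hg19_multianno.vcf
```

Because NR counts header lines too, the reported line numbers refer to lines of the file itself, which should make them comparable to the approximate line number in the htsjdk message.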

  • Sakhaa (Member), edited November 5

    Thank you @bhanuGandham

    I used the Best Practices workflow for germline variants. I generated two VCF files, one per cohort, and I want to merge them into one file, so I used another tool to merge the files.

    May I ask which GATK4 tool can be used to merge two cohorts into one VCF file?

  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator)

    Hi @Sakhaa

    If you generated GVCF files, you could use the tools CombineGVCFs or GenomicsDBImport. On the other hand, if you want to combine VCF files from different cohorts, we do not have a tool for that purpose in GATK4. You could use CombineVariants from GATK3 for this, but at your own risk: we no longer support GATK3, although many users have found CombineVariants very helpful. Here is a link to its tool docs: https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_CombineVariants.php
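
If the GVCFs from the germline workflow are still available, the CombineGVCFs route mentioned above might look like the sketch below. All file names are placeholders, the run helper is a hypothetical dry-run wrapper that only prints each command unless RUN=1 is set, and the GenotypeGVCFs step is an assumed follow-up for turning the combined GVCF into a final VCF.

```shell
# Sketch only: placeholder paths, replace with your own reference and GVCFs.
REF="reference.fasta"
GVCF_A="cohortA.g.vcf.gz"
GVCF_B="cohortB.g.vcf.gz"
COMBINED_GVCF="combined.g.vcf.gz"
JOINT_VCF="combined.vcf.gz"

# Hypothetical helper: print the command by default, execute it only when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Merge the GVCFs into a single GVCF.
run gatk CombineGVCFs -R "$REF" -V "$GVCF_A" -V "$GVCF_B" -O "$COMBINED_GVCF"

# Joint-genotype the combined GVCF to produce one VCF for both cohorts.
run gatk GenotypeGVCFs -R "$REF" -V "$COMBINED_GVCF" -O "$JOINT_VCF"
```

Printing the commands first makes it easy to check the paths before committing to a long run. This approach does not apply if only the final per-cohort VCFs exist.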
