[ERROR] Malformed VCF: empty alleles are not permitted in VCF records

mattqdean CA Member
edited February 2016 in Ask the GATK team

I am running BaseRecalibrator on my RNA-seq data:

java -jar -Xmx120g ${GATK} -T BaseRecalibrator -R "${reference}" -I "${file4}" -knownSites "${gerVar}" -knownSites "${somVar}" -o "${file4%_tstaids.bam}_tstaidsr.table1"
java -jar -Xmx120g ${GATK} -T BaseRecalibrator -R "${reference}" -I "${file4}" -knownSites "${gerVar}" -knownSites "${somVar}" -BQSR "${file4%_tstaids.bam}_tstaidsr.table1" -o "${file4%_tstaids.bam}_tstaidsr.table2"
java -jar -Xmx120g ${GATK} -T AnalyzeCovariates -R "${reference}" -before "${file4%_tstaids.bam}_tstaidsr.table1" -after "${file4%_tstaids.bam}_tstaidsr.table2" -plots "${file1%_tsta.bam}_BQSR.pdf"
java -jar -Xmx120g ${GATK} -T PrintReads -R "${reference}" -I "${file4}" -BQSR "${file4%_tstaids.bam}_tstaidsr.table1" -o "${file7}"

Note that I got two variant VCFs from Ensembl (germline and somatic). My reference is Ensembl GRCh38.p5. I ran the command below to prepend 'chr' to the contig names and change chrMT to chrM:

sed -e '/^[^#]/s/^/chr/' -e 's/^chrMT/chrM/'
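
The sed command above is shown without its input and output files. As a minimal sketch of what it does, here it is applied to a toy VCF; the records and the file names `toy.vcf` and `toy_chr.vcf` are made up for illustration and are not from the original post.

```shell
# Toy input: one header line and two records (made-up data).
printf '##fileformat=VCFv4.1\n1\t100\trs1\tA\tG\t.\t.\t.\nMT\t200\trs2\tC\tT\t.\t.\t.\n' > toy.vcf

# Same two expressions as in the post: prefix every non-header line with "chr",
# then rename chrMT to chrM.
sed -e '/^[^#]/s/^/chr/' -e 's/^chrMT/chrM/' toy.vcf > toy_chr.vcf

# Show the first column of each line.
cut -f1 toy_chr.vcf
```

Lines starting with `#` are skipped by the first expression, so header lines (including any `##contig` entries) keep their original names.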

I received this error:

##### ERROR MESSAGE: The provided VCF file is malformed at approximately line number 18354680: empty alleles are not permitted in VCF records

I used the command below to inspect my VCF file (it is ${gerVar} that is malformed):

sed -n '18354680p'

which returned:

chr11 5249456 HbVar.633 G . . PhenCode_20140430;TSA=sequence_alteration;AA=A
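
The error message points at a single line, but other records in the same file may have the same problem. A rough sketch for listing every record with an empty ALT column, assuming the file is tab-delimited as the VCF format requires; the toy data and the file names `toy_alt.vcf` and `empty_alt.txt` are hypothetical.

```shell
# Toy VCF: one good record and one record whose ALT (column 5) is empty.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr11\t100\trs1\tG\tA\t.\t.\t.\nchr11\t200\trs2\tG\t\t.\t.\t.\n' > toy_alt.vcf

# For every non-header line with an empty ALT, print the line number, contig, and position.
awk -F'\t' '!/^#/ && $5 == "" { print NR ": " $1 ":" $2 }' toy_alt.vcf > empty_alt.txt
cat empty_alt.txt
```

On the toy file this prints `3: chr11:200`. On a real file, replace `toy_alt.vcf` with the path of the VCF being checked.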


Best Answer


  • girardot Heidelberg, Germany Member

    Hi all,

    I have the same issue, but it is a bit weird. I am using a VCF from dbSNP for the fly (from yesterday) and the latest GATK (3.7). They provide a vcd.gz together with a .tbi index. If I use the cvs.gz together with its .tbi (co-located), it works just fine, but if I remove the .tbi, I get the error. I can already hear you thinking "why the hell does he remove the .tbi?" Well, this is because I initially tried to run GATK in Galaxy with the uncompressed file (without an associated index) and got the same error.
    Do you have any idea why the presence of the index makes GATK consider the VCF valid?


  • girardot Heidelberg, Germany Member

    Sorry for the bad autocorrection: vcd.gz and cvs.gz should read vcf.gz.

  • Sheila Broad Institute Member, Broadie, Moderator admin


    All GATK tools absolutely require a VCF index along with the VCF. GATK will index uncompressed VCF files on-the-fly, but it will not index compressed VCF files. I am surprised you got an error running on an uncompressed VCF without an index. Perhaps it is a Galaxy issue?
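
For a compressed VCF, the index has to be created by hand. A minimal sketch, assuming the htslib tools `bgzip` and `tabix` are installed and the VCF is coordinate-sorted; `index_vcf` is a hypothetical helper name, and the file name in the usage comment is illustrative. The `.vcf.gz` plus `.tbi` pairing matches what girardot reports as working above.

```shell
# Sketch only: compress a coordinate-sorted VCF with bgzip and index it with tabix.
# index_vcf is a hypothetical helper; bgzip and tabix are assumed to be on PATH.
index_vcf() {
    vcf="$1"
    # Fail early if the tools are not installed.
    command -v bgzip >/dev/null 2>&1 || { echo "bgzip not found" >&2; return 1; }
    command -v tabix >/dev/null 2>&1 || { echo "tabix not found" >&2; return 1; }
    # Write the compressed copy next to the input, then create <file>.vcf.gz.tbi.
    bgzip -c "$vcf" > "$vcf.gz" && tabix -p vcf "$vcf.gz"
}

# Usage (illustrative file name):
# index_vcf my_variants.vcf
```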


  • splaisan Leuven (Belgium) Member ✭✭

    In case this helps anyone: I had the same error with fruit fly dbSNP data from NCBI.

    vcf-validator reports:

    The column ALT is empty at 3R:19738372.
    The column ALT is empty at 3R:24346609.
    The column ALT is empty at 3R:28099302.
    The column ALT is empty at X:2782218.
    The column ALT is empty at X:10101813.
    The column ALT is empty at X:12757524.
    The column ALT is empty at X:18195320.

    I was able to correct the VCF with the following awk snippet:

    gawk 'BEGIN{FS="\t"; OFS="\t"}{if (NF>1 && $5=="") {$5="."; print $0} else print $0}' fruitfly_7227.vcf > fruitfly_7227_corr.vcf
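
A quick way to sanity-check this kind of fix before running it on the full file is to try it on a toy record. The sketch below uses the same logic written for plain `awk` (an assumption, in case `gawk` is not installed); the record contents and the file names `toy_fly.vcf` and `toy_fly_corr.vcf` are made up.

```shell
# Toy input: a header plus one record with an empty ALT column (made-up data).
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n3R\t19738372\t.\tA\t\t.\t.\t.\n' > toy_fly.vcf

# Same logic as the gawk snippet: put "." into an empty ALT, print every line.
awk 'BEGIN { FS = OFS = "\t" } NF > 1 && $5 == "" { $5 = "." } { print }' toy_fly.vcf > toy_fly_corr.vcf

# Count records that still have an empty ALT after the fix.
awk -F'\t' '!/^#/ && $5 == "" { n++ } END { print n + 0 }' toy_fly_corr.vcf
```

A count of 0 means no empty ALT fields remain in the corrected file.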
