[ERROR] Malformed VCF: empty alleles are not permitted in VCF records

mattqdean CA Member
edited February 2016 in Ask the GATK team

I am running BaseRecalibrator on my RNA-seq data:

java -jar -Xmx120g ${GATK} -T BaseRecalibrator -R "${reference}" -I "${file4}" -knownSites "${gerVar}" -knownSites "${somVar}" -o "${file4%_tstaids.bam}_tstaidsr.table1"
java -jar -Xmx120g ${GATK} -T BaseRecalibrator -R "${reference}" -I "${file4}" -knownSites "${gerVar}" -knownSites "${somVar}" -BQSR "${file4%_tstaids.bam}_tstaidsr.table1" -o "${file4%_tstaids.bam}_tstaidsr.table2"
java -jar -Xmx120g ${GATK} -T AnalyzeCovariates -R "${reference}" -before "${file4%_tstaids.bam}_tstaidsr.table1" -after "${file4%_tstaids.bam}_tstaidsr.table2" -plots "${file1%_tsta.bam}_BQSR.pdf"
java -jar -Xmx120g ${GATK} -T PrintReads -R "${reference}" -I "${file4}" -BQSR "${file4%_tstaids.bam}_tstaidsr.table1" -o "${file7}"

Note that I got two variant VCFs from Ensembl (germline and somatic). My reference is Ensembl GRCh38.p5. I ran the command below to prepend 'chr' to the chromosome names and change chrMT to chrM:

sed -e '/^[^#]/s/^/chr/' -e 's/^chrMT/chrM/'

I received this error:

##### ERROR MESSAGE: The provided VCF file is malformed at approximately line number 18354680: empty alleles are not permitted in VCF records

I used the command below to inspect my VCF file (it is ${gerVar} that is malformed):

sed -n '18354680p'

which returned:

chr11 5249456 HbVar.633 G . . PhenCode_20140430;TSA=sequence_alteration;AA=A
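
A quick way to look for other records of this kind is sketched below. It is only an illustration and not part of GATK: `find_empty_alleles` is a hypothetical helper name, and it assumes an uncompressed, tab-delimited VCF. It prints the line number and content of every data line whose REF (column 4) or ALT (column 5) field is empty; lines with fewer than five columns are flagged as well.

```shell
# Hypothetical helper (not a GATK tool): list VCF data lines whose
# REF (column 4) or ALT (column 5) field is empty.
# Assumes an uncompressed, tab-delimited VCF passed as the first argument.
find_empty_alleles() {
    # -F'\t' keeps empty fields between consecutive tabs;
    # header lines starting with '#' are skipped.
    awk -F'\t' '!/^#/ && ($4 == "" || $5 == "") { print NR ": " $0 }' "$1"
}
```

Usage would be something like `find_empty_alleles my_variants.vcf`, where the file name is a placeholder. An empty result only means that no REF or ALT column is blank; it does not prove the file is otherwise valid.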

Answers

  • girardot Heidelberg, Germany Member

    Hi all,

    I have the same issue, but it is a bit weird. I am using the dbSNP VCF for the fly (from yesterday) and the latest GATK (3.7). They provide a vcd.gz together with a .tbi index. If I use the cvs.gz together with its .tbi (co-located), it works just fine, but if I remove the .tbi, I get the error. I can already hear you thinking "why the hell does he remove the .tbi?" Well, this is because I initially tried to run GATK in Galaxy with the uncompressed file (without an associated index) and got the same error.
    Do you have any idea why the presence of the index makes GATK consider the VCF valid?
    thx

    Charles

  • girardot Heidelberg, Germany Member

    Sorry for the bad autocorrection: vcd.gz and cvs.gz should read vcf.gz.

  • Sheila Broad Institute Member, Broadie, Moderator

    @girardot
    Hi,

    All GATK tools absolutely require a VCF index along with the VCF. GATK will index uncompressed VCF files on-the-fly, but it will not index compressed VCF files. I am surprised you got an error running on an uncompressed VCF without an index. Perhaps it is a Galaxy issue?

    -Sheila
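
    Since GATK will not index a compressed VCF itself, one common way to create the .tbi mentioned above is with bgzip and tabix from htslib. The sketch below is only an illustration under stated assumptions: `index_vcf` is a hypothetical helper name, both tools are assumed to be on PATH, and the input is assumed to be a plain-text VCF already sorted by chromosome and position.

```shell
# Hypothetical helper: bgzip-compress a plain-text VCF and build a
# tabix (.tbi) index next to it. Assumes bgzip and tabix (htslib)
# are installed and the VCF is coordinate-sorted.
index_vcf() {
    vcf="$1"
    if [ ! -f "$vcf" ]; then
        echo "no such file: $vcf" >&2
        return 1
    fi
    if ! command -v bgzip >/dev/null 2>&1 || ! command -v tabix >/dev/null 2>&1; then
        echo "bgzip and tabix are required" >&2
        return 1
    fi
    # Write file.vcf.gz, keep the original, then create file.vcf.gz.tbi.
    bgzip -c "$vcf" > "$vcf.gz" && tabix -p vcf "$vcf.gz"
}
```

    Calling `index_vcf my_variants.vcf` (a placeholder file name) should leave `my_variants.vcf.gz` and `my_variants.vcf.gz.tbi` side by side, which is the co-located layout described earlier in the thread.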