Picard LiftoverVcf parsing error

cmccabe (Chicago, Member)
edited January 9 in Ask the GATK team

I am using Picard LiftoverVcf to lift variants over from hg38 to hg19. I am not sure what the error refers to, though it seems to be an issue with the VCF (I have copied a few lines below to show the format). Thank you :).

  java -jar /home/cmccabe/Desktop/NGS/picard-tools-1.140/picard.jar LiftoverVcf \
  I=/home/cmccabe/Desktop/align/out.vcf \
  O=/home/cmccabe/Desktop/tvc/IDP.lifted_over.vcf \
  CHAIN=/home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/hg38ToHg19.over.chain \
  REJECT=/home/cmccabe/Desktop/tvc/IDP_rejected_variants.vcf \
  R=/home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta

Java version (Ubuntu 14.04):
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

Error trace:
[Mon Jan 09 08:04:50 CST 2017] picard.vcf.LiftoverVcf INPUT=/home/cmccabe/Desktop/tvc/out.vcf OUTPUT=/home/cmccabe/Desktop/tvc/IDP.lifted_over.vcf CHAIN=/home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/hg38ToHg19.over.chain REJECT=/home/cmccabe/Desktop/tvc/IDP_rejected_variants.vcf REFERENCE_SEQUENCE=/home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Mon Jan 09 08:04:50 CST 2017] Executing as cmccabe@DTV-A5211QLM on Linux 4.4.0-57-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_111-b14; Picard version: 1.140(a81bc82e781dae05c922d1dbcee737334612399f_1444244284) IntelDeflater
INFO 2017-01-09 08:04:51 LiftoverVcf Loading up the target reference genome.
INFO 2017-01-09 08:05:00 LiftoverVcf Lifting variants over and sorting.
[Mon Jan 09 08:05:00 CST 2017] picard.vcf.LiftoverVcf done. Elapsed time: 0.17 minutes.
Runtime.totalMemory()=6771703808
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Key CS found in VariantContext field INFO at chr1:11082610 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default.
at htsjdk.variant.vcf.VCFEncoder.fieldIsMissingFromHeaderError(VCFEncoder.java:176)
at htsjdk.variant.vcf.VCFEncoder.encode(VCFEncoder.java:115)
at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:222)
at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:206)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

VCF format (a few lines; 370 in total):

##fileformat=VCFv4.1
##fileDate=20130610
##source=ensembl;version=75;url=http://e75.ensembl.org/homo_sapiens
##reference=ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/
##INFO=<ID=TSA,Number=0,Type=String,Description="Type of sequence alteration. Child of term sequence_alteration as defined by the sequence ontology project.">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=E_MO,Number=0,Type=Flag,Description="Multiple_observations.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_1000G,Number=0,Type=Flag,Description="1000Genomes.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_HM,Number=0,Type=Flag,Description="HapMap.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_Freq,Number=0,Type=Flag,Description="Frequency.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_C,Number=0,Type=Flag,Description="Cited.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=dbSNP_137,Number=0,Type=Flag,Description="Variants (including SNPs and indels) imported from dbSNP">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Individual
chr1 11082610 rs11689432 G A . . TSA=SNV;E_MO;E_Freq;E_HM;E_C;CS=pathogenic;dbSNP_137 GT 1/0
chr1 11107061 rs77977199 T G . . TSA=SNV;E_Freq;dbSNP_137 GT 1/0
chr1 17380507 rs11203289 G C . . TSA=SNV;E_MO;E_Freq;E_1000G;CS=pathogenic;dbSNP_137 GT 1/1
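Comparing these lines with the error message: the record at chr1:11082610 carries CS=pathogenic in its INFO column, but the header has no ##INFO=<ID=CS,...> line. A minimal bash/awk sketch for listing every such undeclared key, plus a helper that inserts one header line, might look like this. The function names are hypothetical (they are not Picard or GATK tools), and the script assumes an uncompressed VCF that is tab-delimited as the format requires (the forum rendering shows spaces).

```shell
#!/usr/bin/env bash
# Sketch only: header checks for an uncompressed, tab-delimited VCF.
# Function names are hypothetical helpers, not Picard or GATK tools.

# undeclared_info_keys FILE
# Prints, once each and sorted, every INFO key used in a data record
# that has no matching ##INFO=<ID=...> header line.
undeclared_info_keys() {
  awk -F'\t' '
    /^##INFO=<ID=/ {
      # Keep only the ID: strip the prefix, then everything from the first "," or ">".
      id = $0
      sub(/^##INFO=<ID=/, "", id)
      sub(/[,>].*$/, "", id)
      declared[id] = 1
      next
    }
    /^#/ { next }                  # other header lines and the #CHROM line
    $8 != "." {                    # column 8 is INFO; "." means empty
      n = split($8, entries, ";")  # entries are KEY or KEY=VALUE
      for (i = 1; i <= n; i++) {
        key = entries[i]
        sub(/=.*$/, "", key)
        if (!(key in declared)) missing[key] = 1
      }
    }
    END { for (k in missing) print k }
  ' "$1" | sort
}

# add_info_header_line FILE ID NUMBER TYPE DESCRIPTION
# Copies FILE to stdout, inserting one ##INFO line just before #CHROM.
add_info_header_line() {
  awk -v line="##INFO=<ID=$2,Number=$3,Type=$4,Description=\"$5\">" '
    /^#CHROM/ { print line }
    { print }
  ' "$1"
}
```

For a file like the one above, `undeclared_info_keys out.vcf` should report CS (along with any other undeclared keys in the remaining lines), and `add_info_header_line out.vcf CS 1 String "Clinical significance" > out.fixed.vcf` writes a copy with the extra header line. Number=1, Type=String and the Description text are guesses based on values such as CS=pathogenic, not definitions from Ensembl. For a .vcf.gz input, process substitution such as `<(gzip -dc in.vcf.gz)` can stand in for FILE.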


GitHub issue 1620 (by shlee); state: closed; closed by vdauwera

Answers

  • shlee (Cambridge; Member, Broadie, Moderator)

    Hi @cmccabe,

    I've been out sick and will get to your question soon.

  • shlee (Cambridge; Member, Broadie, Moderator)

    @cmccabe When you run ValidateVariants on your VCF, what is the result?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    The important bit of the error message is this, which is actually the internal validation result:

    Exception in thread "main" java.lang.IllegalStateException: Key CS found in VariantContext field INFO at chr1:11082610 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default.

    This means there's an annotation in the VCF that is not explicitly defined in the header. This is allowed by the VCF specification, but we consider it a bad idea and disallow it in our software. You can fix this with a Picard tool called FixVcfHeader. The tool isn't yet listed on the Picard docs website, but you can see usage info here.

  • Jared (Member)
    edited July 18

    Now that FixVcfHeader is in the documentation, I've tried running it to fix a malformed header. However, it fails with this error: Exception in thread "main" java.lang.IllegalArgumentException: A reference dictionary is required for creating Tribble indices on the fly. The documentation lists no option for a reference dictionary, and running it with a reference sequence doesn't solve the problem. Below is what I ran; the output is attached.

    java -jar /media/gilchrist/Expansion/Marshall_Lab_Data/PICARD/picard.jar FixVcfHeader \
            INPUT=/media/gilchrist/Expansion/Marshall_Lab_Data/Reference_Variants/anopheles_gambiae.vcf \
            OUTPUT=/media/gilchrist/Expansion/Marshall_Lab_Data/Reference_Variants/anopheles_gambiae_fixed.vcf
    
  • shlee (Cambridge; Member, Broadie, Moderator)
    edited July 18

    Hi @Jared,
    The Picard BedToIntervalList documentation clarifies what is meant by a sequence dictionary, and I think this applies across the board for Picard tools. So try adding the SEQUENCE_DICTIONARY option to the command as a separate argument, or make sure your VCF header has ##contig lines.

  • Jared (Member)

    @shlee
    FixVcfHeader doesn't take the SEQUENCE_DICTIONARY option.

    Missing ##contig in the header was the problem! Thank you.

  • shlee (Cambridge; Member, Broadie, Moderator)

    Glad we solved that. Thanks for reporting back the solution.

  • Jared (Member)
    edited July 19

    Back to the SEQUENCE_DICTIONARY issue: a workaround is to write a compressed (.vcf.gz) output file. This avoids both the error and the missing ##contig issue. See below.

    export mPath=/media/gilchrist/Expansion/Marshall_Lab_Data
    java -jar $mPath/PICARD/picard.jar FixVcfHeader \
            INPUT=$mPath/Reference_Variants/original.vcf \
            OUTPUT=$mPath/Reference_Variants/fixed.vcf.gz
    

    Reference:
    https://github.com/broadinstitute/picard/issues/274

  • shlee (Cambridge; Member, Broadie, Moderator)

    Hi @Jared,

    When you specify a .vcf.gz output instead of .vcf, Picard tools skip certain validation checks they normally perform, e.g. sequence dictionary checks, so you do not get an error. So yes, this is also a workaround.
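In this thread, the FixVcfHeader error about needing a reference dictionary went away once the header had ##contig lines, or when the output was written as .vcf.gz. Before choosing either route, a quick check of which chromosomes used by the records lack a ##contig line might look like the following bash/awk sketch. The function names are hypothetical (not Picard or GATK tools), and it assumes an uncompressed, tab-delimited VCF.

```shell
#!/usr/bin/env bash
# Sketch only: ##contig checks for an uncompressed, tab-delimited VCF.
# Function names are hypothetical helpers, not Picard or GATK tools.

# has_contig_lines FILE
# Exit status 0 if any line starts with ##contig=, 1 otherwise.
has_contig_lines() {
  grep -q '^##contig=' "$1"
}

# undeclared_contigs FILE
# Prints, sorted, each CHROM value used by a data record that has no
# matching ##contig=<ID=...> header line.
undeclared_contigs() {
  awk -F'\t' '
    /^##contig=<ID=/ {
      # Keep only the ID: strip the prefix, then everything from the first "," or ">".
      id = $0
      sub(/^##contig=<ID=/, "", id)
      sub(/[,>].*$/, "", id)
      declared[id] = 1
      next
    }
    /^#/ { next }                          # other header lines and #CHROM
    !($1 in declared) { missing[$1] = 1 }  # column 1 is CHROM
    END { for (c in missing) print c }
  ' "$1" | sort
}
```

If `has_contig_lines` fails, or `undeclared_contigs` prints anything, the options discussed in this thread are adding ##contig lines that match the reference the VCF was called against (the fix Jared reported) or writing a .vcf.gz output.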
