To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at https://software.broadinstitute.org/firecloud/documentation/freecredits

[ERROR] SelectVariants

cbaocbao Member, Broadie

Hi,

I want to run SelectVariants on FC using my own vcf. But, I got the following ERROR. Any suggestions are welcome. Thanks!

Best,
Chunyang

22:08:56.550 INFO SelectVariants - Initializing engine 22:08:57.487 INFO FeatureManager - Using codec VCFCodec to read file file:///cromwell_root/fc-b9cc3b48-69c9-4013-8eba-ff7973fc7bc9/gnomad.genomes.r2.0.2.sites.vcf.gz 22:08:58.200 INFO SelectVariants - Shutting down engine [February 4, 2018 10:08:58 PM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 2.55 minutes. Runtime.totalMemory()=74297344 htsjdk.tribble.TribbleException: Contig 1 does not have a length field. at htsjdk.variant.vcf.VCFContigHeaderLine.getSAMSequenceRecord(VCFContigHeaderLine.java:80) at htsjdk.variant.vcf.VCFHeader.getSequenceDictionary(VCFHeader.java:206) at org.broadinstitute.hellbender.engine.FeatureDataSource.getSequenceDictionary(FeatureDataSource.java:400) at org.broadinstitute.hellbender.engine.FeatureManager.lambda$getAllSequenceDictionaries$5(FeatureManager.java:282) at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) at java.util.Iterator.forEachRemaining(Iterator.java:116) at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) at org.broadinstitute.hellbender.engine.FeatureManager.getAllSequenceDictionaries(FeatureManager.java:284) at org.broadinstitute.hellbender.engine.GATKTool.validateSequenceDictionaries(GATKTool.java:588) at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:563) at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:43) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198) at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153) at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195) at org.broadinstitute.hellbender.Main.main(Main.java:277) Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Xmx2g -jar /root/gatk.jar SelectVariants -V /cromwell_root/fc-b9cc3b48-69c9-4013-8eba-ff7973fc7bc9/gnomad.genomes.r2.0.2.sites.vcf.gz -O selected.vcf --lenient

Tagged:

Best Answers

  • cbaocbao Member, Broadie
    Accepted Answer

    Thank you! I fixed the issue by myself. My VCF header is not correct.

  • cbaocbao Member, Broadie
    Accepted Answer

    Hi, @Sheila,

    My VCF header is not in a "correct" format because there is no length field in the line start with "##contig". So, I just use awk to add contig lenght to my VCF header. Moreover, I think it also works if I use "-R ref.fasta" in a GATK cmd. Thanks a lot!

    contig_length=cat Homo_sapiens_assembly19.fasta.fai | \ grep ^[0-9X] | awk '{printf("##contig=<ID=%s,length=%d>\n",$1,$2);}'

    cat my.vcf | \
    grep '#' | grep -v '^##contig=' | \
    awk -v c="${contig_length}" '/^#CHROM/ {printf c"\n"} {print}' > new.vcf`

    More detials: https://www.biostars.org/p/198660/

    Thanks,
    Chunyang

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @cbao
    Hi,

    I think this is a VCF index issue. Can you try deleting the index and regenerating it? This thread may help too.

    -Sheila

  • cbaocbao Member, Broadie

    Thank you! I will try it out.

  • cbaocbao Member, Broadie

    I tried. But, it does work ... I got the same ERROR, "Contig 1 does not have a length field."

    My cmd:
    gatk --java-options "-Xmx${command_mem}g" SelectVariants -V ${input_vcf} -O ${output_name}.vcf.gz

    input_vcf is an uncompressed plain text.

  • cbaocbao Member, Broadie
    Accepted Answer

    Thank you! I fixed the issue by myself. My VCF header is not correct.

  • cbaocbao Member, Broadie
    Accepted Answer

    Hi, @Sheila,

    My VCF header is not in a "correct" format because there is no length field in the line start with "##contig". So, I just use awk to add contig lenght to my VCF header. Moreover, I think it also works if I use "-R ref.fasta" in a GATK cmd. Thanks a lot!

    contig_length=cat Homo_sapiens_assembly19.fasta.fai | \ grep ^[0-9X] | awk '{printf("##contig=<ID=%s,length=%d>\n",$1,$2);}'

    cat my.vcf | \
    grep '#' | grep -v '^##contig=' | \
    awk -v c="${contig_length}" '/^#CHROM/ {printf c"\n"} {print}' > new.vcf`

    More detials: https://www.biostars.org/p/198660/

    Thanks,
    Chunyang

Sign In or Register to comment.