[ERROR] SelectVariants

cbaocbao Member, Broadie

Hi,

I want to run SelectVariants on FC using my own vcf. But, I got the following ERROR. Any suggestions are welcome. Thanks!

Best,
Chunyang

22:08:56.550 INFO SelectVariants - Initializing engine 22:08:57.487 INFO FeatureManager - Using codec VCFCodec to read file file:///cromwell_root/fc-b9cc3b48-69c9-4013-8eba-ff7973fc7bc9/gnomad.genomes.r2.0.2.sites.vcf.gz 22:08:58.200 INFO SelectVariants - Shutting down engine [February 4, 2018 10:08:58 PM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 2.55 minutes. Runtime.totalMemory()=74297344 htsjdk.tribble.TribbleException: Contig 1 does not have a length field. at htsjdk.variant.vcf.VCFContigHeaderLine.getSAMSequenceRecord(VCFContigHeaderLine.java:80) at htsjdk.variant.vcf.VCFHeader.getSequenceDictionary(VCFHeader.java:206) at org.broadinstitute.hellbender.engine.FeatureDataSource.getSequenceDictionary(FeatureDataSource.java:400) at org.broadinstitute.hellbender.engine.FeatureManager.lambda$getAllSequenceDictionaries$5(FeatureManager.java:282) at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) at java.util.Iterator.forEachRemaining(Iterator.java:116) at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) at org.broadinstitute.hellbender.engine.FeatureManager.getAllSequenceDictionaries(FeatureManager.java:284) at org.broadinstitute.hellbender.engine.GATKTool.validateSequenceDictionaries(GATKTool.java:588) at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:563) at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:43) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198) at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153) at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195) at org.broadinstitute.hellbender.Main.main(Main.java:277) Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Xmx2g -jar /root/gatk.jar SelectVariants -V /cromwell_root/fc-b9cc3b48-69c9-4013-8eba-ff7973fc7bc9/gnomad.genomes.r2.0.2.sites.vcf.gz -O selected.vcf --lenient

Tagged:

Best Answers

  • Accepted Answer

    Thank you! I fixed the issue by myself. My VCF header is not correct.

  • Accepted Answer

    Hi, @Sheila,

    My VCF header is not in a "correct" format because there is no length field in the line start with "##contig". So, I just use awk to add contig lenght to my VCF header. Moreover, I think it also works if I use "-R ref.fasta" in a GATK cmd. Thanks a lot!

    contig_length=cat Homo_sapiens_assembly19.fasta.fai | \ grep ^[0-9X] | awk '{printf("##contig=<ID=%s,length=%d>\n",$1,$2);}'

    cat my.vcf | \
    grep '#' | grep -v '^##contig=' | \
    awk -v c="${contig_length}" '/^#CHROM/ {printf c"\n"} {print}' > new.vcf`

    More detials: https://www.biostars.org/p/198660/

    Thanks,
    Chunyang

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @cbao
    Hi,

    I think this is a VCF index issue. Can you try deleting the index and regenerating it? This thread may help too.

    -Sheila

  • cbaocbao Member, Broadie

    Thank you! I will try it out.

  • cbaocbao Member, Broadie

    I tried. But, it does work ... I got the same ERROR, "Contig 1 does not have a length field."

    My cmd:
    gatk --java-options "-Xmx${command_mem}g" SelectVariants -V ${input_vcf} -O ${output_name}.vcf.gz

    input_vcf is an uncompressed plain text.

  • cbaocbao Member, Broadie
    Accepted Answer

    Thank you! I fixed the issue by myself. My VCF header is not correct.

  • cbaocbao Member, Broadie
    Accepted Answer

    Hi, @Sheila,

    My VCF header is not in a "correct" format because there is no length field in the line start with "##contig". So, I just use awk to add contig lenght to my VCF header. Moreover, I think it also works if I use "-R ref.fasta" in a GATK cmd. Thanks a lot!

    contig_length=cat Homo_sapiens_assembly19.fasta.fai | \ grep ^[0-9X] | awk '{printf("##contig=<ID=%s,length=%d>\n",$1,$2);}'

    cat my.vcf | \
    grep '#' | grep -v '^##contig=' | \
    awk -v c="${contig_length}" '/^#CHROM/ {printf c"\n"} {print}' > new.vcf`

    More detials: https://www.biostars.org/p/198660/

    Thanks,
    Chunyang

Sign In or Register to comment.