Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

[ERROR] SelectVariants

cbaocbao Member, Broadie ✭✭

Hi,

I want to run SelectVariants on FC using my own vcf. But, I got the following ERROR. Any suggestions are welcome. Thanks!

Best,
Chunyang

22:08:56.550 INFO SelectVariants - Initializing engine 22:08:57.487 INFO FeatureManager - Using codec VCFCodec to read file file:///cromwell_root/fc-b9cc3b48-69c9-4013-8eba-ff7973fc7bc9/gnomad.genomes.r2.0.2.sites.vcf.gz 22:08:58.200 INFO SelectVariants - Shutting down engine [February 4, 2018 10:08:58 PM UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 2.55 minutes. Runtime.totalMemory()=74297344 htsjdk.tribble.TribbleException: Contig 1 does not have a length field. at htsjdk.variant.vcf.VCFContigHeaderLine.getSAMSequenceRecord(VCFContigHeaderLine.java:80) at htsjdk.variant.vcf.VCFHeader.getSequenceDictionary(VCFHeader.java:206) at org.broadinstitute.hellbender.engine.FeatureDataSource.getSequenceDictionary(FeatureDataSource.java:400) at org.broadinstitute.hellbender.engine.FeatureManager.lambda$getAllSequenceDictionaries$5(FeatureManager.java:282) at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) at java.util.Iterator.forEachRemaining(Iterator.java:116) at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) at org.broadinstitute.hellbender.engine.FeatureManager.getAllSequenceDictionaries(FeatureManager.java:284) at org.broadinstitute.hellbender.engine.GATKTool.validateSequenceDictionaries(GATKTool.java:588) at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:563) at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:43) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198) at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153) at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195) at org.broadinstitute.hellbender.Main.main(Main.java:277) Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Xmx2g -jar /root/gatk.jar SelectVariants -V /cromwell_root/fc-b9cc3b48-69c9-4013-8eba-ff7973fc7bc9/gnomad.genomes.r2.0.2.sites.vcf.gz -O selected.vcf --lenient

Tagged:

Best Answers

  • cbaocbao ✭✭
    Accepted Answer

    Thank you! I fixed the issue by myself. My VCF header is not correct.

  • cbaocbao ✭✭
    Accepted Answer

    Hi, @Sheila,

    My VCF header is not in a "correct" format because there is no length field in the line start with "##contig". So, I just use awk to add contig lenght to my VCF header. Moreover, I think it also works if I use "-R ref.fasta" in a GATK cmd. Thanks a lot!

    contig_length=cat Homo_sapiens_assembly19.fasta.fai | \ grep ^[0-9X] | awk '{printf("##contig=<ID=%s,length=%d>\n",$1,$2);}'

    cat my.vcf | \
    grep '#' | grep -v '^##contig=' | \
    awk -v c="${contig_length}" '/^#CHROM/ {printf c"\n"} {print}' > new.vcf`

    More detials: https://www.biostars.org/p/198660/

    Thanks,
    Chunyang

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @cbao
    Hi,

    I think this is a VCF index issue. Can you try deleting the index and regenerating it? This thread may help too.

    -Sheila

  • cbaocbao Member, Broadie ✭✭

    Thank you! I will try it out.

  • cbaocbao Member, Broadie ✭✭

    I tried. But, it does work ... I got the same ERROR, "Contig 1 does not have a length field."

    My cmd:
    gatk --java-options "-Xmx${command_mem}g" SelectVariants -V ${input_vcf} -O ${output_name}.vcf.gz

    input_vcf is an uncompressed plain text.

  • cbaocbao Member, Broadie ✭✭
    Accepted Answer

    Thank you! I fixed the issue by myself. My VCF header is not correct.

  • cbaocbao Member, Broadie ✭✭
    Accepted Answer

    Hi, @Sheila,

    My VCF header is not in a "correct" format because there is no length field in the line start with "##contig". So, I just use awk to add contig lenght to my VCF header. Moreover, I think it also works if I use "-R ref.fasta" in a GATK cmd. Thanks a lot!

    contig_length=cat Homo_sapiens_assembly19.fasta.fai | \ grep ^[0-9X] | awk '{printf("##contig=<ID=%s,length=%d>\n",$1,$2);}'

    cat my.vcf | \
    grep '#' | grep -v '^##contig=' | \
    awk -v c="${contig_length}" '/^#CHROM/ {printf c"\n"} {print}' > new.vcf`

    More detials: https://www.biostars.org/p/198660/

    Thanks,
    Chunyang

Sign In or Register to comment.