Help with FastaAlternateReferenceMaker for converting VCF to fasta

I am having some trouble running the FastaAlternateReferenceMaker tool to convert my vcf sequences to fasta using a reference genome. I started with a multi-sequence vcf made from whole genome paired-end Illumina data. I then subset the larger vcf file to isolate a single gene region and further subset it to only include organisms from one population. I was able to troubleshoot several issues, but there seems to be something I am missing. I am no longer getting a clear error message as I was before; the message is now mostly incomprehensible except for one line, which says:

htsjdk.tribble.TribbleException: Contig CAE1 does not have a length field.
at htsjdk.variant.vcf.VCFContigHeaderLine.getSAMSequenceRecord(VCFContigHeaderLine.java:80)
at htsjdk.variant.vcf.VCFHeader.getSequenceDictionary(VCFHeader.java:206)

Has anyone encountered a similar issue?




  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @SMGAL_Chris

    Please send us the version of GATK you are using, the exact command and the entire error log.

  • Send it to whom? I could post it here but it's very long.
  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)


    You could attach the error log file in this text box.

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @SMGAL_Chris

    We have not heard from you in 2 business days, so we are now closing this issue. Please write to us if you have any other questions.

  • annaship (Member)
    Hi, I am having what looks like the same problem, so I'm posting here instead of making a new query. Attached is the full log, but in brief, my commands are:

    gatk FastaAlternateReferenceMaker \
    -R c_elegans.PRJNA13758.WS263.genomic.fa \
    -O pkc3_FARM1.txt \
    -L II:5380027-5384169 \
    -V WI.20180527.impute.vcf

    My versions:

    The Genome Analysis Toolkit (GATK) v4.1.2.0
    HTSJDK Version: 2.19.0
    Picard Version: 2.19.0

    Thanks for any help you can offer!
  • bshifaw (Member, Broadie, Moderator)


    GATK is built for model organisms and may not work well with other organisms. You can pose your question in the Zoo and Garden community forum to get help from other users running on non-model organisms.

    You can try validating your VCF with ValidateVariants to make sure there's nothing wrong with the VCF. Also try recreating the index file for your input, as suggested in this forum thread.

  • annaship (Member)
    edited June 2019
    Okay thanks @bshifaw for your help. I think I got it working. I did try the ValidateVariants command, but that threw the identical error. The critical issue seemed to be:

    htsjdk.tribble.TribbleException: Contig I does not have a length field

    And indeed my vcf header did not give any lengths for the chromosomes. So I manually edited in the lengths, which I got from the reference .fai index.

    Then I used IndexFeatureFile to make an index for the vcf. With that index in the directory, FastaAlternateReferenceMaker then worked.
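
For anyone who would rather not edit the header by hand, here is a rough sketch of the same fix in Python. It is not part of GATK, and the function names (`read_fai_lengths`, `add_contig_lengths`) are made up for illustration. It assumes an uncompressed VCF whose header already has `##contig=<ID=...>` lines that are only missing `length=`, and a tab-separated .fai with the contig name in column 1 and its length in column 2.

```python
import re
import sys

# Captures the contig name from a header line such as "##contig=<ID=I>".
CONTIG_ID = re.compile(r"^##contig=<.*?\bID=([^,>]+)")


def read_fai_lengths(fai_lines):
    """Map contig name -> length using the first two columns of a .fai index."""
    lengths = {}
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def add_contig_lengths(vcf_lines, lengths):
    """Yield VCF lines, appending length=N to ##contig lines that lack it."""
    for line in vcf_lines:
        match = CONTIG_ID.match(line)
        body = line.rstrip("\n")
        if (
            match
            and "length=" not in body
            and body.endswith(">")
            and match.group(1) in lengths
        ):
            # Insert the length just before the closing ">" of the header line.
            line = "%s,length=%d>\n" % (body[:-1], lengths[match.group(1)])
        yield line


if __name__ == "__main__":
    # Hypothetical usage: python add_lengths.py ref.fa.fai in.vcf out.vcf
    fai_path, vcf_in, vcf_out = sys.argv[1:4]
    with open(fai_path) as fai:
        contig_lengths = read_fai_lengths(fai)
    with open(vcf_in) as src, open(vcf_out, "w") as dst:
        dst.writelines(add_contig_lengths(src, contig_lengths))
```

The output VCF still needs a fresh index from IndexFeatureFile, as described above. As far as I know the input flag for that tool has changed between GATK 4 releases, so it is worth checking `gatk IndexFeatureFile --help` for your version.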

    By the way, this is C. elegans data. Did you mean GATK is built for human data and that model systems like worms, flies, mice often need extra troubleshooting? Or are the standard model systems well supported? (I often run into problems like this but I have only used GATK for flies and worms.)
  • bshifaw (Member, Broadie, Moderator)

    Yes, I meant humans. Also, happy you got it to work. Thanks for sharing your solution.

