Help with FastaAlternateReferenceMaker for converting VCF to fasta

I am having some trouble running the FastaAlternateReferenceMaker tool to convert my VCF sequences to FASTA using a reference genome. I started with a multi-sample VCF made from whole-genome paired-end Illumina data. I then subset the larger VCF file to isolate a single gene region and further subset it to include only organisms from one population. I was able to troubleshoot several issues, but there seems to be something I am missing. I am no longer getting a clear error message as I was before; the message is now mostly incomprehensible except for one line, which says:

    htsjdk.tribble.TribbleException: Contig CAE1 does not have a length field.
        at htsjdk.variant.vcf.VCFContigHeaderLine.getSAMSequenceRecord(VCFContigHeaderLine.java:80)
        at htsjdk.variant.vcf.VCFHeader.getSequenceDictionary(VCFHeader.java:206)
        ...

Has anyone encountered a similar issue?

Best,

Christian
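The stack trace points at `VCFHeader.getSequenceDictionary`: htsjdk builds a sequence dictionary from the `##contig` lines of the VCF header and fails when one of them has no `length=` attribute. A quick way to see which contigs are affected is a small header scan like the one below. It is a sketch, not a GATK tool: the function name `contigs_missing_length` is hypothetical, and it assumes an uncompressed VCF passed as the first argument.

```shell
# Hypothetical helper (not part of GATK): print the ID of every ##contig
# header line in an uncompressed VCF that has no length= attribute.
contigs_missing_length() {
  awk '
    /^#CHROM/ { exit }                       # end of the header; stop reading
    /^##contig=<ID=/ && !/[<,]length=/ {     # contig line without length=
      id = $0
      sub(/^##contig=<ID=/, "", id)          # drop the "##contig=<ID=" prefix
      sub(/[,>].*$/, "", id)                 # keep text up to the first , or >
      print id
    }
  ' "$1"
}
```

For example, `contigs_missing_length WI.20180527.impute.vcf` would list each contig, such as `CAE1` or `I`, that needs a length added before tools that read the sequence dictionary will accept the file.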

Answers

  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator)

    Hi @SMGAL_Chris

    Please send us the version of GATK you are using, the exact command and the entire error log.

  • Send it to whom? I could post it here but it's very long.
  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator)

    @SMGAL_Chris

    You can attach the error log file to your reply in this thread.

  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator)

    Hi @SMGAL_Chris

    We have not heard from you in 2 business days, so we are now closing this issue. Please write to us if you have any other questions.

  • annaship (Member)
    Hi, I am having what looks like the same problem, so I'm posting here instead of making a new query. Attached is the full log, but in brief, my commands are:

    gatk FastaAlternateReferenceMaker \
    -R c_elegans.PRJNA13758.WS263.genomic.fa \
    -O pkc3_FARM1.txt \
    -L II:5380027-5384169 \
    -V WI.20180527.impute.vcf

    My versions:

    The Genome Analysis Toolkit (GATK) v4.1.2.0
    HTSJDK Version: 2.19.0
    Picard Version: 2.19.0

    Thanks for any help you can offer!
  • bshifaw (Member, Broadie, Moderator)

    @annaship

    GATK is built for model organisms and may not work well with other organisms. You can pose your question in the Zoo and Garden community forum to get help from other users running on non-model organisms.

    You can try validating your VCF with ValidateVariants to make sure there's nothing wrong with it. Also try recreating the index file for your input, as suggested in this forum thread.

  • annaship (Member)
    Okay thanks @bshifaw for your help. I think I got it working. I did try the ValidateVariants command, but that threw the identical error. The critical issue seemed to be:

    htsjdk.tribble.TribbleException: Contig I does not have a length field

    And indeed, my VCF header did not give any lengths for the chromosomes, so I manually edited in the lengths I got from the reference .fai index.

    Then I used IndexFeatureFile to make an index for the VCF. With that index in the same directory, FastaAlternateReferenceMaker worked.
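The manual header edit described above can be scripted. The sketch below is one way to do it, not a GATK feature: the function name `add_contig_lengths` is hypothetical, it assumes an uncompressed VCF, and it relies on the standard `.fai` layout in which column 1 is the sequence name and column 2 its length. It only touches `##contig` lines that lack `length=`; every other line is copied unchanged.

```shell
# Hypothetical helper (not part of GATK): write the VCF to stdout, adding
# length=N to each ##contig header line that lacks it. Lengths come from a
# .fai index (col 1 = sequence name, col 2 = length). Assumes a non-empty
# .fai and an uncompressed VCF.
add_contig_lengths() {
  local fai=$1 vcf=$2
  awk '
    NR == FNR { len[$1] = $2; next }            # first file: load .fai lengths
    /^##contig=<ID=/ && !/[<,]length=/ {        # contig line without length=
      id = $0
      sub(/^##contig=<ID=/, "", id)
      sub(/[,>].*$/, "", id)                    # contig ID up to first , or >
      if (id in len)
        sub(/>$/, ",length=" len[id] ">")       # insert before the closing >
      else
        print "warning: " id " not found in .fai" > "/dev/stderr"
    }
    { print }                                   # copy every VCF line
  ' "$fai" "$vcf"
}
```

For example, `add_contig_lengths c_elegans.PRJNA13758.WS263.genomic.fa.fai WI.20180527.impute.vcf > fixed.vcf` (the output name is only illustrative). The rewritten file then needs a fresh index from IndexFeatureFile, since any existing index was built from the old file. Contig IDs missing from the `.fai` are reported on stderr rather than guessed, because a wrong length would produce an invalid sequence dictionary.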

    By the way, this is C. elegans data. Did you mean GATK is built for human data and that model systems like worms, flies, mice often need extra troubleshooting? Or are the standard model systems well supported? (I often run into problems like this but I have only used GATK for flies and worms.)
  • bshifaw (Member, Broadie, Moderator)

    Yes, I meant humans. Also, I'm happy you got it to work. Thanks for sharing your solution.

    -Beri
