GATK stalls on particular NCBI reference genome

Gopo (Vetmeduni Vienna, Member)

Hi,
I am upgrading my analysis pipeline from GATK 3.3-0 to 3.6-0 on a CentOS 6.x machine and have run into a problem: GATK stalls at the last line of the output below. I know this is not a complete GATK command, but it does not even produce an error message:

[[email protected] sandbox]$ ~/bin/jdk1.8.0_111/bin/java -jar ~/bin/GATK-3.6/GenomeAnalysisTK.jar -T FindCoveredIntervals -R GCF_000241765.3_Chrysemys_picta_bellii-3.0.3_genomic.fna

INFO 20:26:32,905 HelpFormatter - ----------------------------------------------------------------------------------
INFO 20:26:32,908 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.6-0-g89b7209, Compiled 2016/06/01 22:27:29
INFO 20:26:32,909 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 20:26:32,909 HelpFormatter - For support and documentation go to https://www.broadinstitute.org/gatk
INFO 20:26:32,909 HelpFormatter - [Mon Nov 21 20:26:32 CST 2016] Executing on Linux 2.6.32-642.11.1.el6.x86_64 amd64
INFO 20:26:32,910 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_111-b14 JdkDeflater
INFO 20:26:32,914 HelpFormatter - Program Args: -T FindCoveredIntervals -R GCF_000241765.3_Chrysemys_picta_bellii-3.0.3_genomic.fna
INFO 20:26:32,919 HelpFormatter - Executing as [email protected] on Linux 2.6.32-642.11.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_111-b14.
INFO 20:26:32,919 HelpFormatter - Date/Time: 2016/11/21 20:26:32
INFO 20:26:32,919 HelpFormatter - ----------------------------------------------------------------------------------
INFO 20:26:32,920 HelpFormatter - ----------------------------------------------------------------------------------
INFO 20:26:32,928 GenomeAnalysisEngine - Strictness is SILENT

It will sit at "Strictness is SILENT" for hours.

I thought my reference genome, its index, or its dictionary might have been corrupted, so I reacquired the reference:

wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000241765.3_Chrysemys_picta_bellii-3.0.3/GCF_000241765.3_Chrysemys_picta_bellii-3.0.3_genomic.fna.gz
gunzip GCF_000241765.3_Chrysemys_picta_bellii-3.0.3_genomic.fna.gz

I made its index with samtools-1.3.1:
~/bin/samtools-1.3.1/samtools faidx GCF_000241765.3_Chrysemys_picta_bellii-3.0.3_genomic.fna

I made its sequence dictionary with picard-tools-2.7.1:
~/bin/jdk1.8.0_111/bin/java -jar ~/bin/picard-tools-2.7.1/picard.jar CreateSequenceDictionary R=GCF_000241765.3_Chrysemys_picta_bellii-3.0.3_genomic.fna O=GCF_000241765.3_Chrysemys_picta_bellii-3.0.3_genomic.fna.dict
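To rule out a mismatch between the rebuilt index and dictionary, one option is to compare how many sequences each file lists. This is only a minimal sketch: `check_ref_counts` is a hypothetical helper, not part of samtools, Picard, or GATK. It assumes the usual layouts, with one line per sequence in the `.fai` and one `@SQ` line per sequence in the `.dict`.

```shell
# Hypothetical helper (not part of samtools, Picard, or GATK).
# Compares the number of sequences listed in a .fai index and a .dict file.
# Usage: check_ref_counts <reference.fai> <reference.dict>
check_ref_counts() {
    fai="$1"
    dict="$2"
    # Each non-empty line of a .fai index describes one sequence.
    n_fai=$(grep -c . "$fai")
    # Each @SQ header line of a sequence dictionary describes one sequence.
    n_dict=$(grep -c '^@SQ' "$dict")
    echo "fai sequences:  $n_fai"
    echo "dict sequences: $n_dict"
    # Succeed only when both files list the same number of sequences.
    [ "$n_fai" -eq "$n_dict" ]
}
```

With the files created above it could be called as `check_ref_counts GCF_000241765.3_Chrysemys_picta_bellii-3.0.3_genomic.fna.fai GCF_000241765.3_Chrysemys_picta_bellii-3.0.3_genomic.fna.dict`. A non-zero exit status would mean the two files disagree. Matching counts do not prove the files are correct, only that they are not obviously out of sync.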

However, if I repeat the above steps with a different NCBI-created reference genome, GATK does not stall and instead reports the expected error:

ERROR MESSAGE: Walker requires reads but none were provided.

Best,
Gopo

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @Gopo
    Hi Gopo,

    How large is the reference when there is no error and the tool stalls? What happens if you do give an input BAM file?

    Also, what kind of reference gives the correct error message?

    Thanks,
    Sheila

  • Gopo (Vetmeduni Vienna, Member)

    How large is the reference when there is no error and the tool stalls?

    2365.77 megabases

    What happens if you do give an input BAM file?

    The tool still stalls - I've tried other GATK tools too. There is no stalling if I use GATK 3.5-0.

    Also, what kind of reference gives the correct error message?

    So far I've only tried two other references that give the correct error message:
    (1) the example fasta from the GATK tar
    (2) ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/385/455/GCF_000385455.1_Zonotrichia_albicollis-1.0.1/GCF_000385455.1_Zonotrichia_albicollis-1.0.1_genomic.fna.gz

    Thanks,
    Gopo

  • Gopo (Vetmeduni Vienna, Member)

    I tried the process with the new GATK-3.7 (presumably with the latest htsjdk) and still have the stalling problem.

    Issue · Github (by Geraldine_VdAuwera): Issue Number 1571, State: closed, Closed By: vdauwera