GenomicsDBImport intervals and HaplotypeCaller estimated time (GATK 4.1.0.0)

Hello,
I'm trying to produce a VCF file by following the germline short variant discovery Best Practices.
My BAM files are ready for variant discovery. I have two questions:

  1. I need to specify a time limit for the jobs I submit, and I'm also short on time. How long does HaplotypeCaller usually take to produce its output, so that I can give the job a sensible time limit?
    What factors affect the runtime?
    My BAM files are about 10GB each.

  2. GenomicsDBImport requires an interval list, but I don't have an interval list or an exclude list, and I'm doing whole-genome analysis. What should I provide for the intervals, since it's a required parameter? If possible, I'd also like an estimated runtime for this step.

Thank you very much

Answers

  • bhanuGandham (Cambridge MA) Member, Administrator, Broadie, Moderator
    edited April 19

    Hi @Amirm

    1) A dataset of roughly 3GB takes about 1.5 hours.
    2) You could list all the chromosomes in the interval list. Take a look at this doc: https://software.broadinstitute.org/gatk/documentation/article?id=11813
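
One way to build such a whole-genome list is to read the contig names from the reference's sequence dictionary. The following is a minimal sketch, not an official GATK recipe: it assumes a standard `.dict` file with tab-separated `@SQ` lines carrying `SN` (name) and `LN` (length) tags, and it writes a GATK-style `.list` file with one contig name per line. The file names and the demo dictionary contents are made up for illustration.

```python
import tempfile
from pathlib import Path


def contigs_from_dict(dict_path):
    """Return (name, length) pairs from the @SQ lines of a sequence dictionary."""
    contigs = []
    for line in Path(dict_path).read_text().splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field looks like TAG:VALUE; split on the first colon only,
        # because values such as UR:file:/path contain colons themselves.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


def write_interval_list(contigs, out_path):
    """Write one contig name per line (a bare name stands for the whole contig)."""
    Path(out_path).write_text("".join(f"{name}\n" for name, _length in contigs))


if __name__ == "__main__":
    # Made-up two-scaffold dictionary so the sketch runs without real data.
    demo_dict = (
        "@HD\tVN:1.5\n"
        "@SQ\tSN:scaffold1\tLN:5000\tUR:file:/hypothetical/ref.fasta\n"
        "@SQ\tSN:scaffold2\tLN:1200\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        dict_path = Path(tmp) / "reference.dict"
        dict_path.write_text(demo_dict)
        out_path = Path(tmp) / "whole_genome.list"
        write_interval_list(contigs_from_dict(dict_path), out_path)
        print(out_path.read_text(), end="")
```

The resulting file could then be passed to a tool with `-L whole_genome.list`.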

  • Amirm Member
    edited April 19

    Hi @bhanuGandham
    Thanks for answering.
    HaplotypeCaller is still running after 4 days (one sample). Is there any way to find the problem?
    I have validated my BAM files too, and no errors were found.
    My samples have over 30,000 scaffolds. I guess I should use an interval list for WGS, but I don't know how to make the list for a whole genome.
    Thanks
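
With this many scaffolds, a common way to stay within a job time limit is to split the scaffolds into a few batches of similar total length and run one HaplotypeCaller job per batch with `-L`. The sketch below is only an illustration of that idea, assuming the scaffold names and lengths are already known (for example from the sequence dictionary). It uses a greedy longest-first assignment, and the scaffold names and lengths in the demo are invented.

```python
import heapq


def batch_contigs(contigs, n_batches):
    """Greedy longest-first split: each contig goes to the currently lightest batch."""
    # Heap entries are (total_length_so_far, batch_index).
    heap = [(0, i) for i in range(n_batches)]
    heapq.heapify(heap)
    batches = [[] for _ in range(n_batches)]
    for name, length in sorted(contigs, key=lambda c: c[1], reverse=True):
        total, idx = heapq.heappop(heap)
        batches[idx].append(name)
        heapq.heappush(heap, (total + length, idx))
    return batches


if __name__ == "__main__":
    # Invented scaffold lengths, for illustration only.
    demo = [("scaffold1", 900), ("scaffold2", 500), ("scaffold3", 400), ("scaffold4", 100)]
    for i, batch in enumerate(batch_contigs(demo, 2)):
        print(f"batch {i}: {batch}")
```

Each batch could be written to its own `.list` file, one scaffold name per line. The per-batch GVCFs of a sample would then need to be combined again before joint genotyping.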

  • Amirm Member

    I can't edit my last post anymore, so I'm adding this here: I'm working on a non-human species.

  • bhanuGandham (Cambridge MA) Member, Administrator, Broadie, Moderator

    Hi @Amirm

    Please share the following:
    1) The exact HaplotypeCaller command you are using.
    2) The sequence dictionary and contig information.
    3) The computing resources you are using for this command.
    4) What the progress meter (in the stdout log) shows.

  • Amirm Member
    edited April 23

    Hi @bhanuGandham

    1) This is the exact command I ran:
    gatk HaplotypeCaller --java-options "-Xmx8g -XX:ParallelGCThreads=1" -R $ref/$refrenceName -I $deduped_sorted/${sample}_deduped_sorted.bam -O $gvcf/${sample}_raw.g.vcf --emit-ref-confidence GVCF

    2) This is the sequence dictionary file

    3) I have 16GB of RAM for each job I submit; I don't know any more than that.

    4) One of my jobs finished after 8 days for a BAM file of about 4GB. The output is a 14GB GVCF file (the index file is 34MB). These are the last lines of the log for the finished job:

        00:23:18.287 WARN  DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        00:23:18.287 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        00:23:18.288 WARN  DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        00:23:18.288 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        00:23:18.288 WARN  DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        00:23:18.288 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        00:23:18.288 WARN  DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        00:23:18.288 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        00:23:28.883 INFO  ProgressMeter -  scaffold1341:578166          10643.9               8122870            763.1
        00:23:29.994 INFO  HaplotypeCaller - 12015950 read(s) filtered by: ((((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND GoodCigarReadFilter) AND WellformedReadFilter)
          12015950 read(s) filtered by: (((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND GoodCigarReadFilter)
              12015950 read(s) filtered by: ((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter)
                  12015950 read(s) filtered by: (((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter)
                      12015950 read(s) filtered by: ((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter)
                          3740867 read(s) filtered by: (((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter)
                              3378325 read(s) filtered by: ((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter)
                                  3378325 read(s) filtered by: (MappingQualityReadFilter AND MappingQualityAvailableReadFilter)
                                      3378325 read(s) filtered by: MappingQualityReadFilter 
                              362542 read(s) filtered by: NotSecondaryAlignmentReadFilter 
                          8275083 read(s) filtered by: NotDuplicateReadFilter
        00:23:29.994 INFO  ProgressMeter -  scaffold1341:579681          10644.0               8122886            763.1
        00:23:29.994 INFO  ProgressMeter - Traversal complete. Processed 8122886 total regions in 10644.0 minutes.
        00:23:30.989 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 40.349389684
        00:23:30.990 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 37802.598207419
        00:23:30.990 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 536843.66 sec
        00:23:30.990 INFO  HaplotypeCaller - Shutting down engine
        [April 23, 2019 12:23:31 AM PDT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 10,644.09 minutes.
        Runtime.totalMemory()=8586264576
        Using GATK jar ....
        Running:
            java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx8g -XX:ParallelGCThreads=1 -jar....
    

    And these are the last lines of the log for one of the unfinished jobs:

        12:32:13.038 WARN  DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        12:32:13.038 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        12:32:13.038 WARN  DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        12:32:13.038 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        12:32:18.011 INFO  ProgressMeter -  scaffold393:1827028          11372.8               6676260            587.0
        12:32:22.146 WARN  DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        12:32:22.147 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        12:32:22.147 WARN  DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        12:32:22.147 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        12:32:28.491 INFO  ProgressMeter -  scaffold393:1838866          11372.9               6676350            587.0
        12:32:39.065 INFO  ProgressMeter -  scaffold393:1852882          11373.1               6676460            587.0
        12:32:49.463 INFO  ProgressMeter -  scaffold393:1863845          11373.3               6676540            587.0
        12:32:59.649 INFO  ProgressMeter -  scaffold393:1874775          11373.5               6676630            587.0
        12:33:10.763 INFO  ProgressMeter -  scaffold393:1887484          11373.6               6676720            587.0
        12:33:21.222 INFO  ProgressMeter -  scaffold393:1902577          11373.8               6676860            587.0
        12:33:22.083 WARN  DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        12:33:22.084 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        12:33:29.623 WARN  DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        12:33:29.623 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        12:33:31.831 INFO  ProgressMeter -  scaffold393:1913406          11374.0               6676950            587.0
        12:33:42.557 INFO  ProgressMeter -  scaffold393:1924729          11374.2               6677050            587.0
        12:33:52.985 WARN  DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        12:33:52.985 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
        12:33:54.384 INFO  ProgressMeter -  scaffold393:1941125          11374.4               6677190            587.0
        12:34:04.817 INFO  ProgressMeter -  scaffold393:1952375          11374.5               6677280            587.0
        12:34:14.976 INFO  ProgressMeter -  scaffold393:1966428          11374.7               6677380            587.0
        12:34:25.469 INFO  ProgressMeter -  scaffold393:1978931          11374.9               6677470            587.0
        12:34:37.255 INFO  ProgressMeter -  scaffold393:1994097          11375.1               6677600            587.0
        12:34:47.466 INFO  ProgressMeter -  scaffold393:2004328          11375.2               6677680            587.0
        12:34:57.605 INFO  ProgressMeter -  scaffold393:2014027          11375.4               6677750            587.0
        12:35:07.911 INFO  ProgressMeter -  scaffold393:2029670          11375.6               6677880            587.0
        12:35:18.031 INFO  ProgressMeter -  scaffold393:2039334          11375.8               6677950            587.0
    

    Many thanks!
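
The closing lines of the finished log already hint at where the time went. As a rough sketch, the totals can be turned into shares of the elapsed time. This assumes the PairHMM figure is in seconds (the log does not state a unit) and that compute time is comparable to wall-clock time, which is only reasonable for a single-threaded run. The three log lines are copied from the finished job above.

```python
import re

LOG_TAIL = """\
00:23:29.994 INFO  ProgressMeter - Traversal complete. Processed 8122886 total regions in 10644.0 minutes.
00:23:30.990 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 37802.598207419
00:23:30.990 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 536843.66 sec
"""


def summarize(log_text):
    """Extract throughput and per-stage time shares from the end of the log."""
    regions, minutes = re.search(
        r"Processed (\d+) total regions in ([\d.]+) minutes", log_text
    ).groups()
    sw_sec = re.search(r"Smith-Waterman : ([\d.]+) sec", log_text).group(1)
    # Unit assumed to be seconds; the log line itself gives none.
    hmm_sec = re.search(r"computeLogLikelihoods\(\) : ([\d.]+)", log_text).group(1)
    total_min = float(minutes)
    return {
        "regions_per_min": int(regions) / total_min,
        "smith_waterman_share": float(sw_sec) / 60 / total_min,
        "pairhmm_share": float(hmm_sec) / 60 / total_min,
    }


if __name__ == "__main__":
    for key, value in summarize(LOG_TAIL).items():
        print(f"{key}: {value:.3f}")
```

Under these assumptions, Smith-Waterman alignment accounts for roughly 84% of the 10,644 minutes and PairHMM for about 6%, at about 763 regions per minute overall.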

  • bhanuGandham (Cambridge MA) Member, Administrator, Broadie, Moderator

    @Amirm

    This issue could have one of several causes:
    1) Are the BAMs stored on network storage?
    2) Is it non-human data with high ploidy?
    3) Do the BAMs contain a lot of messy, low-coverage data?
    4) Lastly, and most importantly, how much free physical memory are you leaving? Apart from the heap memory that you allocate with the -Xmx value, you should leave at least 2-3GB of physical memory for the Java native libraries and the OS.
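
The memory point can be checked with simple arithmetic. The sketch below uses the figures reported in this thread (16GB per job, `-Xmx8g`) and takes the upper end of the 2-3GB reserve as its default. The function names are hypothetical helpers, not part of GATK.

```python
def headroom_gb(job_memory_gb, xmx_gb):
    """Physical memory left for native libraries and the OS after the Java heap."""
    return job_memory_gb - xmx_gb


def max_heap_gb(job_memory_gb, reserve_gb=3):
    """Largest heap that still leaves reserve_gb of physical memory free."""
    return max(job_memory_gb - reserve_gb, 0)


if __name__ == "__main__":
    # Figures from the thread: 16GB per job, heap set with -Xmx8g.
    print("headroom with -Xmx8g:", headroom_gb(16, 8), "GB")
    print("largest heap with a 3GB reserve:", max_heap_gb(16), "GB")
```

With 16GB per job and an 8GB heap, 8GB remain outside the heap, which is above the suggested 2-3GB, provided the scheduler really grants the job the full 16GB.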
