
GenotypeGVCFs won't start running, instead stalls for hours without exiting or producing an error message

Young_Wha, Monsanto Company, St Louis (Member)

hi there,

I am trying to run GenotypeGVCFs (version 3.4) on a sample of just five maize individuals. The program stalls here and stays there for hours without exiting or starting:

INFO 03:10:45,593 HelpFormatter - ---------------------------------------------------------------------------------
INFO 03:10:45,596 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
INFO 03:10:45,596 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 03:10:45,596 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 03:10:45,600 HelpFormatter - Program Args: -T GenotypeGVCFs -R /home/ywlee1/scratch_yw/reference_genomes/GATK_Zea.fasta -L Zm_B73_CR10 -V 01DKD2_CR10.g.vcf -V 80IDM2_CR10.g.vcf -V B73_CR10.g.vcf -V COJO528_CR10.g.vcf -V FBLL_CR10.g.vcf --disable_auto_index_creation_and_locking_when_reading_rods -o test_new_ref.vcf
INFO 03:10:45,605 HelpFormatter - Executing as [email protected] on Linux 2.6.32-504.16.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_75-b13.
INFO 03:10:45,605 HelpFormatter - Date/Time: 2016/01/24 03:10:45
INFO 03:10:45,605 HelpFormatter - ---------------------------------------------------------------------------------
INFO 03:10:45,606 HelpFormatter - ---------------------------------------------------------------------------------
INFO 03:10:47,653 GenomeAnalysisEngine - Strictness is SILENT
INFO 03:10:48,454 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000

I'm not sure what's happening. I tried the same command line on a sample of four soy lines and it ran in 20 minutes. The command line is below:

java -Xmx10g -jar /home/ywlee1/bin/apps/GATK3/GenomeAnalysisTK.jar -T GenotypeGVCFs -R /home/ywlee1/scratch_yw/reference_genomes/X.fasta -V E.g.vcf -V D.g.vcf -V C.g.vcf -V B.g.vcf -V A.g.vcf -allSites -o Zm.CR10.GATK3.vcf

Because there aren't any error messages I'm not sure where to start troubleshooting. Any advice would be greatly appreciated!

thanks,
Young Wha


Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @Young_Wha
    Hi Young Wha,

    So, there is no progress meter after the last line you posted (Target Coverage: 1000)? Can you please try with the latest version and see if that shows a progress meter?

    Thanks,
    Sheila

    P.S. I am assuming you created the input GVCFs with HaplotypeCaller?

  • Young_Wha, Monsanto Company, St Louis (Member)

    hi Sheila,

    1. yes, I used GVCFs.

    2. with 3.4 the progress meter appeared after 6 hours, like so:

    INFO 23:48:55,983 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 23:48:55,985 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
    INFO 23:48:55,985 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 23:48:55,985 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 23:48:55,989 HelpFormatter - Program Args: -T GenotypeGVCFs -L Zm_B73_CR10 -R /home/ywlee1/scratch_yw/reference_genomes/Zea_mays_B73_Monsanto_2.fasta -V 01DKD2_CR10.g.vcf -V 80IDM2_CR10.g.vcf -V B73_CR10.g.vcf -V COJO528_CR10.g.vcf -V FBLL_CR10.g.vcf -allSites -o Zm.CR10.GATK3.withL.vcf
    INFO 23:48:55,992 HelpFormatter - Executing as [email protected] on Linux 2.6.32-504.16.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_75-b13.
    INFO 23:48:55,992 HelpFormatter - Date/Time: 2016/01/23 23:48:55
    INFO 23:48:55,992 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 23:48:55,992 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 23:48:56,467 GenomeAnalysisEngine - Strictness is SILENT
    INFO 23:48:59,377 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 05:07:30,193 IntervalUtils - Processing 149858468 bp from intervals
    INFO 05:07:32,895 GenomeAnalysisEngine - Preparing for traversal
    INFO 05:07:32,897 GenomeAnalysisEngine - Done preparing for traversal
    INFO 05:07:32,898 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 05:07:32,898 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 05:07:32,899 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
    INFO 05:08:02,905 ProgressMeter - Starting 0.0 30.0 s 49.6 w 100.0% 30.0 s 0.0 s
    INFO 05:08:32,907 ProgressMeter - Starting 0.0 60.0 s 99.2 w 100.0% 60.0 s 0.0 s
    INFO 05:09:02,908 ProgressMeter - Starting 0.0 90.0 s 148.8 w 100.0% 90.0 s 0.0 s
    INFO 05:09:32,910 ProgressMeter - Starting 0.0 120.0 s 198.4 w 100.0% 120.0 s 0.0 s

    and then the run finished in about two hours.

    The funny thing is that with a soy run, there was no such drag.

    any ideas why this might be happening?

    thanks,
    YW

  • Young_Wha, Monsanto Company, St Louis (Member)

    hi Sheila,

    I think I might have figured it out! Our reference genome has many small contigs in addition to the main chromosomes, more than 200,000 of them. I deleted those small contigs from the header of each GVCF (I was looking to genotype only one chromosome). With the trimmed headers, GenotypeGVCFs began processing sites after only about 40 minutes of preparation.

    Does this sound like a plausible explanation for the previous behavior where GATK stalled for hours before starting to genotype?

    thanks,
    Young Wha
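
The header edit described in this post can be sketched in Python. This is only an illustration, not the poster's actual procedure: it assumes an uncompressed, plain-text GVCF whose header lists contigs in the standard `##contig=<ID=...>` form, and the function names `trim_contig_header` and `trim_file` are hypothetical. It leaves data records untouched, so it is only appropriate when the file contains no records on the removed contigs. Any existing index for the file would presumably need to be regenerated afterwards.

```python
import re

# Matches a VCF header line such as "##contig=<ID=chr1,length=1000>"
# and captures the contig ID.
CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def trim_contig_header(lines, keep):
    """Yield VCF lines, skipping ##contig header lines whose ID is not in keep."""
    keep = set(keep)
    for line in lines:
        match = CONTIG_ID.match(line)
        if match and match.group(1) not in keep:
            continue  # drop the header entry for an unwanted contig
        yield line


def trim_file(src_path, dst_path, keep):
    """Write a copy of src_path with the contig header trimmed (plain-text VCF only)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(trim_contig_header(src, keep))
```

For the files in this thread, a call might look like `trim_file("B73_CR10.g.vcf", "B73_CR10.trimmed.g.vcf", ["Zm_B73_CR10"])`, where the output file name is made up for the example. Writing to a new file rather than editing in place keeps the original GVCF available if the trimmed copy turns out to be unusable.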

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)

    That makes a whole lot of sense, actually. The way GATK loads contigs is not very efficient, so if you have more than a few hundred, some preliminary operations really slow to a crawl. In your case it would absolutely struggle to deal with so many contigs.
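
To check whether a file is likely to hit this slowdown, one option is to count the contig entries in its header before running GATK. The sketch below is a rough diagnostic under stated assumptions: the input is an uncompressed, plain-text VCF or GVCF, contigs are declared with standard `##contig=` header lines, and the function name `count_header_contigs` is hypothetical.

```python
def count_header_contigs(lines):
    """Count ##contig header lines, stopping at the first data record."""
    count = 0
    for line in lines:
        if not line.startswith("#"):
            break  # header is over; no need to read the records
        if line.startswith("##contig="):
            count += 1
    return count


def count_file_contigs(path):
    """Count contig header entries in a plain-text VCF/GVCF file."""
    with open(path) as handle:
        return count_header_contigs(handle)
```

A count well beyond a few hundred, such as the 200,000+ contigs reported in this thread, would suggest that the long startup delay comes from contig loading and not from the genotyping itself.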
