Having trouble running GenotypeGVCFs with around 50 samples

Hi. I may be making a simple mistake, but I cannot figure out what is going on. I am trying to run GenotypeGVCFs on around 50 human whole-genome samples that were produced with HaplotypeCaller. The interesting thing is that earlier, while I was waiting for calling to finish on all my samples, I ran it on some 20 samples with no problem. I may simply not be allocating enough memory, but it seems that GATK does not even begin to process the files. I am running on an LSF server in a Docker container, so I would normally think the problem is on my end, but it did run with some 20 samples. Do I have to run CombineGVCFs first because I am using whole-genome sequences, or do you have a memory suggestion?

The only error I seem to get is this:

    Traceback (most recent call last):
      File "/opt/lsf9/9.1/linux2.6-glibc2.3-x86_64/bin/docker_run.py", line 598, in <module>
        main(sys.argv[1:])
      File "/opt/lsf9/9.1/linux2.6-glibc2.3-x86_64/bin/docker_run.py", line 577, in main
        ret = run_docker(argv)
      File "/opt/lsf9/9.1/linux2.6-glibc2.3-x86_64/bin/docker_run.py", line 526, in run_docker
        docker_image = os.environ['LSB_DOCKER_IMAGE']
      File "/usr/lib/python2.7/UserDict.py", line 23, in __getitem__
        raise KeyError(key)
    KeyError: 'LSB_DOCKER_IMAGE'

Yet it does run with a small sample size.
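
The traceback points at the LSF wrapper script rather than at GATK itself: `docker_run.py` indexes `os.environ['LSB_DOCKER_IMAGE']`, and that style of lookup raises `KeyError` whenever the variable is absent from the job's environment. Below is a minimal sketch of that failure mode, together with a pre-flight check that could sit at the top of a job script. The helper name `require_env` is hypothetical (it is not part of LSF or GATK), and the sketch does not establish why the variable goes missing for the larger job.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable, or raise a readable error.

    Hypothetical helper for illustration; not part of LSF or GATK.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if value is None:
        raise RuntimeError(
            "%s is not set in the job environment; check how the job was submitted" % name
        )
    return value


# Same lookup style as in the traceback: indexing a mapping raises KeyError
# when the key is missing.
empty_env = {}
try:
    empty_env["LSB_DOCKER_IMAGE"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# When the variable is present, the check simply returns its value.
image = require_env(
    "LSB_DOCKER_IMAGE", {"LSB_DOCKER_IMAGE": "broadinstitute/gatk3:3.7-0"}
)
```

Running such a check before the GATK command would at least separate "the job environment is incomplete" from "GATK ran out of memory".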

Answers

  • shlee (Cambridge) Member, Broadie, Moderator

    Hi @tphillip,

    Are you using one of our official Docker images? You can find GATK Dockers at

    The functionality offered by GATK3 CombineGVCFs is in GATK4 as GenomicsDBImport. If your errors persist using one of our Docker images, then please let us know.
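
For readers moving to GATK4, the consolidation step mentioned above might look roughly like the following. This is a sketch under assumptions: the `gatk` launcher name and the option spellings (`-V`, `--genomicsdb-workspace-path`, `-L`, `-O`, and the `gendb://` prefix) follow the GATK4 tool documentation and may differ between releases, and the workspace name, interval, and file names are hypothetical. The code only builds the argument lists; it does not run anything.

```python
def build_gatk4_commands(gvcf_paths, reference, workspace, interval, output_vcf):
    """Return (import_cmd, genotype_cmd) as argument lists; nothing is executed."""
    # Step 1: consolidate the per-sample gVCFs into a GenomicsDB workspace.
    import_cmd = ["gatk", "GenomicsDBImport"]
    for path in gvcf_paths:
        import_cmd += ["-V", path]
    import_cmd += ["--genomicsdb-workspace-path", workspace, "-L", interval]

    # Step 2: joint-genotype directly from the workspace.
    genotype_cmd = [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", "gendb://" + workspace,
        "-O", output_vcf,
    ]
    return import_cmd, genotype_cmd


import_cmd, genotype_cmd = build_gatk4_commands(
    ["GATK_aml107.g.vcf", "GATK_aml106.g.vcf"],
    "all_sequences.fa",
    "aml_workspace",  # hypothetical workspace directory
    "1",              # hypothetical interval (chromosome 1, GRCh37 naming)
    "aml_joint.vcf",  # hypothetical output name
)
print(" ".join(import_cmd))
print(" ".join(genotype_cmd))
```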

  • tphillip Member
    edited July 2017

    I was using my own container, as I thought it was good practice to get familiar with Docker. I just ran it using the official container tagged 3.7-0 and got what appears to be the same error:

    Traceback (most recent call last):
      File "/opt/lsf9/9.1/linux2.6-glibc2.3-x86_64/bin/docker_run.py", line 598, in <module>
        main(sys.argv[1:])
      File "/opt/lsf9/9.1/linux2.6-glibc2.3-x86_64/bin/docker_run.py", line 577, in main
        ret = run_docker(argv)
      File "/opt/lsf9/9.1/linux2.6-glibc2.3-x86_64/bin/docker_run.py", line 526, in run_docker
        docker_image = os.environ['LSB_DOCKER_IMAGE']
      File "/usr/lib/python2.7/UserDict.py", line 23, in __getitem__
        raise KeyError(key)
    KeyError: 'LSB_DOCKER_IMAGE'
    

    The command was:

    /usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java -Xmx60G -jar /usr/GenomeAnalysisTK.jar \
            -T GenotypeGVCFs \
            -R /gscuser/tphillip/RefFiles/GRCh37_gi/all_sequences.fa \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml107.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml106.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml104.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml102.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml101.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml100.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml98.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml97.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml95.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml91.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml90.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml83.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml81.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml80.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml7_.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml79.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml6_.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml68.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml67.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml66.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml65.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml64.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml63.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml61.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml5_.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml57.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml55.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml51.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml50.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml4_.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml49.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml48.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml46.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml44.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml43.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml42.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml41.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml40.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml3_.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml39.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml38.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml37.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml36.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml35.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml34.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml32.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml30.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml99.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml96.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml94.g.vcf" \
            -o "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCFGenotyper/GATK_107-30.vcf" 
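
One way to keep a command like the one above short is to write the gVCF paths to a text file and pass that file once. The sketch below assumes that GATK 3 expands a `--variant` argument ending in `.list` into the paths it contains, one per line (worth confirming against the documentation for the version in use). The function names and the `gvcfs.list` file name are hypothetical. This is not a confirmed fix for the `KeyError`, only a way to shrink the command line that the LSF wrapper has to handle.

```python
import os
import tempfile


def write_variant_list(gvcf_paths, list_path):
    """Write one gVCF path per line to list_path and return list_path."""
    with open(list_path, "w") as handle:
        for path in gvcf_paths:
            handle.write(path + "\n")
    return list_path


def build_genotype_command(reference, list_path, output_vcf, heap="60G"):
    """Build a GATK 3 GenotypeGVCFs argument list with a single --variant."""
    return [
        "java", "-Xmx" + heap, "-jar", "/usr/GenomeAnalysisTK.jar",
        "-T", "GenotypeGVCFs",
        "-R", reference,
        "--variant", list_path,  # assumed: a .list file is expanded to its paths
        "-o", output_vcf,
    ]


# Example using three of the sample IDs from the command above.
gvcf_dir = "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF"
paths = ["%s/GATK_aml%s.g.vcf" % (gvcf_dir, n) for n in ("107", "106", "104")]
list_file = write_variant_list(paths, os.path.join(tempfile.mkdtemp(), "gvcfs.list"))
command = build_genotype_command(
    "/gscuser/tphillip/RefFiles/GRCh37_gi/all_sequences.fa",
    list_file,
    "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCFGenotyper/GATK_107-30.vcf",
)
print(" ".join(command))
```

The resulting command has the same length whether the list holds 11 samples or 50, which makes it easier to rule the command line in or out as a factor.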
    
  • tphillip Member
    edited July 2017

    The command for my successful smaller run was the following (ignore the ${FILEX} variables; for the small sample set I passed the file names in as variables, which made copying and pasting easier):

    /usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java -Xmx20G -jar /usr/GenomeAnalysisTK.jar \
            -T GenotypeGVCFs \
            -R /gscuser/tphillip/RefFiles/GRCh37_gi/all_sequences.fa \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml${FILE1}.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml${FILE2}.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml${FILE3}.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml${FILE4}.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml${FILE5}.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml${FILE6}.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml${FILE7}.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml${FILE8}.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml${FILE9}.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml${FILE10}.g.vcf" \
            --variant "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml${FILE11}.g.vcf" \
            -o "/gscuser/tphillip/spencerlab/TestFiles/GATK_GVCFGenotyper/GATK_${AML_NUM}.vcf"
    

    You can see the successful output here:

    3.7-0: Pulling from broadinstitute/gatk3
    Digest: sha256:45cadfb0cebce00f8b6c242ac0a6c64ba868b5e3aea911529465c624f2797e79
    Status: Image is up to date for broadinstitute/gatk3:3.7-0
    
    << output from stderr >>
    INFO  03:47:16,221 HelpFormatter - --------------------------------------------------------------------------------
    INFO  03:47:16,224 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
    INFO  03:47:16,224 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
    INFO  03:47:16,224 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
    INFO  03:47:16,224 HelpFormatter - [Tue Jul 11 03:47:16 UTC 2017] Executing on Linux 3.19.0-73-generic amd64
    INFO  03:47:16,225 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_102-8u102-b14.1-1~bpo8+1-b14
    INFO  03:47:16,229 HelpFormatter - Program Args: -T GenotypeGVCFs -R /gscuser/tphillip/RefFiles/GRCh37_gi/all_sequences.fa --variant /gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml100.g.vcf --variant /gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml101.g.vcf --variant /gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml102.g.vcf --variant /gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml104.g.vcf --variant /gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml106.g.vcf --variant /gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml107.g.vcf --variant /gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml98.g.vcf --variant /gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml97.g.vcf --variant /gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml95.g.vcf --variant /gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml91.g.vcf --variant /gscuser/tphillip/spencerlab/TestFiles/GATK_GVCF/GATK_aml90.g.vcf -o /gscuser/tphillip/spencerlab/TestFiles/GATK_GVCFGenotyper/GATK_aml100-84_testdtest.vcf
    INFO  03:47:16,236 HelpFormatter - Executing as [email protected] on Linux 3.19.0-73-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_102-8u102-b14.1-1~bpo8+1-b14.
    INFO  03:47:16,236 HelpFormatter - Date/Time: 2017/07/11 03:47:16
    INFO  03:47:16,236 HelpFormatter - --------------------------------------------------------------------------------
    INFO  03:47:16,237 HelpFormatter - --------------------------------------------------------------------------------
    INFO  03:47:17,166 GenomeAnalysisEngine - Strictness is SILENT
    INFO  03:47:17,362 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO  03:47:17,930 GenomeAnalysisEngine - Preparing for traversal
    INFO  03:47:17,940 GenomeAnalysisEngine - Done preparing for traversal
    INFO  03:47:17,941 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO  03:47:17,942 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
    INFO  03:47:17,942 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime
    WARN  03:47:18,118 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
    WARN  03:47:18,120 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
    INFO  03:47:18,121 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files
    WARN  03:47:18,602 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs
    WARN  03:47:25,765 ExactAFCalculator - This tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at 1: 726944 has 9 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument. Unless the DEBUG logging level is used, this warning message is output just once per run and further warnings are suppressed.
    INFO  03:47:47,946 ProgressMeter -       1:1999901   1000000.0    30.0 s      30.0 s        0.1%    12.9 h      12.9 h
    

    And it continues until success...

  • Thanks for going above and beyond. I definitely see how that could be it. Have a safe flight.
