Persistent memory problem with BaseRecalibrator.

I'm using BaseRecalibrator as part of the GATK best practices workflow. My workflow up to this point is trim --> align --> deduplicate --> fix mates.

I'm calling BaseRecalibrator like this (edited for clarity):

java -Xmx16G -jar GenomeAnalysisTK.jar -T BaseRecalibrator \
-R GATK_resource/b37/human_g1k_v37.fasta \
-I ${i} \
-knownSites GATK_resource/b37/dbsnp_138.b37.vcf.gz \
-knownSites GATK_resource/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz \
-knownSites GATK_resource/b37/1000G_phase3_v4_20130502.sites.vcf.gz \
-L targets.interval.list \
-o ${NAME}_recal_data.table

My data is paired-end WES data produced by an Illumina HiSeq. I have 5 samples, each of which gave the same result.

Each time, BaseRecalibrator runs for about 20 minutes before quitting with the following error:

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.8-0-ge9d806836):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: An error occurred because you did not provide enough memory to run this program. You can use the -Xmx argument (before the -jar argument) to adjust the maximum heap size provided to Java. Note that this is a JVM argument, not a GATK argument.
ERROR ------------------------------------------------------------------------------------------

Here's what I've tried.
1. The error seems clear enough, so I doubled the memory (-Xmx32G), only to get the same error. So I doubled it again (64G)... and finally one more time (128G). Our server has 500G of RAM. Same error every time.
2. Checked the input bam files with ValidateSamFile (command sketch just after this list). No errors in the bam files.
3. Updated GATK to version 3.8. Same error for every file.
4. Ran a smaller interval, namely just chromosome 20 (the -L 20 argument). Same error.
5. Tried to recalibrate an older WES bam from our lab, which was aligned to hg19. This worked, so I continued the whole BQSR workflow, with success all the way to a vcf.
6. Ran that same file through my whole workflow, from trimming to BQSR. No problems along the way.
7. Tried to recalibrate an older (smaller) targeted capture PE dataset, aligned to b37. It worked.
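
For reference, the validation check in item 2 looked roughly like this (a sketch; the exact Picard arguments may have differed slightly):

java -Xmx16G -jar /data/Phil/software/picard-tools-2.9.0/picard.jar ValidateSamFile \
    I=${i} MODE=SUMMARY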

Any ideas? This has taken 3 weeks and I'm sort of at a loss as to where to turn next.

Issue · GitHub (opened by Sheila)
Issue Number: 2601 · State: closed · Closed By: vdauwera

Answers

  • I should mention that my bams range in size from 5GB to 26GB, with most of them being 9GB. There is one read group per file.

    Here is a sample bam header:

    @HD VN:1.5  SO:coordinate
    @SQ SN:1    LN:249250621
    @SQ SN:2    LN:243199373
    @SQ SN:3    LN:198022430
    @SQ SN:4    LN:191154276
    @SQ SN:5    LN:180915260
    @SQ SN:6    LN:171115067
    @SQ SN:7    LN:159138663
    @SQ SN:8    LN:146364022
    @SQ SN:9    LN:141213431
    @SQ SN:10   LN:135534747
    @SQ SN:11   LN:135006516
    @SQ SN:12   LN:133851895
    @SQ SN:13   LN:115169878
    @SQ SN:14   LN:107349540
    @SQ SN:15   LN:102531392
    @SQ SN:16   LN:90354753
    @SQ SN:17   LN:81195210
    @SQ SN:18   LN:78077248
    @SQ SN:19   LN:59128983
    @SQ SN:20   LN:63025520
    @SQ SN:21   LN:48129895
    @SQ SN:22   LN:51304566
    @SQ SN:X    LN:155270560
    @SQ SN:Y    LN:59373566
    @SQ SN:MT   LN:16569
    @SQ SN:GL000207.1   LN:4262
    @SQ SN:GL000226.1   LN:15008
    @SQ SN:GL000229.1   LN:19913
    @SQ SN:GL000231.1   LN:27386
    @SQ SN:GL000210.1   LN:27682
    @SQ SN:GL000239.1   LN:33824
    @SQ SN:GL000235.1   LN:34474
    @SQ SN:GL000201.1   LN:36148
    @SQ SN:GL000247.1   LN:36422
    @SQ SN:GL000245.1   LN:36651
    @SQ SN:GL000197.1   LN:37175
    @SQ SN:GL000203.1   LN:37498
    @SQ SN:GL000246.1   LN:38154
    @SQ SN:GL000249.1   LN:38502
    @SQ SN:GL000196.1   LN:38914
    @SQ SN:GL000248.1   LN:39786
    @SQ SN:GL000244.1   LN:39929
    @SQ SN:GL000238.1   LN:39939
    @SQ SN:GL000202.1   LN:40103
    @SQ SN:GL000234.1   LN:40531
    @SQ SN:GL000232.1   LN:40652
    @SQ SN:GL000206.1   LN:41001
    @SQ SN:GL000240.1   LN:41933
    @SQ SN:GL000236.1   LN:41934
    @SQ SN:GL000241.1   LN:42152
    @SQ SN:GL000243.1   LN:43341
    @SQ SN:GL000242.1   LN:43523
    @SQ SN:GL000230.1   LN:43691
    @SQ SN:GL000237.1   LN:45867
    @SQ SN:GL000233.1   LN:45941
    @SQ SN:GL000204.1   LN:81310
    @SQ SN:GL000198.1   LN:90085
    @SQ SN:GL000208.1   LN:92689
    @SQ SN:GL000191.1   LN:106433
    @SQ SN:GL000227.1   LN:128374
    @SQ SN:GL000228.1   LN:129120
    @SQ SN:GL000214.1   LN:137718
    @SQ SN:GL000221.1   LN:155397
    @SQ SN:GL000209.1   LN:159169
    @SQ SN:GL000218.1   LN:161147
    @SQ SN:GL000220.1   LN:161802
    @SQ SN:GL000213.1   LN:164239
    @SQ SN:GL000211.1   LN:166566
    @SQ SN:GL000199.1   LN:169874
    @SQ SN:GL000217.1   LN:172149
    @SQ SN:GL000216.1   LN:172294
    @SQ SN:GL000215.1   LN:172545
    @SQ SN:GL000205.1   LN:174588
    @SQ SN:GL000219.1   LN:179198
    @SQ SN:GL000224.1   LN:179693
    @SQ SN:GL000223.1   LN:180455
    @SQ SN:GL000195.1   LN:182896
    @SQ SN:GL000212.1   LN:186858
    @SQ SN:GL000222.1   LN:186861
    @SQ SN:GL000200.1   LN:187035
    @SQ SN:GL000193.1   LN:189789
    @SQ SN:GL000194.1   LN:191469
    @SQ SN:GL000225.1   LN:211173
    @SQ SN:GL000192.1   LN:547496
    @RG ID:Melanoma LB:Melarray SM:patient_4_blood  PL:ILLUMINA PU:HiSEQ4000
    @PG ID:bwa  PN:bwa  VN:0.7.13-r1126 CL:bwa mem -t 32 -M -R @RG\tID:Melanoma\tLB:Melarray\tSM:patient_4_blood\tPL:ILLUMINA\tPU:HiSEQ4000 /data/Phil/ref_phil/GATK_resource/b37/human_g1k_v37.fasta ./patient_4_blood-trimmed-pair1.fastq.gz ./patient_4_blood-trimmed-pair2.fastq.gz
    @PG ID:MarkDuplicates   VN:2.9.0-1-gf5b9f50-SNAPSHOT    CL:picard.sam.markduplicates.MarkDuplicates INPUT=[patient_4_blood_sorted.bam] OUTPUT=patient_4_blood_sorted_dedup.bam METRICS_FILE=patient_4_blood_metrics.txt    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json    PN:MarkDuplicates
    
  • Finally, here are my commands upstream of BaseRecalibrator.

    Alignment:

    bwa mem -t 32 -M -R "@RG\tID:Melanoma\tLB:Melarray\tSM:${SAMPLE}\tPL:ILLUMINA\tPU:HiSEQ4000" /data/Phil/ref_phil/GATK_resource/b37/human_g1k_v37.fasta ./${i} ./${SAMPLE}-trimmed-pair2.fastq.gz | java -Xmx16G -jar /data/Phil/software/picard-tools-2.9.0/picard.jar SortSam I=/dev/stdin O=${SAMPLE}_sorted.bam  CREATE_INDEX=true VALIDATION_STRINGENCY=SILENT SO=coordinate
    

    De-duplication:

    java -Xmx16G -jar /data/Phil/software/picard-tools-2.9.0/picard.jar MarkDuplicates I=${i} O=${NAME}_sorted_dedup.bam  M=${NAME}_metrics.txt
    

    Fix mate information:

    java -Xmx16G -jar /data/Phil/software/picard-tools-2.9.0/picard.jar FixMateInformation VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true SORT_ORDER=coordinate I=${i} O=${NAME}_sorted_dedup_fixmate.bam
    
  • Sheila (Broad Institute member, Broadie admin)

    @patrickturko
    Hi,

    Can you try on an even smaller interval? I see you tried -L 20, but can you try -L 20:1-1000000? Or, try on a different chromosome? I am wondering if this is an issue of extremely high depth in one region. I will check with the team for other ideas as well.

    -Sheila

  • Hi Sheila, thanks for the answer.

    I tried -L 20:1-1000000, as well as -L MT. Both of these worked fine.

    Your comment about the depth led me to run GATK's DepthOfCoverage over my exome intervals (command sketch at the end of this post), and I found that about 1000 intervals have coverage over 400, with about half of those above 500. The full summary is attached. Glancing (by eye) over the full coverage file, I see some regions with coverage of 700 to 900.

    Remember that this is already de-duplicated, so the high coverage isn't from duplicated reads. The output from DepthOfCoverage shows that it successfully filtered the 5% of reads that picard MarkDuplicates had marked as duplicates.

    Is this too high for BaseRecalibrator? I wonder if the test of -L 20:1-1000000 worked because it just didn't happen to hit any of these very high coverage regions?

    Any advice would be very welcome.
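
    For completeness, the DepthOfCoverage run looked roughly like this (a sketch reconstructed from my other commands; the output prefix and interval list are the same placeholders I used above):

    # with -L intervals this writes ${NAME}_coverage.sample_interval_summary, among other per-sample files
    java -Xmx16G -jar GenomeAnalysisTK.jar -T DepthOfCoverage \
        -R GATK_resource/b37/human_g1k_v37.fasta \
        -I ${i} \
        -L targets.interval.list \
        -o ${NAME}_coverage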

  • Hi again,

    I tried downsampling to lower coverage during the recalibration, using the -dcov argument to BaseRecalibrator. I tried 400x, 20x, and 5x, all of which failed in the same way as the tests above.
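
    For reference, these downsampled runs were just the original command with one extra engine argument, roughly like this (a sketch; only one -knownSites file shown for brevity, and -dcov was set to 400, 20, or 5 per test):

    java -Xmx16G -jar GenomeAnalysisTK.jar -T BaseRecalibrator \
        -R GATK_resource/b37/human_g1k_v37.fasta -I ${i} \
        -knownSites GATK_resource/b37/dbsnp_138.b37.vcf.gz \
        -L targets.interval.list \
        -dcov 400 \
        -o ${NAME}_recal_data.table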

  • More information. I tried removing all intervals over 500x coverage. Like this:

    DepthOfCoverage produces a file called mybam.sample_interval_summary. From this file, I identified intervals with coverage >500 and printed those intervals to a file:

    $ awk '/>500/{print $1}' mybam.sample_interval_summary > big_intervals.list

    Then I ran BaseRecalibrator with the -XL big_intervals.list argument to exclude these intervals (command sketch at the end of this post).

    They were indeed successfully excluded, as seen in the output:

    INFO 11:31:33,023 IntervalUtils - Excluding 248406 loci from original intervals (0.27% reduction)

    Buuuuuuuuuut the program failed in the same way as usual:

    ERROR MESSAGE: An error occurred because you did not provide enough memory to run this program. You can use the -Xmx argument (before the -jar argument) to adjust the maximum heap size provided to Java. Note that this is a JVM argument, not a GATK argument.
    ERROR ------------------------------------------------------------------------------------------

    This time I noticed that the program seemed to get hung up at a specific locus. You can see that in the attached output, where the Progress Meter stops at locus 2:88408060 and stays there for 10 minutes before the run quits. I looked at this locus in IGV and didn't see anything unusual.

    Unfortunately I discarded my output files from previous tests, so I'm not sure if this is a consistent finding.
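
    To be concrete, the exclusion run was essentially the original command plus -XL, along these lines (a sketch; only one -knownSites file shown for brevity):

    java -Xmx16G -jar GenomeAnalysisTK.jar -T BaseRecalibrator \
        -R GATK_resource/b37/human_g1k_v37.fasta -I ${i} \
        -knownSites GATK_resource/b37/dbsnp_138.b37.vcf.gz \
        -L targets.interval.list \
        -XL big_intervals.list \
        -o ${NAME}_recal_data.table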

  • Hi Sheila,

    I tried increasing memory to 64G while excluding those sites, with the same failure. Again the Progress meter sat at one locus for 10 minutes before quitting, and again this locus looked normal on IGV.

    Those excluded intervals were just the intervals with a mean coverage >500, but it occurred to me that there may be other, smaller sites where the coverage was too high. So I found and excluded those sites too, but again I had the same failure.

    Finally I tried GATK 4 beta 5, with success! It generated a recalibration report in about an hour (rough invocation at the end of this post). It worked with both the high-coverage sites included and excluded.

    I've been doing these tests with my smallest BAM, but I'm running the GATK4 recalibration on all my files now. I'll let you know what happens.
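
    The GATK4 run was along these lines. Note this is a sketch: the argument names follow current GATK4 conventions, and the beta 5 wrapper and flag spellings may differ slightly, so check BaseRecalibrator --help on your install:

    gatk --java-options "-Xmx16G" BaseRecalibrator \
        -R GATK_resource/b37/human_g1k_v37.fasta \
        -I ${i} \
        --known-sites GATK_resource/b37/dbsnp_138.b37.vcf.gz \
        --known-sites GATK_resource/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz \
        --known-sites GATK_resource/b37/1000G_phase3_v4_20130502.sites.vcf.gz \
        -L targets.interval.list \
        -O ${NAME}_recal_data.table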

  • Hi again, I have now successfully recalibrated 10 bams using GATK4, none of which worked with GATK 3.8. In my opinion my problem is solved. Thanks again.

  • Sheila (Broad Institute member, Broadie admin)

    @patrickturko
    Hi,

    Great news! Thank you for letting us know :smile:

    -Sheila
