Bug? in HaplotypeCaller of GATK 3.4-46 with variant_index_type and without variant_index_paramet

ma_ko (Japan, Member)
edited August 2015 in Ask the GATK team

Dear GATK team,

I may have found a bug in HaplotypeCaller (HC) of GATK 3.4-46 that uses up CPU resources.
There was no problem when I ran HC with both variant_index_type and variant_index_parameter (log 1), or with neither option (log 2).
However, when I added only the variant_index_type option (log 3), HC used up CPU resources and progressed extremely slowly on both Linux (CentOS 6.5) and Mac OS X 10.10.3.
This was reproduced with different BAM files.
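The three runs differ only in the two index options. The Python sketch below rebuilds the three HaplotypeCaller command lines from the "Program Args" lines in the logs and only prints them; it does not run GATK. The `java -jar GenomeAnalysisTK.jar` prefix and the jar path are assumptions about a typical GATK 3.x installation, not taken from the logs.

```python
import shlex

# Hypothetical launcher and jar path; adjust to your GATK 3.x installation.
LAUNCHER = ["java", "-jar", "GenomeAnalysisTK.jar"]

# Arguments before and after the index options, copied from the logs' "Program Args".
HEAD = [
    "-rf", "BadCigar", "-rf", "FailsVendorQualityCheck", "-rf", "MappingQualityUnavailable",
    "-T", "HaplotypeCaller",
    "-R", "human_g1k_v37_decoy.fasta",
    "-I", "DRR006760_aligned_reads_fixmate_dedup_realign_recal_sorted.bam",
    "--dbsnp", "dbsnp_138.b37.vcf",
    "--emitRefConfidence", "GVCF",
]
TAIL = [
    "--maxReadsInRegionPerSample", "2000",
    "-L", "Broad.human.exome.b37.interval_list",
    "-o", "DRR006760_raw_variants.g.vcf",
]

# The only difference between the three runs: which index options are given.
RUNS = {
    "log1_both_options": ["--variant_index_type", "LINEAR", "--variant_index_parameter", "128000"],
    "log2_no_options": [],
    "log3_type_only": ["--variant_index_type", "LINEAR"],
}


def build_command(index_options):
    """Return the full argument list, keeping the index options at their original position."""
    return LAUNCHER + HEAD + index_options + TAIL


if __name__ == "__main__":
    for name, index_options in RUNS.items():
        print(f"# {name}")
        print(shlex.join(build_command(index_options)))
```

Only the log 3 variant, with `--variant_index_type` but no `--variant_index_parameter`, showed the slowdown.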

This problem probably arises rarely, because this combination of options is unlikely to be used.
Many thanks.

Masakazu

Log 1. With both variant_index_type and variant_index_parameter
INFO 09:57:42,982 HelpFormatter - ---------------------------------------------------------------------------------
INFO 09:57:42,986 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
INFO 09:57:42,986 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 09:57:42,986 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 09:57:42,989 HelpFormatter - Program Args: -rf BadCigar -rf FailsVendorQualityCheck -rf MappingQualityUnavailable -T HaplotypeCaller -R human_g1k_v37_decoy.fasta -I DRR006760_aligned_reads_fixmate_dedup_realign_recal_sorted.bam --dbsnp dbsnp_138.b37.vcf --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 --maxReadsInRegionPerSample 2000 -L Broad.human.exome.b37.interval_list -o DRR006760_raw_variants.g.vcf
INFO 09:57:46,808 HelpFormatter - Executing as [email protected] on Mac OS X 10.10.3 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_79-b15.
INFO 09:57:46,808 HelpFormatter - Date/Time: 2015/08/14 09:57:42
INFO 09:57:46,808 HelpFormatter - ---------------------------------------------------------------------------------
INFO 09:57:46,809 HelpFormatter - ---------------------------------------------------------------------------------
WARN 09:57:46,823 GATKVCFUtils - Naming your output file using the .g.vcf extension will automatically set the appropriate values for --variant_index_type and --variant_index_parameter
INFO 09:57:47,170 GenomeAnalysisEngine - Strictness is SILENT
INFO 09:57:47,241 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO 09:57:47,247 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 09:57:47,274 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.03
INFO 09:57:47,279 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 09:57:48,154 IntervalUtils - Processing 32950014 bp from intervals
INFO 09:57:48,270 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 09:57:48,484 GenomeAnalysisEngine - Done preparing for traversal
INFO 09:57:48,485 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 09:57:48,485 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 09:57:48,485 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
INFO 09:57:48,485 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
INFO 09:57:48,486 HaplotypeCaller - All sites annotated with PLs forced to true for reference-model confidence output
INFO 09:57:48,580 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
WARN 09:57:49,972 PairHMMLikelihoodCalculationEngine$1 - Failed to load native library for VectorLoglessPairHMM - using Java implementation of LOGLESS_CACHING
WARN 09:57:50,135 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper
WARN 09:57:50,136 InbreedingCoeff - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
INFO 09:58:18,497 ProgressMeter - 1:12854415 2.09412346E8 30.0 s 0.0 s 0.8% 59.1 m 58.6 m
INFO 09:58:48,509 ProgressMeter - 1:12855585 2.09691411E8 60.0 s 0.0 s 0.8% 118.1 m 117.1 m

Log 2. Without variant_index_type or variant_index_parameter
INFO 10:23:44,096 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:23:44,097 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
INFO 10:23:44,097 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 10:23:44,097 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 10:23:44,100 HelpFormatter - Program Args: -rf BadCigar -rf FailsVendorQualityCheck -rf MappingQualityUnavailable -T HaplotypeCaller -R human_g1k_v37_decoy.fasta -I DRR006760_aligned_reads_fixmate_dedup_realign_recal_sorted.bam --dbsnp dbsnp_138.b37.vcf --emitRefConfidence GVCF --maxReadsInRegionPerSample 2000 -L Broad.human.exome.b37.interval_list -o DRR006760_raw_variants.g.vcf
INFO 10:23:47,890 HelpFormatter - Executing as [email protected] on Mac OS X 10.10.3 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_79-b15.
INFO 10:23:47,891 HelpFormatter - Date/Time: 2015/08/14 10:23:44
INFO 10:23:47,891 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:23:47,891 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:23:48,222 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:23:48,288 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO 10:23:48,294 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 10:23:48,319 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 10:23:48,324 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 10:23:49,204 IntervalUtils - Processing 32950014 bp from intervals
INFO 10:23:49,262 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 10:23:49,499 GenomeAnalysisEngine - Done preparing for traversal
INFO 10:23:49,500 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 10:23:49,500 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 10:23:49,500 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
INFO 10:23:49,500 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
INFO 10:23:49,501 HaplotypeCaller - All sites annotated with PLs forced to true for reference-model confidence output
INFO 10:23:49,581 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
WARN 10:23:51,030 PairHMMLikelihoodCalculationEngine$1 - Failed to load native library for VectorLoglessPairHMM - using Java implementation of LOGLESS_CACHING
WARN 10:23:51,197 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper
WARN 10:23:51,197 InbreedingCoeff - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
INFO 10:24:19,508 ProgressMeter - 1:12854415 2.09412346E8 30.0 s 0.0 s 0.8% 59.1 m 58.6 m
INFO 10:24:49,520 ProgressMeter - 1:12855585 2.09691411E8 60.0 s 0.0 s 0.8% 118.1 m 117.1 m

Log 3. With only "--variant_index_type LINEAR" (processing slowed down drastically)
INFO 10:00:42,467 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:00:42,469 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
INFO 10:00:42,469 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 10:00:42,469 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 10:00:42,472 HelpFormatter - Program Args: -rf BadCigar -rf FailsVendorQualityCheck -rf MappingQualityUnavailable -T HaplotypeCaller -R human_g1k_v37_decoy.fasta -I DRR006760_aligned_reads_fixmate_dedup_realign_recal_sorted.bam --dbsnp dbsnp_138.b37.vcf --emitRefConfidence GVCF --variant_index_type LINEAR --maxReadsInRegionPerSample 2000 -L Broad.human.exome.b37.interval_list -o DRR006760_raw_variants.g.vcf
INFO 10:00:46,269 HelpFormatter - Executing as [email protected] on Mac OS X 10.10.3 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_79-b15.
INFO 10:00:46,269 HelpFormatter - Date/Time: 2015/08/14 10:00:42
INFO 10:00:46,269 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:00:46,269 HelpFormatter - ---------------------------------------------------------------------------------
WARN 10:00:46,283 GATKVCFUtils - Naming your output file using the .g.vcf extension will automatically set the appropriate values for --variant_index_type and --variant_index_parameter
INFO 10:00:46,580 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:00:46,645 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO 10:00:46,650 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 10:00:46,673 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 10:00:46,678 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 10:00:47,450 IntervalUtils - Processing 32950014 bp from intervals
INFO 10:00:47,508 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 10:00:47,743 GenomeAnalysisEngine - Done preparing for traversal
INFO 10:00:47,744 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 10:00:47,744 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 10:00:47,744 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
INFO 10:00:47,746 HaplotypeCaller - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
INFO 10:00:47,746 HaplotypeCaller - All sites annotated with PLs forced to true for reference-model confidence output
INFO 10:00:47,851 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO 10:01:35,497 ProgressMeter - 1:69489 138.0 47.0 s 96.1 h 0.0% 4.8 w 4.8 w
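To put the slowdown in numbers, the Python sketch below parses the first ProgressMeter data entry of log 1 and the only one of log 3, then compares their projected total runtimes. It assumes the column order given by the ProgressMeter header (location, processed active regions, elapsed time, time per 1M active regions, fraction completed, total runtime, remaining runtime), with each time printed as a value followed by a unit suffix. The mapping of the s/m/h/d/w suffixes to seconds is an assumption about GATK 3.x output, not taken from its documentation.

```python
# Seconds per unit suffix; assumed to match the GATK 3.x ProgressMeter output.
UNIT_SECONDS = {"s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0, "w": 604800.0}


def to_seconds(value, unit):
    """Convert a ProgressMeter time such as ('59.1', 'm') to seconds."""
    return float(value) * UNIT_SECONDS[unit]


def parse_progress(line):
    """Split one ProgressMeter data entry into named fields (times in seconds)."""
    fields = line.split("ProgressMeter - ", 1)[1].split()
    if len(fields) != 11:
        raise ValueError(f"unexpected ProgressMeter entry: {line!r}")
    return {
        "location": fields[0],
        "processed_active_regions": float(fields[1]),
        "elapsed_s": to_seconds(fields[2], fields[3]),
        "per_1m_active_regions_s": to_seconds(fields[4], fields[5]),
        "completed": fields[6],
        "total_runtime_s": to_seconds(fields[7], fields[8]),
        "remaining_runtime_s": to_seconds(fields[9], fields[10]),
    }


# First data entry of log 1 and the only data entry of log 3, copied verbatim.
LOG1 = "INFO 09:58:18,497 ProgressMeter - 1:12854415 2.09412346E8 30.0 s 0.0 s 0.8% 59.1 m 58.6 m"
LOG3 = "INFO 10:01:35,497 ProgressMeter - 1:69489 138.0 47.0 s 96.1 h 0.0% 4.8 w 4.8 w"

if __name__ == "__main__":
    run1, run3 = parse_progress(LOG1), parse_progress(LOG3)
    ratio = run3["total_runtime_s"] / run1["total_runtime_s"]
    print(f"log 1 projected runtime: {run1['total_runtime_s'] / 3600:.1f} h")
    print(f"log 3 projected runtime: {run3['total_runtime_s'] / 3600:.1f} h")
    print(f"log 3 / log 1: {ratio:.0f}x")
```

With these two entries, log 3 projects a total runtime (4.8 w) about 819 times longer than log 1 (59.1 m). Estimates taken this early in a run are rough, but the gap is large enough to show the slowdown is not noise.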


GitHub issue 1136 (by Sheila) · State: closed · Closed by: vdauwera
