SAMException "Query asks for data past end of contig" occurred in Mutect2

Dear team

I have downloaded the GATK4 (version 4.0.7.0) Docker images and the "gatk4-data-processing-master" WDL, together with the "gatk4-somatic-snvs-indels-master" WDL, for somatic mutation detection. I run the whole workflow with the following commands against the human reference genome (version hg38):

(1) The commands "java -jar cromwell-34.jar run processing-for-variant-discovery-gatk4.wdl --inputs normal.json"
and "java -jar cromwell-34.jar run processing-for-variant-discovery-gatk4.wdl --inputs tissue.json" are run to generate BAM and BAI files for the normal and tumor tissue samples.
This step finishes successfully and produces the BAM and BAI files.

(2) The command "java -jar cromwell-34.jar run mutect2.wdl --inputs mutect2.json" is run for somatic mutation detection, with the normal and tissue BAM and BAI files as input.
Unfortunately, the error "htsjdk.samtools.SAMException: Query asks for data past end of contig" occurs on many chromosome contigs (for example, Query contig chrX start:224664443 stop:224664487 contigLength:156040895).

Could someone help me fix these errors? Thanks a lot.

Here are the JSON files used for these runs.

(a) the normal json file
{
"##_COMMENT1": "SAMPLE NAME AND UNMAPPED BAMS",
"PreProcessingForVariantDiscovery_GATK4.sample_name": "mytestN",
"PreProcessingForVariantDiscovery_GATK4.ref_name": "hg38",
"PreProcessingForVariantDiscovery_GATK4.flowcell_unmapped_bams_list": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/test7/normal_ubam_list.txt",
"PreProcessingForVariantDiscovery_GATK4.unmapped_bam_suffix": ".bam",

"##_COMMENT2": "REFERENCE FILES",
"PreProcessingForVariantDiscovery_GATK4.ref_dict": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.dict",
"PreProcessingForVariantDiscovery_GATK4.ref_fasta": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta",
"PreProcessingForVariantDiscovery_GATK4.ref_fasta_index": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.fai",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_alt": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.64.alt",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_sa": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.64.sa",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_amb": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.64.amb",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_bwt": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.64.bwt",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_ann": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.64.ann",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_pac": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.64.pac",

"##_COMMENT3": "KNOWN SITES RESOURCES",
"PreProcessingForVariantDiscovery_GATK4.dbSNP_vcf": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.dbsnp138.sort.vcf",
"PreProcessingForVariantDiscovery_GATK4.dbSNP_vcf_index": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.dbsnp138.sort.vcf.idx",
"PreProcessingForVariantDiscovery_GATK4.known_indels_sites_VCFs": [
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.known_indels.vcf.gz"
],
"PreProcessingForVariantDiscovery_GATK4.known_indels_sites_indices": [
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi",
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"
],

"##_COMMENT4": "MISC PARAMETERS",
"PreProcessingForVariantDiscovery_GATK4.bwa_commandline": "bwa mem -K 100000000 -p -v 3 -t 16 -Y $bash_ref_fasta",
"PreProcessingForVariantDiscovery_GATK4.compression_level": 5,
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.num_cpu": "16",

"##_COMMENT5": "DOCKERS",
"PreProcessingForVariantDiscovery_GATK4.gotc_docker": "broadinstitute/genomes-in-the-cloud:2.3.1-1512499786",
"PreProcessingForVariantDiscovery_GATK4.gatk_docker": "broadinstitute/gatk:4.0.7.0",
"PreProcessingForVariantDiscovery_GATK4.python_docker": "python:2.7",

"##_COMMENT6": "PATHS",
"PreProcessingForVariantDiscovery_GATK4.gotc_path": "/usr/gitc/",
"PreProcessingForVariantDiscovery_GATK4.gatk_path": "/gatk/gatk",

"##_COMMENT7": "JAVA OPTIONS",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.MergeBamAlignment.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.MarkDuplicates.java_opt": "-Xms4000m",
"PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.java_opt_sort": "-Xms4000m",
"PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.java_opt_fix": "-Xms500m",
"PreProcessingForVariantDiscovery_GATK4.BaseRecalibrator.java_opt": "-Xms4000m",
"PreProcessingForVariantDiscovery_GATK4.GatherBqsrReports.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.ApplyBQSR.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.GatherBamFiles.java_opt": "-Xms2000m",

"##_COMMENT8": "MEMORY ALLOCATION",
"PreProcessingForVariantDiscovery_GATK4.GetBwaVersion.mem_size": "1 GB",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.mem_size": "14 GB",
"PreProcessingForVariantDiscovery_GATK4.MergeBamAlignment.mem_size": "3500 MB",
"PreProcessingForVariantDiscovery_GATK4.MarkDuplicates.mem_size": "7 GB",
"PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.mem_size": "5000 MB",
"PreProcessingForVariantDiscovery_GATK4.CreateSequenceGroupingTSV.mem_size": "2 GB",
"PreProcessingForVariantDiscovery_GATK4.BaseRecalibrator.mem_size": "6 GB",
"PreProcessingForVariantDiscovery_GATK4.GatherBqsrReports.mem_size": "3500 MB",
"PreProcessingForVariantDiscovery_GATK4.ApplyBQSR.mem_size": "3500 MB",
"PreProcessingForVariantDiscovery_GATK4.GatherBamFiles.mem_size": "3 GB",

"##_COMMENT9": "DISK SIZE ALLOCATION",
"PreProcessingForVariantDiscovery_GATK4.agg_small_disk": 200,
"PreProcessingForVariantDiscovery_GATK4.agg_medium_disk": 300,
"PreProcessingForVariantDiscovery_GATK4.agg_large_disk": 400,
"PreProcessingForVariantDiscovery_GATK4.flowcell_small_disk": 100,
"PreProcessingForVariantDiscovery_GATK4.flowcell_medium_disk": 200,

"##_COMMENT10": "PREEMPTIBLES",
"PreProcessingForVariantDiscovery_GATK4.preemptible_tries": 3,
"PreProcessingForVariantDiscovery_GATK4.agg_preemptible_tries": 3
}

(b) the tissue json file
{
"##_COMMENT1": "SAMPLE NAME AND UNMAPPED BAMS",
"PreProcessingForVariantDiscovery_GATK4.sample_name": "mytestT",
"PreProcessingForVariantDiscovery_GATK4.ref_name": "hg38",
"PreProcessingForVariantDiscovery_GATK4.flowcell_unmapped_bams_list": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/test7/tissue_ubam_list.txt",
"PreProcessingForVariantDiscovery_GATK4.unmapped_bam_suffix": ".bam",

"##_COMMENT2": "REFERENCE FILES",
"PreProcessingForVariantDiscovery_GATK4.ref_dict": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.dict",
"PreProcessingForVariantDiscovery_GATK4.ref_fasta": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta",
"PreProcessingForVariantDiscovery_GATK4.ref_fasta_index": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.fai",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_alt": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.64.alt",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_sa": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.64.sa",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_amb": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.64.amb",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_bwt": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.64.bwt",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_ann": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.64.ann",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.ref_pac": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.64.pac",

"##_COMMENT3": "KNOWN SITES RESOURCES",
"PreProcessingForVariantDiscovery_GATK4.dbSNP_vcf": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.dbsnp138.sort.vcf",
"PreProcessingForVariantDiscovery_GATK4.dbSNP_vcf_index": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.dbsnp138.sort.vcf.idx",
"PreProcessingForVariantDiscovery_GATK4.known_indels_sites_VCFs": [
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.known_indels.vcf.gz"
],
"PreProcessingForVariantDiscovery_GATK4.known_indels_sites_indices": [
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi",
"/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"
],

"##_COMMENT4": "MISC PARAMETERS",
"PreProcessingForVariantDiscovery_GATK4.bwa_commandline": "bwa mem -K 100000000 -p -v 3 -t 16 -Y $bash_ref_fasta",
"PreProcessingForVariantDiscovery_GATK4.compression_level": 5,
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.num_cpu": "16",

"##_COMMENT5": "DOCKERS",
"PreProcessingForVariantDiscovery_GATK4.gotc_docker": "broadinstitute/genomes-in-the-cloud:2.3.1-1512499786",
"PreProcessingForVariantDiscovery_GATK4.gatk_docker": "broadinstitute/gatk:4.0.7.0",
"PreProcessingForVariantDiscovery_GATK4.python_docker": "python:2.7",

"##_COMMENT6": "PATHS",
"PreProcessingForVariantDiscovery_GATK4.gotc_path": "/usr/gitc/",
"PreProcessingForVariantDiscovery_GATK4.gatk_path": "/gatk/gatk",

"##_COMMENT7": "JAVA OPTIONS",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.MergeBamAlignment.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.MarkDuplicates.java_opt": "-Xms4000m",
"PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.java_opt_sort": "-Xms4000m",
"PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.java_opt_fix": "-Xms500m",
"PreProcessingForVariantDiscovery_GATK4.BaseRecalibrator.java_opt": "-Xms4000m",
"PreProcessingForVariantDiscovery_GATK4.GatherBqsrReports.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.ApplyBQSR.java_opt": "-Xms3000m",
"PreProcessingForVariantDiscovery_GATK4.GatherBamFiles.java_opt": "-Xms2000m",

"##_COMMENT8": "MEMORY ALLOCATION",
"PreProcessingForVariantDiscovery_GATK4.GetBwaVersion.mem_size": "1 GB",
"PreProcessingForVariantDiscovery_GATK4.SamToFastqAndBwaMem.mem_size": "14 GB",
"PreProcessingForVariantDiscovery_GATK4.MergeBamAlignment.mem_size": "3500 MB",
"PreProcessingForVariantDiscovery_GATK4.MarkDuplicates.mem_size": "7 GB",
"PreProcessingForVariantDiscovery_GATK4.SortAndFixTags.mem_size": "5000 MB",
"PreProcessingForVariantDiscovery_GATK4.CreateSequenceGroupingTSV.mem_size": "2 GB",
"PreProcessingForVariantDiscovery_GATK4.BaseRecalibrator.mem_size": "6 GB",
"PreProcessingForVariantDiscovery_GATK4.GatherBqsrReports.mem_size": "3500 MB",
"PreProcessingForVariantDiscovery_GATK4.ApplyBQSR.mem_size": "3500 MB",
"PreProcessingForVariantDiscovery_GATK4.GatherBamFiles.mem_size": "3 GB",

"##_COMMENT9": "DISK SIZE ALLOCATION",
"PreProcessingForVariantDiscovery_GATK4.agg_small_disk": 200,
"PreProcessingForVariantDiscovery_GATK4.agg_medium_disk": 300,
"PreProcessingForVariantDiscovery_GATK4.agg_large_disk": 400,
"PreProcessingForVariantDiscovery_GATK4.flowcell_small_disk": 100,
"PreProcessingForVariantDiscovery_GATK4.flowcell_medium_disk": 200,

"##_COMMENT10": "PREEMPTIBLES",
"PreProcessingForVariantDiscovery_GATK4.preemptible_tries": 3,
"PreProcessingForVariantDiscovery_GATK4.agg_preemptible_tries": 3
}

(c) the mutect2 json file
{
"##_COMMENT1": "Runtime",
"##Mutect2.oncotator_docker": "(optional) String?",
"Mutect2.gatk_docker": "broadinstitute/gatk:4.0.7.0",

"##_COMMENT2": "Workflow options",
"##_Mutect2.intervals": "gs://gatk-best-practices/somatic-b37/whole_exome_agilent_1.1_refseq_plus_3_boosters.Homo_sapiens_assembly19.baits.interval_list",
"Mutect2.scatter_count": 50,
"Mutect2.artifact_modes": ["G/T", "C/T"],
"##_Mutect2.m2_extra_args": "(optional) String?",
"##_Mutect2.m2_extra_filtering_args": "(optional) String?",
"Mutect2.run_orientation_bias_filter": "False",
"Mutect2.run_oncotator": "False",

"##_COMMENT3": "Primary inputs",
"Mutect2.ref_fasta": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta",
"Mutect2.ref_dict": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.dict",
"Mutect2.ref_fai": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/Homo_sapiens_assembly38.fasta.fai",
"Mutect2.normal_bam": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/test7/mytestN.hg38.bam",
"Mutect2.normal_bai": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/test7/mytestN.hg38.bai",
"Mutect2.tumor_bam": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/test7/mytestT.hg38.bam",
"Mutect2.tumor_bai": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/test7/mytestT.hg38.bai",

"##_COMMENT4": "Primary resources",
"##_Mutect2.pon": "(optional) File?",
"##_Mutect2.pon_index": "(optional) File?",
"Mutect2.gnomad": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/somatic/af-only-gnomad.hg38.vcf.gz",
"Mutect2.gnomad_index": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/somatic/af-only-gnomad.hg38.vcf.gz.tbi",
"Mutect2.variants_for_contamination": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/somatic/small_exac_common_3.hg38.vcf.gz",
"Mutect2.variants_for_contamination_index": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/somatic/small_exac_common_3.hg38.vcf.gz.tbi",
"##Mutect2.realignment_index_bundle": "File? (optional)",

"##_COMMENT5": "Secondary resources",
"Mutect2.onco_ds_tar_gz": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/somatic/oncotator_v1_ds_April052016.tar.gz",
"Mutect2.default_config_file": "/bdp-picb/bioinfo/gyzheng/GATK_pipeline/GATK_resource/reference_data/hg38/somatic/onco_config.txt",
"##_Mutect2.sequencing_center": "(optional) String?",
"##_Mutect2.sequence_source": "(optional) String?",

"##_COMMENT6": "Secondary resources",
"##_Mutect2.MergeBamOuts.mem": "(optional) Int?",
"##_Mutect2.SplitIntervals.mem": "(optional) Int?",
"##_Mutect2.M2.mem": "(optional) Int?",
"##_Mutect2.MergeVCFs.mem": "(optional) Int?",
"##_Mutect2.oncotate_m2.mem": "(optional) Int?",

"##_COMMENT7": "Secondary resources",
"##_Mutect2.onco_ds_local_db_dir": "(optional) String?",
"##_Mutect2.sequencing_center": "(optional) String?",
"##_Mutect2.oncotate_m2.oncotator_exe": "(optional) String?",
"##_Mutect2.gatk4_override": "(optional) File?",
"##_Mutect2.CollectSequencingArtifactMetrics.mem": "(optional) Int?",

"##_COMMENT8": "Disk space",
"##_Mutect2.MergeVCFs.disk_space_gb": "(optional) Int?",
"##_Mutect2.Filter.disk_space_gb": "(optional) Int?",
"##_Mutect2.M2.disk_space_gb": "(optional) Int?",
"##_Mutect2.M2.disk_space_gb": 100,
"##_Mutect2.oncotate_m2.disk_space_gb": "(optional) Int?",
"##_Mutect2.SplitIntervals.disk_space_gb": "(optional) Int?",
"##_Mutect2.MergeBamOuts.disk_space_gb": "(optional) Int?",
"##_Mutect2.CollectSequencingArtifactMetrics.disk_space_gb": "(optional) Int?",
"##_Mutect2.emergency_extra_disk": "(optional) Int?",

"##_COMMENT9": "Preemptibles",
"##_Mutect2.MergeBamOuts.preemptible_attempts": "(optional) Int?",
"Mutect2.preemptible_attempts": 3
}
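
A side note on the Mutect2 inputs above: the key "##_Mutect2.M2.disk_space_gb" appears twice. Both are comment-style keys, so this is probably harmless here, but Python's json module, for one, silently keeps only the last value of a repeated key. A minimal sketch for spotting repeated keys before a run follows; the function name `duplicate_keys` is hypothetical and not part of Cromwell or GATK.

```python
import json
from collections import Counter


def duplicate_keys(json_text):
    """Return keys that appear more than once within any JSON object."""
    dupes = []

    def hook(pairs):
        # pairs is the full list of (key, value) tuples for one object,
        # including repeats that a plain dict would silently overwrite.
        counts = Counter(key for key, _ in pairs)
        dupes.extend(key for key, n in counts.items() if n > 1)
        return dict(pairs)

    json.loads(json_text, object_pairs_hook=hook)
    return dupes


# Small excerpt modelled on the inputs file above.
sample = """{
"##_Mutect2.M2.disk_space_gb": "(optional) Int?",
"##_Mutect2.M2.disk_space_gb": 100,
"Mutect2.preemptible_attempts": 3
}"""
print(duplicate_keys(sample))
```

To check a real inputs file, pass its contents instead of `sample`, for example `duplicate_keys(open("mutect2.json").read())`.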

Answers

  • gyzheng Member

    Here is the log file of the whole workflow. Any suggestions are appreciated, thanks.

    log.txt 846.2K
  • kyoungwoo (Seoul) Member

    In my case, the error message is as follows:
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig 14 start:107373781 stop:107373828 contigLength:107349540

    When I changed the input FASTQ files, the same error occurred
    on other chromosomes, such as 5 and 17.

    I need help with this.

  • kyoungwoo (Seoul) Member

    oh, and I'm using gatk-4.0.9

  • gyzheng Member

    In my case, the error occurs on many chromosome contigs. The error messages are as follows:
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr8 start:170934295 stop:170934412 contigLength:145138636
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr7 start:238659150 stop:238659185 contigLength:159345973
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chrX start:181792155 stop:181792237 contigLength:156040895
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr2 start:261583080 stop:261583145 contigLength:242193529
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr1 start:251546140 stop:251546210 contigLength:248956422
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr6 start:185040350 stop:185040399 contigLength:170805979
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr3 start:342254538 stop:342254567 contigLength:198295559
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr12 start:169398154 stop:169398215 contigLength:133275309
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr22 start:62387619 stop:62387676 contigLength:50818468
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr7 start:203394589 stop:203394620 contigLength:159345973
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr9 start:145842401 stop:145842474 contigLength:138394717
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr4 start:207440304 stop:207440347 contigLength:190214555
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr9 start:144259547 stop:144259576 contigLength:138394717
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr10 start:135021503 stop:135021560 contigLength:133797422
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr19 start:128409653 stop:128409700 contigLength:58617616
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr5 start:261027383 stop:261027432 contigLength:181538259
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr16 start:158173608 stop:158173663 contigLength:90338345
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr8 start:283528826 stop:283528869 contigLength:145138636
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr6 start:180199269 stop:180199348 contigLength:170805979
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr17 start:84558964 stop:84559054 contigLength:83257441
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr18 start:91128297 stop:91128342 contigLength:80373285
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr3 start:198397545 stop:198397652 contigLength:198295559
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr11 start:209108225 stop:209108313 contigLength:135086622
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr17 start:99375088 stop:99375205 contigLength:83257441
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr11 start:220744338 stop:220744386 contigLength:135086622
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr3 start:250879934 stop:250879988 contigLength:198295559
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr5 start:226049587 stop:226049712 contigLength:181538259
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr15 start:145857986 stop:145858045 contigLength:101991189
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr1 start:367202858 stop:367202890 contigLength:248956422
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr13 start:134485185 stop:134485237 contigLength:114364328
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr20 start:98134871 stop:98134946 contigLength:64444167
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr4 start:309829209 stop:309829253 contigLength:190214555
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr2 start:245325358 stop:245325403 contigLength:242193529
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr22 start:56926369 stop:56926414 contigLength:50818468
    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chr12 start:199420185 stop:199420236 contigLength:133275309

  • shlee (Cambridge) Member, Broadie ✭✭✭✭✭
    edited October 2018

    Hi @gyzheng and @kyoungwoo,

    For those who may follow up, I've pinpointed the origin of the error to the following code:
    https://github.com/broadinstitute/gatk/blob/4.0.10.1/src/main/java/org/broadinstitute/hellbender/utils/fasta/CachingIndexedFastaSequenceFile.java#L315-L316.

    The error appears to relate to the dictionary and a mismatch in information against the data as defined by

    SAMSequenceRecord contigInfo = sequenceFile.getSequenceDictionary().getSequence(contig);

    from https://github.com/broadinstitute/gatk/blob/4.0.10.1/src/main/java/org/broadinstitute/hellbender/utils/fasta/CachingIndexedFastaSequenceFile.java#L310.
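
    To make the check concrete, here is a minimal Python sketch of the kind of bounds test that produces this message. It is an illustration only, not the GATK code: the function name `check_query` and the `contig_lengths` dict are hypothetical stand-ins for the sequence dictionary lookup.

```python
# Illustration only: mimics a "stop past end of contig" bounds check.
# contig_lengths stands in for the sequence dictionary (hypothetical name).
contig_lengths = {"chrX": 156040895}


def check_query(contig, start, stop, lengths=contig_lengths):
    """Raise ValueError if the query interval ends past the contig end."""
    contig_length = lengths[contig]
    if stop > contig_length:
        raise ValueError(
            "Query asks for data past end of contig. "
            f"Query contig {contig} start:{start} stop:{stop} "
            f"contigLength:{contig_length}"
        )
    return True
```

    A query that ends at or before the contig length passes; anything beyond it raises with the same wording as the exceptions reported in this thread.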

    Taking one of @gyzheng's exceptions:

    htsjdk.samtools.SAMException: Query asks for data past end of contig. Query contig chrX start:224664443 stop:224664487 contigLength:156040895

    and matching it against our expectations for chrX length:

    @SQ SN:chrX LN:156040895    M5:2b3a55ff7f58eb308420c8a9b11cac50 AS:38   UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta SP:Homo sapiens
    

    we see that the contig length reported in the exception matches the dictionary, but something is amiss with the start and stop coordinates. The coordinates 224,664,443 and 224,664,487 go well past the end of chrX at 156,040,895.
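
    With many such lines in a log, the same comparison can be scripted. The sketch below assumes only the message format quoted in this thread; `PATTERN` and `overshoot` are hypothetical names.

```python
import re

# Matches the coordinate part of the exception text quoted in this thread.
PATTERN = re.compile(
    r"Query contig (?P<contig>\S+) start:(?P<start>\d+) "
    r"stop:(?P<stop>\d+) contigLength:(?P<length>\d+)"
)


def overshoot(message):
    """Return (contig, bases past the contig end), or None if no match."""
    m = PATTERN.search(message)
    if m is None:
        return None
    return m["contig"], int(m["stop"]) - int(m["length"])


line = (
    "htsjdk.samtools.SAMException: Query asks for data past end of contig. "
    "Query contig chrX start:224664443 stop:224664487 contigLength:156040895"
)
print(overshoot(line))  # ('chrX', 68623592)
```

    For this exception the query stops 68,623,592 bases past the end of chrX, so it is not an off-by-one at the contig boundary.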

    We need to figure out where these nonexistent coordinates are coming from. For starters, can you double-check your scattered interval lists to see that they are reasonable and give existing coordinates? For example, @gyzheng, please see what is in -L /cromwell-executions/Mutect2/ec34099b-7f45-4a33-b205-0cd6011f53f0/call-M2/shard-46/inputs/1547804351/0046-scattered.intervals and whether the list contains only existing intervals.
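
    One way to run that check is sketched below. It assumes the scattered file is a tab-separated, Picard-style interval list (contig, start, end in the first three columns, with header lines starting with "@") and that contig lengths come from tab-separated @SQ lines as in the .dict file; both function names are hypothetical. It only rules the interval lists in or out as the source of the bad coordinates.

```python
def read_contig_lengths(lines):
    """Collect SN -> LN from tab-separated @SQ header lines."""
    lengths = {}
    for line in lines:
        if line.startswith("@SQ"):
            tags = line.rstrip("\n").split("\t")[1:]
            fields = dict(tag.split(":", 1) for tag in tags)
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def out_of_range(interval_lines, lengths):
    """Yield (contig, start, end) for intervals outside 1..contig length."""
    for line in interval_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        contig, start, end = line.split("\t")[:3]
        start, end = int(start), int(end)
        if contig not in lengths or start < 1 or end > lengths[contig]:
            yield contig, start, end


# Tiny in-memory example; real use would read the .dict and interval files.
header = ["@HD\tVN:1.5\n", "@SQ\tSN:chrX\tLN:156040895\n"]
intervals = ["chrX\t100\t200\t+\t.\n", "chrX\t224664443\t224664487\t+\t.\n"]
print(list(out_of_range(intervals, read_contig_lengths(header))))
```

    An empty result means every interval lies within its contig, which would point away from the interval lists.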

    P.S. If you can please, can you also confirm you get the same error with the latest release 4.0.10.1? If there will be fixes to the code, they will be based off of the latest release.

  • shlee (Cambridge) Member, Broadie ✭✭✭✭✭

    Hi @gyzheng and @kyoungwoo. I checked with the developers and this is a known bug that was fixed in 4.0.10.0+. So updating to the latest release should fix this issue. Apologies for the inconvenience.
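
    When checking whether a given GATK version or Docker tag is at or after 4.0.10.0, compare the version parts numerically; as plain strings, "4.0.7.0" sorts after "4.0.10.0". A minimal sketch, with hypothetical helper names:

```python
def parse_version(text):
    """Turn '4.0.10.1' into (4, 0, 10, 1) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def has_fix(version, fixed_in="4.0.10.0"):
    """True if version is at or after the release named as containing the fix."""
    return parse_version(version) >= parse_version(fixed_in)


# The Docker tag used in the inputs JSON earlier in this thread.
tag = "broadinstitute/gatk:4.0.7.0"
print(has_fix(tag.rsplit(":", 1)[1]))  # False
print(has_fix("4.0.10.1"))             # True
```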

  • kyoungwoo (Seoul) Member

    @shlee I have confirmed that the problem is solved in gatk-4.0.10.1. Thanks!

  • gyzheng Member

    Hi shlee, thanks for your kind help. We will update our GATK4 version to 4.0.10.

  • Chunjie Member

    @shlee Latest GATK4 solved the error. Thanks.
