
Annotated intervals do not match provided intervals | CreateReadCountPanelOfNormals

Hello all,

I'm using the WDL workflows for somatic CNV. I started with cnv_somatic_panel_workflow.wdl, which basically includes four tasks:

CNVTasks.PreprocessIntervals (Done)
CNVTasks.AnnotateIntervals (Done)
CNVTasks.CollectCounts (Done)
CreateReadCountPanelOfNormals (Error)

The first three work well, but the last one fails with this error:

```
java.lang.IllegalArgumentException: Annotated intervals do not match provided intervals.
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)
at org.broadinstitute.hellbender.tools.copynumber.arguments.CopyNumberArgumentValidationUtils.validateAnnotatedIntervals(CopyNumberArgumentValidationUtils.java:135)
at org.broadinstitute.hellbender.tools.copynumber.CreateReadCountPanelOfNormals.runPipeline(CreateReadCountPanelOfNormals.java:276)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
19/10/08 18:23:25 INFO ShutdownHookManager: Shutdown hook called
19/10/08 18:23:25 INFO ShutdownHookManager: Deleting directory /cromwell-executions/CNVSomaticPanelWorkflow/7481fd6a-e289-4bd6-b195-77b4d45d752e/call-CreateReadCountPanelOfNormals/tmp.16de278a/spark-8a26fda4-9b30-49b0-b795-846caa9a1e35
Using GATK jar /cromwell-executions/CNVSomaticPanelWorkflow/7481fd6a-e289-4bd6-b195-77b4d45d752e/call-CreateReadCountPanelOfNormals/inputs/-1551391607/gatk-package-4.1.2.0-local.jar defined in environment variable GATK_LOCAL_JAR
```

I took the initial interval file from https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0, as suggested in https://gatkforums.broadinstitute.org/gatk/discussion/10215/intervals-and-interval-lists

Any help on how to solve this, please?

Answers

  • sarawasl Member
    @asmirnov Yes, you're totally right. I realize that my data are whole exome sequencing (WES), while the interval file I used is for whole genome sequencing (WGS).

    But as far as I can tell, there is no interval file for exome data there. Is there one?
    If not, what can I use instead?
  • asmirnov BroadMember, Broadie, Dev ✭✭

    @sarawasl Usually you would need to obtain a list of targets that is specific to your capture kit design and make an interval list out of it. You could try looking in the BAM header for that information.

    If you don't have that, I can offer you this generic exome list (attached). However, it may miss some regions that your capture kit covers. If you go this route, I recommend using our FilterIntervals tool to get rid of regions with zero coverage.

  • sarawasl Member
    @asmirnov Unfortunately, the BAM files I work with come from the TCGA repository. I contacted them to ask whether they could provide the interval files, but they said: "we cannot share interval files from commercial kits on the active portal".

    So I will go with the one you provided here and use FilterIntervals as you suggested.

    I will get back to you after I try again.

    Many thanks for your help.
  • sarawasl Member
    @asmirnov Hi again,

    When I tried FilterIntervals, I couldn't run it, since it requires --annotated-intervals annotated_intervals.tsv, which I don't have (where can I get it?).

    So I decided to proceed directly and run the pipeline again with the interval file you provided, but it gives me this error (in the PreprocessIntervals task):

    Interval file could not be parsed in any supported format.
    interval_list has an invalid interval : 1:68993-70105 + .

    Any ideas or suggestions, please?
  • slee Member, Broadie, Dev ✭✭✭
    edited October 12

    @sarawasl It might benefit you to review the notes from the somatic CNV panel workflow WDL (https://github.com/broadinstitute/gatk/blob/master/scripts/cnv_wdl/somatic/cnv_somatic_panel_workflow.wdl):

    • The intervals argument is required for both WGS and WES workflows and accepts formats compatible with the GATK -L argument (see https://gatkforums.broadinstitute.org/gatk/discussion/11009/intervals-and-interval-lists). These intervals will be padded on both sides by the amount specified by padding (default 250) and split into bins of length specified by bin_length (default 1000; specify 0 to skip binning, e.g., for WES). For WGS, the intervals should simply cover the autosomal chromosomes (sex chromosomes may be included, but care should be taken to 1) avoid creating panels of mixed sex, and 2) denoise case samples only with panels containing only individuals of the same sex as the case samples).

    • Intervals can be blacklisted from coverage collection and all downstream steps by using the blacklist_intervals argument, which accepts formats compatible with the GATK -XL argument (see https://gatkforums.broadinstitute.org/gatk/discussion/11009/intervals-and-interval-lists). This may be useful for excluding centromeric regions, etc. from analysis. Alternatively, these regions may be manually filtered from the final callset.

    The documentation for PreprocessIntervals (https://software.broadinstitute.org/gatk/documentation/tooldocs/4.1.4.0/org_broadinstitute_hellbender_tools_copynumber_PreprocessIntervals.php) may also be helpful, as well as the tutorial at https://software.broadinstitute.org/gatk/documentation/article?id=11682.

    The purpose of the PreprocessIntervals tool is to create the bins for coverage collection. Typically, for WES, this is done by providing the target intervals via -L, enabling padding, and disabling binning by specifying --bin-length 0. The result is bins of unequal length given by the padded targets. (This is in contrast to what is done for WGS, where we typically specify the autosomes via -L, disable padding, and specify the desired bin length, resulting in bins of equal length that cover the autosomes.)
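    The two binning strategies can be illustrated with a minimal Python sketch. It assumes 1-based, inclusive coordinates, clips padding at contig ends and merges padded targets that overlap; these details are simplifying assumptions, and the names pad_and_merge and bin_intervals are hypothetical, not part of GATK.

```python
from typing import Dict, List, Tuple

# (contig, start, end) with 1-based, inclusive coordinates (assumed convention).
Interval = Tuple[str, int, int]


def pad_and_merge(targets: List[Interval], padding: int,
                  contig_lengths: Dict[str, int]) -> List[Interval]:
    """Pad each target on both sides, clip to contig bounds, merge overlaps."""
    # Sorting by (contig name, start) groups intervals per contig; this is not
    # the sequence-dictionary order that GATK itself uses.
    padded = sorted(
        (c, max(1, s - padding), min(contig_lengths[c], e + padding))
        for c, s, e in targets
    )
    merged: List[Interval] = []
    for c, s, e in padded:
        if merged and merged[-1][0] == c and s <= merged[-1][2]:
            # Overlaps the previous padded interval on the same contig: extend it.
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (c, prev_start, max(prev_end, e))
        else:
            merged.append((c, s, e))
    return merged


def bin_intervals(intervals: List[Interval], bin_length: int) -> List[Interval]:
    """Split intervals into consecutive bins; bin_length == 0 disables binning."""
    if bin_length == 0:
        return list(intervals)
    bins: List[Interval] = []
    for c, s, e in intervals:
        for start in range(s, e + 1, bin_length):
            bins.append((c, start, min(start + bin_length - 1, e)))
    return bins


if __name__ == "__main__":
    lengths = {"chr1": 10_000}
    # WES-style: padded targets, binning disabled -> bins of unequal length.
    print(bin_intervals(pad_and_merge([("chr1", 1000, 1100), ("chr1", 1500, 1600)], 250, lengths), 0))
    # [('chr1', 750, 1850)]
    # WGS-style: no padding, fixed-length bins covering the region.
    print(bin_intervals([("chr1", 1, 2500)], 1000))
    # [('chr1', 1, 1000), ('chr1', 1001, 2000), ('chr1', 2001, 2500)]
```

    With 250 bp padding and binning disabled, two nearby exome targets collapse into one bin of unequal length; with binning enabled and no padding, a WGS-style region is cut into fixed 1000 bp bins, the last one shorter.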

    In either case, the resulting bins are then provided via -L to both the CollectReadCounts and AnnotateIntervals tools (since we are collecting counts and annotating GC content in these bins). Specifying different sets of intervals to these tools may ultimately result in the error message you saw. Might this have been the case?
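    As a rough illustration of the kind of comparison behind the "Annotated intervals do not match provided intervals" error, here is a hypothetical sketch that tests whether two interval tables list the same intervals in the same order. It assumes both tables are TSV text with '@'-prefixed SAM-style header lines followed by a column header containing CONTIG, START and END. In the WDL run in this thread the counts are HDF5, so this only applies to counts written as TSV. It is not GATK's implementation.

```python
from typing import List, Optional, Tuple


def read_interval_columns(tsv_text: str) -> List[Tuple[str, int, int]]:
    """Return (CONTIG, START, END) rows from GATK-style TSV text.

    Assumes '@'-prefixed SAM-style header lines, then one column-header line
    that contains CONTIG, START and END, then tab-separated records.
    """
    rows: List[Tuple[str, int, int]] = []
    header: Optional[List[str]] = None
    for line in tsv_text.splitlines():
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        if header is None:
            header = fields
            continue
        record = dict(zip(header, fields))
        rows.append((record["CONTIG"], int(record["START"]), int(record["END"])))
    return rows


def first_interval_mismatch(counts_tsv: str, annotated_tsv: str) -> Optional[str]:
    """None if both tables list identical intervals in the same order, else the first difference."""
    a = read_interval_columns(counts_tsv)
    b = read_interval_columns(annotated_tsv)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return f"row {i}: {x} != {y}"
    if len(a) != len(b):
        return f"different number of intervals: {len(a)} vs {len(b)}"
    return None
```

    Because the comparison is position by position, intervals that differ only in order or in a single bin boundary are enough to make the tables disagree.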

    If you do not have access to the target intervals, then the exon list that @asmirnov provided will probably suffice. Since CreateReadCountPanelOfNormals performs its own filtering steps, you may not need to use FilterIntervals (which is used in the germline CNV workflow) as he suggested.
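    For a sense of what that built-in filtering does, the sketch below applies two interval filters whose default values appear in the CreateReadCountPanelOfNormals command line later in this thread (--minimum-interval-median-percentile 10.0 and --maximum-zeros-in-interval-percentage 5.0). It is a simplified assumption: it works on raw counts, whereas the tool's actual preprocessing (which also filters samples, imputes zeros and truncates outliers) may differ in details and order. The function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Percentile with linear interpolation between order statistics (pct in 0..100)."""
    s = sorted(values)
    k = (len(s) - 1) * pct / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def filter_intervals(counts: List[List[float]],
                     min_median_percentile: float = 10.0,
                     max_zeros_pct: float = 5.0) -> List[int]:
    """Return indices of intervals that survive both filters.

    counts[s][i] is the read count of sample s in interval i.
    """
    n_samples = len(counts)
    columns = [[row[i] for row in counts] for i in range(len(counts[0]))]
    # Filter 1: drop intervals whose median across samples is below the given
    # percentile of all interval medians.
    medians = [median(col) for col in columns]
    cutoff = percentile(medians, min_median_percentile)
    kept = [i for i, m in enumerate(medians) if m >= cutoff]
    # Filter 2: drop intervals with too large a percentage of zero counts.
    return [i for i in kept
            if 100.0 * sum(v == 0 for v in columns[i]) / n_samples <= max_zeros_pct]
```

    With two samples, a single zero already puts an interval at 50% zeros, so it is dropped under the 5% default.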

    I'm still a bit confused as to how you encountered this error if you are just running the WDL. Confusion about WES vs. WGS aside, I'm not sure this should happen if there was a reference mismatch between the intervals and the BAMs. Can you provide some more details, such as the JSON you used to run the WDL?

  • sarawasl Member
    Thanks for this amazingly detailed response, @slee!

    Please disregard the mismatch error; as I clarified in my second reply, it was my mistake to use the WGS interval file for my WES samples.

    So when I got the new, correct interval file from @asmirnov and tried the CreateReadCountPanelOfNormals WDL again, I got the error I mentioned in my last reply:

    interval file could not be parsed in any supported format.
    interval_list has an invalid interval : 1:68993-70105 + .

    And this is the .json file I used (I ran it on only one BAM file to make the errors easier to track; if it works, I will apply it to all samples):

    {
    "CNVSomaticPanelWorkflow.normal_bams": ["/cancer/f40ba854-81bd-4f64-bd3d-72e113d6c04a/0991191801648da74a240847e8803df5_gdc_realn.bam"],
    "CNVSomaticPanelWorkflow.normal_bais": ["/cancer/f40ba854-81bd-4f64-bd3d-72e113d6c04a/0991191801648da74a240847e8803df5_gdc_realn.bai"],
    "CNVSomaticPanelWorkflow.ref_fasta_dict": "/projects/c2014/genomics/reference/hg38.dict",
    "CNVSomaticPanelWorkflow.gatk4_jar_override": "/sw/csi/gatk/4.1.2.0/el7.5_binary/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar",
    "CNVSomaticPanelWorkflow.pon_entity_id": "weg-pon",
    "CNVSomaticPanelWorkflow.ref_fasta_fai": "/projects/c2014/genomics/reference/hg38.fa.fai",
    "CNVSomaticPanelWorkflow.gatk_docker": "broadinstitute/gatk:latest",
    "CNVSomaticPanelWorkflow.ref_fasta": "/projects/c2014/genomics/reference/hg38.fa",
    "CNVSomaticPanelWorkflow.intervals": "/cancer/germline_resources_GATK_CNV_Germline_Validation_intervals.interval_list"
    }
  • slee Member, Broadie, Dev ✭✭✭
    edited October 12

    @sarawasl from that JSON it looks like you are using the hg38 reference, while the intervals that @asmirnov provided you are for hg19.

    Note also that you should set bin-length and padding as discussed.

    > Please forget about mismatch error, in the second response I clarify that it was my mistake of using the WGS interval file for my WES samples.

    I understand that you used the wrong interval file. My concern is that the specific error that you got doesn't seem like it should've resulted from that action; I would expect that the workflow would have completed successfully, but that the results would not have been useful (as they would have used the WGS calling regions to create bins). So I just want to make sure there isn't some unexpected bug in the workflow (and if not, perhaps we can fail earlier with a more informative message).
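    For WES, setting bin-length and padding in the somatic panel WDL amounts to adding two keys to the inputs JSON. A minimal Python sketch, assuming the workflow inputs are named padding and bin_length as in the WDL notes quoted earlier in this thread (check the WDL itself for the exact names); the helper add_wes_binning and the example path are hypothetical.

```python
import json
from typing import Any, Dict


def add_wes_binning(inputs: Dict[str, Any], padding: int = 250,
                    workflow: str = "CNVSomaticPanelWorkflow") -> Dict[str, Any]:
    """Return a copy of a Cromwell inputs dict with WES-style binning settings.

    Assumes the workflow exposes optional inputs named `padding` and
    `bin_length`; bin_length = 0 disables binning so the padded targets
    themselves become the bins.
    """
    updated = dict(inputs)
    updated[f"{workflow}.padding"] = padding
    updated[f"{workflow}.bin_length"] = 0
    return updated


if __name__ == "__main__":
    # Hypothetical path to an hg38 WES target list with chrN contig names.
    inputs = {"CNVSomaticPanelWorkflow.intervals": "/path/to/hg38_targets.interval_list"}
    print(json.dumps(add_wes_binning(inputs), indent=2, sort_keys=True))
```

    The helper returns a copy rather than editing the dictionary in place, so the original WGS-style inputs can be kept for comparison.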

  • sarawasl Member
    @slee Oh okay, I didn't know that was hg19. Yes, mine is hg38.

    If possible, could you please provide the hg38 interval file for WES?

    Once I have it, I will set bin-length and padding.

    Yes, as I said before: using the previous (WGS) interval file with my WES BAM file in this WDL workflow gives me the mismatch error in the CreateReadCountPanelOfNormals task ("Annotated intervals do not match provided intervals.").
  • slee Member, Broadie, Dev ✭✭✭

    @sarawasl Actually, apologies; I think I now see how you might have encountered that error with a reference mismatch. It's possible that one of the tools threw a warning rather than failing; see https://github.com/broadinstitute/gatk/pull/4758 for context. Perhaps you can check your logs and let us know if this is the case?

    Are your BAMs hg19 or hg38? Also, what is the naming convention of your contigs (e.g., chr1 vs. 1)? See https://software.broadinstitute.org/gatk/documentation/article?id=11010. You should ensure that all BAMs, reference files, and interval lists have the same dictionary. In any case, I would expect that you can use the hg19 exon list that @asmirnov provided and simply lift it over or modify the naming convention as needed.
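    A quick way to spot a naming mismatch is to compare the contig names used by the intervals with those in the reference dictionary. The helpers below are a hypothetical sketch. Note that renaming contigs (1 to chr1) does not change coordinates, so hg19 intervals would still need a liftover to hg38.

```python
from typing import Iterable, Set


def naming_style(contigs: Iterable[str]) -> str:
    """Classify contig names as 'chr' (chr1, chrX, ...), 'bare' (1, X, ...) or 'mixed'."""
    names = list(contigs)
    if not names:
        return "empty"
    with_chr = sum(1 for n in names if n.startswith("chr"))
    if with_chr == len(names):
        return "chr"
    if with_chr == 0:
        return "bare"
    return "mixed"


def missing_contigs(interval_contigs: Iterable[str],
                    dictionary_contigs: Iterable[str]) -> Set[str]:
    """Contigs referenced by the intervals but absent from the dictionary."""
    return set(interval_contigs) - set(dictionary_contigs)


if __name__ == "__main__":
    bam_and_reference = ["chr1", "chr2", "chrX"]
    exome_list = ["1", "2", "X"]
    print(naming_style(bam_and_reference), naming_style(exome_list))  # chr bare
    print(sorted(missing_contigs(exome_list, bam_and_reference)))  # ['1', '2', 'X']
```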

  • sarawasl Member

    @slee I checked the log files of all four tasks. All of them ran without any warnings or errors except the last one (CreateReadCountPanelOfNormals), which gives the mismatch error:

    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/tmp.aa0aa9b9
    02:46:38.855 WARN SparkContextFactory - Environment variables HELLBENDER_TEST_PROJECT and HELLBENDER_JSON_SERVICE_ACCOUNT_KEY must be set or the GCS hadoop connector will not be configured properly
    02:46:38.931 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/inputs/-1551391607/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Oct 12, 2019 2:46:40 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    02:46:40.654 INFO CreateReadCountPanelOfNormals - ------------------------------------------------------------
    02:46:40.655 INFO CreateReadCountPanelOfNormals - The Genome Analysis Toolkit (GATK) v4.1.2.0
    02:46:40.655 INFO CreateReadCountPanelOfNormals - For support and documentation go to https://software.broadinstitute.org/gatk/
    02:46:40.655 INFO CreateReadCountPanelOfNormals - Executing as [email protected] on Linux v3.10.0-957.12.1.el7.x86_64 amd64
    02:46:40.656 INFO CreateReadCountPanelOfNormals - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_212-8u212-b03-0ubuntu1.16.04.1-b03
    02:46:40.656 INFO CreateReadCountPanelOfNormals - Start Date/Time: October 12, 2019 2:46:38 AM UTC
    02:46:40.656 INFO CreateReadCountPanelOfNormals - ------------------------------------------------------------
    02:46:40.656 INFO CreateReadCountPanelOfNormals - ------------------------------------------------------------
    02:46:40.657 INFO CreateReadCountPanelOfNormals - HTSJDK Version: 2.19.0
    02:46:40.658 INFO CreateReadCountPanelOfNormals - Picard Version: 2.19.0
    02:46:40.658 INFO CreateReadCountPanelOfNormals - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    02:46:40.658 INFO CreateReadCountPanelOfNormals - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    02:46:40.658 INFO CreateReadCountPanelOfNormals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    02:46:40.658 INFO CreateReadCountPanelOfNormals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    02:46:40.658 INFO CreateReadCountPanelOfNormals - Deflater: IntelDeflater
    02:46:40.658 INFO CreateReadCountPanelOfNormals - Inflater: IntelInflater
    02:46:40.658 INFO CreateReadCountPanelOfNormals - GCS max retries/reopens: 20
    02:46:40.659 INFO CreateReadCountPanelOfNormals - Requester pays: disabled
    02:46:40.659 INFO CreateReadCountPanelOfNormals - Initializing engine
    02:46:40.659 INFO CreateReadCountPanelOfNormals - Done initializing engine
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    19/10/12 02:46:40 INFO SparkContext: Running Spark version 2.2.0
    19/10/12 02:46:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    19/10/12 02:46:41 INFO SparkContext: Submitted application: CreateReadCountPanelOfNormals
    19/10/12 02:46:41 INFO SecurityManager: Changing view acls to: althubsw
    19/10/12 02:46:41 INFO SecurityManager: Changing modify acls to: althubsw
    19/10/12 02:46:41 INFO SecurityManager: Changing view acls groups to:
    19/10/12 02:46:41 INFO SecurityManager: Changing modify acls groups to:
    19/10/12 02:46:41 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(althubsw); groups with view permissions: Set(); users with modify permissions: Set(althubsw); groups with modify permissions: Set()
    19/10/12 02:46:41 INFO Utils: Successfully started service 'sparkDriver' on port 36605.
    19/10/12 02:46:41 INFO SparkEnv: Registering MapOutputTracker
    19/10/12 02:46:41 INFO SparkEnv: Registering BlockManagerMaster
    19/10/12 02:46:41 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
    19/10/12 02:46:41 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
    19/10/12 02:46:41 INFO DiskBlockManager: Created local directory at /cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/tmp.aa0aa9b9/blockmgr-9e29af49-d400-43cb-a267-274961663f6d
    19/10/12 02:46:41 INFO MemoryStore: MemoryStore started with capacity 3.2 GB
    19/10/12 02:46:41 INFO SparkEnv: Registering OutputCommitCoordinator
    19/10/12 02:46:41 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    19/10/12 02:46:41 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.109.201.8:4040
    19/10/12 02:46:41 INFO Executor: Starting executor ID driver on host localhost
    19/10/12 02:46:41 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38943.
    19/10/12 02:46:41 INFO NettyBlockTransferService: Server created on 10.109.201.8:38943
    19/10/12 02:46:41 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
    19/10/12 02:46:41 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.109.201.8, 38943, None)
    19/10/12 02:46:41 INFO BlockManagerMasterEndpoint: Registering block manager 10.109.201.8:38943 with 3.2 GB RAM, BlockManagerId(driver, 10.109.201.8, 38943, None)
    19/10/12 02:46:41 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.109.201.8, 38943, None)
    19/10/12 02:46:41 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.109.201.8, 38943, None)
    02:46:41.811 INFO CreateReadCountPanelOfNormals - Spark verbosity set to INFO (see --spark-verbosity argument)
    19/10/12 02:46:41 INFO HDF5Library: Trying to load HDF5 library from:
    jar:file:/cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/inputs/-1551391607/gatk-package-4.1.2.0-local.jar!/org/broadinstitute/hdf5/libjhdf5.2.11.0.so
    19/10/12 02:46:41 INFO H5: HDF5 library:
    19/10/12 02:46:41 INFO H5: successfully loaded.
    02:46:41.893 WARN CreateReadCountPanelOfNormals - Number of eigensamples (20) is greater than the number of input samples (1); the number of samples retained after filtering will be used instead.
    02:46:41.902 INFO CreateReadCountPanelOfNormals - Retrieving intervals from first read-counts file (/cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/inputs/970456118/0991191801648da74a240847e8803df5_gdc_realn.counts.hdf5)...
    02:46:42.013 INFO CreateReadCountPanelOfNormals - Reading and validating annotated intervals...
    02:46:42.058 WARN CreateReadCountPanelOfNormals - Sequence dictionary in annotated-intervals file does not match the master sequence dictionary.
    19/10/12 02:46:42 INFO SparkUI: Stopped Spark web UI at http://10.109.201.8:4040
    19/10/12 02:46:42 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    19/10/12 02:46:42 INFO MemoryStore: MemoryStore cleared
    19/10/12 02:46:42 INFO BlockManager: BlockManager stopped
    19/10/12 02:46:42 INFO BlockManagerMaster: BlockManagerMaster stopped
    19/10/12 02:46:42 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    19/10/12 02:46:42 INFO SparkContext: Successfully stopped SparkContext
    02:46:42.107 INFO CreateReadCountPanelOfNormals - Shutting down engine
    [October 12, 2019 2:46:42 AM UTC] org.broadinstitute.hellbender.tools.copynumber.CreateReadCountPanelOfNormals done. Elapsed time: 0.05 minutes.
    Runtime.totalMemory()=2167930880
    java.lang.IllegalArgumentException: Annotated intervals do not match provided intervals.
    at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)
    at org.broadinstitute.hellbender.tools.copynumber.arguments.CopyNumberArgumentValidationUtils.validateAnnotatedIntervals(CopyNumberArgumentValidationUtils.java:135)
    at org.broadinstitute.hellbender.tools.copynumber.CreateReadCountPanelOfNormals.runPipeline(CreateReadCountPanelOfNormals.java:276)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
    19/10/12 02:46:42 INFO ShutdownHookManager: Shutdown hook called
    19/10/12 02:46:42 INFO ShutdownHookManager: Deleting directory /cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/tmp.aa0aa9b9/spark-5ab8ff4c-e99a-4371-9ffb-1bfa3530c835
    Using GATK jar /cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/inputs/-1551391607/gatk-package-4.1.2.0-local.jar defined in environment variable GATK_LOCAL_JAR
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx6500m -jar /cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/inputs/-1551391607/gatk-package-4.1.2.0-local.jar CreateReadCountPanelOfNormals --input /cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/inputs/970456118/0991191801648da74a240847e8803df5_gdc_realn.counts.hdf5 --minimum-interval-median-percentile 10.0 --maximum-zeros-in-sample-percentage 5.0 --maximum-zeros-in-interval-percentage 5.0 --extreme-sample-median-percentile 2.5 --do-impute-zeros true --extreme-outlier-truncation-percentile 0.1 --number-of-eigensamples 20 --maximum-chunk-size 16777216 --annotated-intervals /cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/inputs/-1667627880/intervals-wgs_calling_regions.hg38.preprocessed.annotated.tsv --output weg-pon.pon.hdf5

    All my BAM files are hg38.

    When I checked the contig naming convention of my BAMs, reference file, and the interval file provided by @asmirnov, my BAMs and reference file use chrN, but the interval file uses just the number N (without chr). With that interval file, the workflow stops at the first task (PreprocessIntervals), as we discussed before, with this error:

    interval file could not be parsed in any supported format.
    interval_list has an invalid interval : 1:68993-70105 + .

    The previous (WGS) interval file, on the other hand, uses the same naming as my BAMs and reference files (chrN), and the workflow stops at the last task (CreateReadCountPanelOfNormals) with the mismatch error, as we saw before.
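    One plausible reading of the "invalid interval : 1:68993-70105 + ." message is that contig 1 is not present in a chrN-style hg38 dictionary. The hypothetical sketch below checks interval_list body lines (tab-separated contig, start, end, strand and name after an '@'-prefixed header) against the @SQ records of a dictionary. It is a simplified check, not Picard's parser.

```python
from typing import Dict, List


def parse_sq_lines(header_text: str) -> Dict[str, int]:
    """Map SN -> LN from the @SQ lines of a SAM-style header (.dict or interval_list)."""
    lengths: Dict[str, int] = {}
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def invalid_intervals(interval_list_text: str, ref_lengths: Dict[str, int]) -> List[str]:
    """Body lines whose contig is unknown or whose coordinates fall outside the contig."""
    bad: List[str] = []
    for line in interval_list_text.splitlines():
        if not line or line.startswith("@"):
            continue
        contig, start, end = line.split("\t")[:3]
        length = ref_lengths.get(contig)
        if length is None or not (1 <= int(start) <= int(end) <= length):
            bad.append(line)
    return bad


if __name__ == "__main__":
    ref = parse_sq_lines("@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:248956422\n")
    body = "1\t68993\t70105\t+\t.\nchr1\t68993\t70105\t+\t.\n"
    print(invalid_intervals(body, ref))  # only the line on contig "1" is reported
```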

  • slee Member, Broadie, Dev ✭✭✭
    edited October 12

    Thanks @sarawasl. Indeed, I think you should be able to proceed with your analysis if you lift over the hg19 exon interval list that @asmirnov provided to hg38. The "interval file could not be parsed in any supported format" message emitted by PreprocessIntervals is expected behavior, since you are trying to use hg19 dictionary + intervals with an hg38 reference. Let us know if properly lifting over the interval list (using e.g. LiftOverIntervalList) does not resolve things.

    Thanks also for your help in continuing to diagnose the possible reference mismatch from your original issue. It might be helpful to see the hg38 dictionaries from your BAM and reference just to make sure there aren't any subtle mismatches there. It would also be helpful to see the logs from AnnotateIntervals and CollectReadCounts to ensure that there isn't a WARN emitted, as I previously suspected. Again, the goal here is purely to improve the error messaging of the workflow and to make sure there isn't some unexpected behavior; this is tangential to getting your analysis running with the correct hg38 exon list, so we appreciate it!

  • slee Member, Broadie, Dev ✭✭✭

    Actually, I now see from your CreateReadCountPanelOfNormals log the following line:

    WARN CreateReadCountPanelOfNormals - Sequence dictionary in annotated-intervals file does not match the master sequence dictionary.

    This indicates the hg38 dictionary in the annotated-intervals file does not match the hg38 dictionary in the counts file. I suspect that there indeed may be a similar WARN in your CollectReadCounts log file.
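    To see where two dictionaries disagree, a hypothetical sketch like the following compares the @SQ records (SN name and LN length) of two SAM-style headers, for example a .dict file and the header of a counts or annotated-intervals TSV. It checks names, lengths and order only; other tags such as M5 are ignored, which is an assumption about what matters here.

```python
from typing import List, Tuple


def sq_records(header_text: str) -> List[Tuple[str, int]]:
    """Ordered (SN, LN) pairs from the @SQ lines of a SAM-style header."""
    records: List[Tuple[str, int]] = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
            records.append((tags["SN"], int(tags["LN"])))
    return records


def dictionary_differences(a_text: str, b_text: str) -> List[str]:
    """Describe differences in contig names, lengths and order (other tags ignored)."""
    a, b = sq_records(a_text), sq_records(b_text)
    a_len, b_len = dict(a), dict(b)
    diffs = [f"{name}: present in only one dictionary"
             for name in sorted(set(a_len) ^ set(b_len))]
    diffs += [f"{name}: length {a_len[name]} vs {b_len[name]}"
              for name in sorted(set(a_len) & set(b_len))
              if a_len[name] != b_len[name]]
    # Compare the relative order of the contigs both dictionaries share.
    if [n for n, _ in a if n in b_len] != [n for n, _ in b if n in a_len]:
        diffs.append("shared contigs appear in a different order")
    return diffs
```

    An empty result means the two headers agree on names, lengths and order; a reordered dictionary is reported even when every contig and length matches.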

  • sarawasl Member

    @slee Many thanks.

    Regarding LiftOverIntervalList, I had a quick look, and it needs the parameter CHAIN=build.chain (described as "a file that guides the LiftOver process"). Should I create this file myself, or where exactly do I get it?

    For the second part, I attached my reference dictionary, but how can I check whether it is the same as the one in my BAM files?

    I also attached the three logs (PreprocessIntervals, AnnotateIntervals and CollectCounts).

  • slee Member, Broadie, Dev ✭✭✭

    Thanks @sarawasl. I do see that WARN in your CollectReadCounts log, as I expected. So I think @asmirnov's initial suspicion was indeed correct. Our reasoning for emitting WARNs rather than failing is touched upon in the GitHub pull request I linked, so I will probably not change the error messaging.

    The hg38 dictionary you attached seems to be sorted in lexicographical order, which is unusual and might cause problems with GATK tools. I'd take some time to make sure the dictionaries between your BAMs, reference files, and interval list are all identical before using the somatic CNV WDLs.
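    Lexicographic order puts chr10 before chr2, whereas the usual karyotypic order does not. The hypothetical sketch below checks whether the primary chromosomes (chr1 to chr22, chrX, chrY, chrM, with or without the chr prefix) in an ordered list of contig names appear in karyotypic order; all other contigs are ignored, which is a simplifying assumption.

```python
from typing import List, Optional


def primary_rank(name: str) -> Optional[int]:
    """Karyotypic rank of chr1..chr22, chrX, chrY, chrM (with or without 'chr'); None otherwise."""
    core = name[3:] if name.startswith("chr") else name
    if core.isdigit() and 1 <= int(core) <= 22:
        return int(core)
    return {"X": 23, "Y": 24, "M": 25, "MT": 25}.get(core)


def primary_order_is_karyotypic(names: List[str]) -> bool:
    """True if the primary chromosomes appear in karyotypic order; other contigs are ignored."""
    ranks = [r for r in map(primary_rank, names) if r is not None]
    return ranks == sorted(ranks)


if __name__ == "__main__":
    print(primary_order_is_karyotypic(["chr1", "chr2", "chr10", "chrX", "chrM"]))  # True
    print(primary_order_is_karyotypic(["chr1", "chr10", "chr11", "chr2"]))  # False
```

    Running the same check on the contig order from the reference .dict and from the BAM header would show whether the two disagree on ordering, which is one of the subtle mismatches mentioned above.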

    You should be able to search the forum for information about finding a liftover chain and checking the sequence dictionary in your BAMs. (I'd offer more detailed help, but I'm currently on paternity leave and just checking the forum in my spare time!)

  • sarawasl Member

    Thanks @slee for your help.

    Oh OK, understood. I will try to sort out the dictionary issue you noticed and also do the liftover.

    No problem at all; you helped a lot, which shows your generosity and passion for your work.

    I will get back to you once I have solved all of this and tried the pipeline again.
