Annotated intervals do not match provided intervals | CreateReadCountPanelOfNormals

Hello all,
I'm using the WDLs for somatic CNV analysis. I started with cnv_somatic_panel_workflow.wdl, which basically includes four tasks:
CNVTasks.PreprocessIntervals (Done)
CNVTasks.AnnotateIntervals (Done)
CNVTasks.CollectCounts (Done)
CreateReadCountPanelOfNormals (Error)
The first three work well, but the last one fails with this error:
```
java.lang.IllegalArgumentException: Annotated intervals do not match provided intervals.
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)
at org.broadinstitute.hellbender.tools.copynumber.arguments.CopyNumberArgumentValidationUtils.validateAnnotatedIntervals(CopyNumberArgumentValidationUtils.java:135)
at org.broadinstitute.hellbender.tools.copynumber.CreateReadCountPanelOfNormals.runPipeline(CreateReadCountPanelOfNormals.java:276)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
19/10/08 18:23:25 INFO ShutdownHookManager: Shutdown hook called
19/10/08 18:23:25 INFO ShutdownHookManager: Deleting directory /cromwell-executions/CNVSomaticPanelWorkflow/7481fd6a-e289-4bd6-b195-77b4d45d752e/call-CreateReadCountPanelOfNormals/tmp.16de278a/spark-8a26fda4-9b30-49b0-b795-846caa9a1e35
Using GATK jar /cromwell-executions/CNVSomaticPanelWorkflow/7481fd6a-e289-4bd6-b195-77b4d45d752e/call-CreateReadCountPanelOfNormals/inputs/-1551391607/gatk-package-4.1.2.0-local.jar defined in environment variable GATK_LOCAL_JAR
```
The initial interval file was taken from https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0 as suggested in https://gatkforums.broadinstitute.org/gatk/discussion/10215/intervals-and-interval-lists
Any help on how to solve this, please?
Answers
@sarawasl It could be due to a reference mismatch between the input BAMs and the provided interval file. Were the input BAMs aligned to hg19?
Could you look at the headers of the count files and the preprocessed/annotated interval files and check whether the contig names and lengths are the same?
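As a rough illustration of that header check, here is a minimal Python sketch. It assumes you have already dumped the headers to text (for example with `samtools view -H` for a BAM, or by copying the `@SQ` lines at the top of an interval list). The function names and the two sample headers are made up for the example and do not come from any GATK tool.

```python
def parse_sq_lines(header_text):
    """Return [(contig_name, length), ...] from the @SQ lines of a SAM-style header."""
    contigs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Fields after the record type look like TAG:VALUE, e.g. SN:chr1 or LN:248956422.
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:] if ":" in field)
        contigs.append((tags["SN"], int(tags["LN"])))
    return contigs


def dictionary_differences(first, second):
    """Describe contigs that are missing from one side or that differ in length."""
    first_map, second_map = dict(first), dict(second)
    problems = []
    for name in sorted(first_map.keys() | second_map.keys()):
        if name not in second_map:
            problems.append(f"{name}: only in first")
        elif name not in first_map:
            problems.append(f"{name}: only in second")
        elif first_map[name] != second_map[name]:
            problems.append(f"{name}: length {first_map[name]} vs {second_map[name]}")
    return problems


# Illustrative headers only: hg38-style names and lengths vs. b37/hg19-style ones.
bam_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chr2\tLN:242193529\n"
interval_header = "@HD\tVN:1.6\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:2\tLN:243199373\n"

differences = dictionary_differences(parse_sq_lines(bam_header), parse_sq_lines(interval_header))
for problem in differences:
    print(problem)
```

An empty result means the names and lengths agree. This sketch does not compare the order of the contigs.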
But as far as I can tell, there is no interval file for exome data. Is there one?
If not, what can I use instead?
@sarawasl Usually you would need to obtain a list of targets that is specific to your capture kit design and make an interval list out of that. You could try looking in the BAM header for that information.
If you don't have that, I can offer you this generic exome list (attached). However, it's possible it will miss some regions that your capture kit covers. If you go this route, I recommend you use our `FilterIntervals` tool to get rid of regions with zero coverage.
So then, I will go with the one you provided here and I will use FilterIntervals as you suggest.
I will get back to you after I retry.
Many thanks for your help.
When I tried FilterIntervals, I could not run it, since it requires --annotated-intervals annotated_intervals.tsv, which I don't have (where can I get it?).
So I decided to proceed directly and run the pipeline again with the interval file you provided, but it gives me this error (while running PreprocessIntervals):
Interval file could not be parsed in any supported format.
interval_list has an invalid interval : 1:68993-70105 + .
Any ideas or suggestions, please?
@sarawasl It might benefit you to review the notes from the somatic CNV panel workflow WDL (https://github.com/broadinstitute/gatk/blob/master/scripts/cnv_wdl/somatic/cnv_somatic_panel_workflow.wdl):
The documentation for PreprocessIntervals (https://software.broadinstitute.org/gatk/documentation/tooldocs/4.1.4.0/org_broadinstitute_hellbender_tools_copynumber_PreprocessIntervals.php) may also be helpful, as well as the tutorial at https://software.broadinstitute.org/gatk/documentation/article?id=11682.
The purpose of the PreprocessIntervals tool is to create the bins for coverage collection. Typically, for WES, this is done by providing the target intervals via `-L`, enabling padding, and disabling binning by specifying `--bin-length 0`. The result is bins of unequal length given by the padded targets. (This is in contrast to what is done for WGS, where we typically specify the autosomes via `-L`, disable padding, and specify the desired bin length, resulting in bins of equal length that cover the autosomes.)
In either case, the resulting bins are then provided via `-L` to both the CollectReadCounts and AnnotateIntervals tools (since we are collecting counts and annotating GC content in these bins). Specifying different sets of intervals to these tools may ultimately result in the error message you saw; might this have been the case?
If you do not have access to the target intervals, then the exon list that @asmirnov provided will probably suffice. Since CreateReadCountPanelOfNormals performs its own filtering steps, you may not need to use FilterIntervals (which is used in the germline CNV workflow) as he suggested.
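To check whether two files really describe the same bins, something like the following Python sketch may help. It assumes tab-separated files in which header lines start with `@`, the annotated-intervals TSV has a `CONTIG`/`START`/`END` column header, and the first three columns of each data line are contig, start, and end. Please verify that layout against your own files. The helper names and sample lines are invented for the example. The comparison is deliberately order-sensitive, on the assumption that a different bin order also counts as a mismatch.

```python
from itertools import zip_longest


def read_intervals(lines):
    """Collect (contig, start, end) tuples, skipping '@' headers and a CONTIG column header."""
    intervals = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        if fields[0] == "CONTIG":  # column-header row of the TSV (assumed layout)
            continue
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def first_mismatch(bins, annotated):
    """Return (index, bin, annotated_bin) for the first position that differs, else None."""
    for index, (a, b) in enumerate(zip_longest(bins, annotated)):
        if a != b:
            return index, a, b
    return None


# Made-up example lines standing in for the preprocessed and annotated files.
preprocessed = ["@SQ\tSN:chr1\tLN:248956422\n", "chr1\t1000\t2000\t+\t.\n", "chr2\t500\t900\t+\t.\n"]
annotated_same = ["CONTIG\tSTART\tEND\tGC_CONTENT\n", "chr1\t1000\t2000\t0.41\n", "chr2\t500\t900\t0.55\n"]
annotated_reordered = ["CONTIG\tSTART\tEND\tGC_CONTENT\n", "chr2\t500\t900\t0.55\n", "chr1\t1000\t2000\t0.41\n"]

print(first_mismatch(read_intervals(preprocessed), read_intervals(annotated_same)))
print(first_mismatch(read_intervals(preprocessed), read_intervals(annotated_reordered)))
```

With real files, an open file handle can be passed in place of the lists, e.g. `read_intervals(open(path))`.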
I'm still a bit confused as to how you encountered this error if you are just running the WDL. Confusion about WES vs. WGS aside, I'm not sure this should happen if there was a reference mismatch between the intervals and the BAMs. Can you provide some more details, such as the JSON you used to run the WDL?
Please disregard the mismatch error; as I clarified in my second response, it was my mistake to use the WGS interval file for my WES samples.
So, when I got the new interval file from @asmirnov and ran the panel WDL again, I got the error that I mentioned in my last response:
interval file could not be parsed in any supported format.
interval_list has an invalid interval : 1:68993-70105 + .
And this is the .json file I provided (I ran it with only one sample to make errors easier to track; once it works, I will apply it to all the samples):
{
"CNVSomaticPanelWorkflow.normal_bams": ["/cancer/f40ba854-81bd-4f64-bd3d-72e113d6c04a/0991191801648da74a240847e8803df5_gdc_realn.bam"],
"CNVSomaticPanelWorkflow.normal_bais": ["/cancer/f40ba854-81bd-4f64-bd3d-72e113d6c04a/0991191801648da74a240847e8803df5_gdc_realn.bai"],
"CNVSomaticPanelWorkflow.ref_fasta_dict": "/projects/c2014/genomics/reference/hg38.dict",
"CNVSomaticPanelWorkflow.gatk4_jar_override": "/sw/csi/gatk/4.1.2.0/el7.5_binary/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar",
"CNVSomaticPanelWorkflow.pon_entity_id": "weg-pon",
"CNVSomaticPanelWorkflow.ref_fasta_fai": "/projects/c2014/genomics/reference/hg38.fa.fai",
"CNVSomaticPanelWorkflow.gatk_docker": "broadinstitute/gatk:latest",
"CNVSomaticPanelWorkflow.ref_fasta": "/projects/c2014/genomics/reference/hg38.fa",
"CNVSomaticPanelWorkflow.intervals": "/cancer/germline_resources_GATK_CNV_Germline_Validation_intervals.interval_list"
}
@sarawasl from that JSON it looks like you are using the hg38 reference, while the intervals that @asmirnov provided you are for hg19.
Note also that you should set `bin-length` and `padding` as discussed.
I understand that you used the wrong interval file. My concern is that the specific error that you got doesn't seem like it should've resulted from that action; I would expect that the workflow would have completed successfully, but that the results would not have been useful (as they would have used the WGS calling regions to create bins). So I just want to make sure there isn't some unexpected bug in the workflow (and if not, perhaps we can fail earlier with a more informative message).
If possible, could you please provide me with the hg38 interval file for WES?
Once I have it, I will add bin-length and padding.
Yes, as I said before, using the previous interval file (for WGS) with my WES BAM file in this WDL workflow gives me the mismatch error in the CreateReadCountPanelOfNormals task (Annotated intervals do not match provided intervals.).
@sarawasl Actually, apologies, I think I now see how you might've encountered that error with a reference mismatch. It's possible that one of the tools threw a warning rather than failing; see https://github.com/broadinstitute/gatk/pull/4758 for context. Perhaps you can check your logs and let us know if this is the case?
Are your BAMs hg19 or hg38? Also, what is the naming convention of your contigs (e.g., `chr1` vs. `1`)? See https://software.broadinstitute.org/gatk/documentation/article?id=11010. You should ensure that all BAMs, reference files, and interval lists have the same dictionary. In any case, I would expect that you can use the hg19 exon list that @asmirnov provided and simply lift it over or modify the naming convention as needed.
@slee I checked the log files of each of the 4 tasks; all of them ran without any warnings or errors except the last one (CreateReadCountPanelOfNormals), which gives the mismatch error:
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/tmp.aa0aa9b9
02:46:38.855 WARN SparkContextFactory - Environment variables HELLBENDER_TEST_PROJECT and HELLBENDER_JSON_SERVICE_ACCOUNT_KEY must be set or the GCS hadoop connector will not be configured properly
02:46:38.931 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/inputs/-1551391607/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Oct 12, 2019 2:46:40 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
02:46:40.654 INFO CreateReadCountPanelOfNormals - ------------------------------------------------------------
02:46:40.655 INFO CreateReadCountPanelOfNormals - The Genome Analysis Toolkit (GATK) v4.1.2.0
02:46:40.655 INFO CreateReadCountPanelOfNormals - For support and documentation go to https://software.broadinstitute.org/gatk/
02:46:40.655 INFO CreateReadCountPanelOfNormals - Executing as [email protected] on Linux v3.10.0-957.12.1.el7.x86_64 amd64
02:46:40.656 INFO CreateReadCountPanelOfNormals - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_212-8u212-b03-0ubuntu1.16.04.1-b03
02:46:40.656 INFO CreateReadCountPanelOfNormals - Start Date/Time: October 12, 2019 2:46:38 AM UTC
02:46:40.656 INFO CreateReadCountPanelOfNormals - ------------------------------------------------------------
02:46:40.656 INFO CreateReadCountPanelOfNormals - ------------------------------------------------------------
02:46:40.657 INFO CreateReadCountPanelOfNormals - HTSJDK Version: 2.19.0
02:46:40.658 INFO CreateReadCountPanelOfNormals - Picard Version: 2.19.0
02:46:40.658 INFO CreateReadCountPanelOfNormals - HTSJDK Defaults.COMPRESSION_LEVEL : 2
02:46:40.658 INFO CreateReadCountPanelOfNormals - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
02:46:40.658 INFO CreateReadCountPanelOfNormals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
02:46:40.658 INFO CreateReadCountPanelOfNormals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
02:46:40.658 INFO CreateReadCountPanelOfNormals - Deflater: IntelDeflater
02:46:40.658 INFO CreateReadCountPanelOfNormals - Inflater: IntelInflater
02:46:40.658 INFO CreateReadCountPanelOfNormals - GCS max retries/reopens: 20
02:46:40.659 INFO CreateReadCountPanelOfNormals - Requester pays: disabled
02:46:40.659 INFO CreateReadCountPanelOfNormals - Initializing engine
02:46:40.659 INFO CreateReadCountPanelOfNormals - Done initializing engine
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/10/12 02:46:40 INFO SparkContext: Running Spark version 2.2.0
19/10/12 02:46:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/10/12 02:46:41 INFO SparkContext: Submitted application: CreateReadCountPanelOfNormals
19/10/12 02:46:41 INFO SecurityManager: Changing view acls to: althubsw
19/10/12 02:46:41 INFO SecurityManager: Changing modify acls to: althubsw
19/10/12 02:46:41 INFO SecurityManager: Changing view acls groups to:
19/10/12 02:46:41 INFO SecurityManager: Changing modify acls groups to:
19/10/12 02:46:41 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(althubsw); groups with view permissions: Set(); users with modify permissions: Set(althubsw); groups with modify permissions: Set()
19/10/12 02:46:41 INFO Utils: Successfully started service 'sparkDriver' on port 36605.
19/10/12 02:46:41 INFO SparkEnv: Registering MapOutputTracker
19/10/12 02:46:41 INFO SparkEnv: Registering BlockManagerMaster
19/10/12 02:46:41 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/10/12 02:46:41 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/10/12 02:46:41 INFO DiskBlockManager: Created local directory at /cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/tmp.aa0aa9b9/blockmgr-9e29af49-d400-43cb-a267-274961663f6d
19/10/12 02:46:41 INFO MemoryStore: MemoryStore started with capacity 3.2 GB
19/10/12 02:46:41 INFO SparkEnv: Registering OutputCommitCoordinator
19/10/12 02:46:41 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/10/12 02:46:41 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.109.201.8:4040
19/10/12 02:46:41 INFO Executor: Starting executor ID driver on host localhost
19/10/12 02:46:41 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38943.
19/10/12 02:46:41 INFO NettyBlockTransferService: Server created on 10.109.201.8:38943
19/10/12 02:46:41 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/10/12 02:46:41 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.109.201.8, 38943, None)
19/10/12 02:46:41 INFO BlockManagerMasterEndpoint: Registering block manager 10.109.201.8:38943 with 3.2 GB RAM, BlockManagerId(driver, 10.109.201.8, 38943, None)
19/10/12 02:46:41 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.109.201.8, 38943, None)
19/10/12 02:46:41 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.109.201.8, 38943, None)
02:46:41.811 INFO CreateReadCountPanelOfNormals - Spark verbosity set to INFO (see --spark-verbosity argument)
19/10/12 02:46:41 INFO HDF5Library: Trying to load HDF5 library from:
jar:file:/cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/inputs/-1551391607/gatk-package-4.1.2.0-local.jar!/org/broadinstitute/hdf5/libjhdf5.2.11.0.so
19/10/12 02:46:41 INFO H5: HDF5 library:
19/10/12 02:46:41 INFO H5: successfully loaded.
02:46:41.893 WARN CreateReadCountPanelOfNormals - Number of eigensamples (20) is greater than the number of input samples (1); the number of samples retained after filtering will be used instead.
02:46:41.902 INFO CreateReadCountPanelOfNormals - Retrieving intervals from first read-counts file (/cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/inputs/970456118/0991191801648da74a240847e8803df5_gdc_realn.counts.hdf5)...
02:46:42.013 INFO CreateReadCountPanelOfNormals - Reading and validating annotated intervals...
02:46:42.058 WARN CreateReadCountPanelOfNormals - Sequence dictionary in annotated-intervals file does not match the master sequence dictionary.
19/10/12 02:46:42 INFO SparkUI: Stopped Spark web UI at http://10.109.201.8:4040
19/10/12 02:46:42 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/10/12 02:46:42 INFO MemoryStore: MemoryStore cleared
19/10/12 02:46:42 INFO BlockManager: BlockManager stopped
19/10/12 02:46:42 INFO BlockManagerMaster: BlockManagerMaster stopped
19/10/12 02:46:42 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/10/12 02:46:42 INFO SparkContext: Successfully stopped SparkContext
02:46:42.107 INFO CreateReadCountPanelOfNormals - Shutting down engine
[October 12, 2019 2:46:42 AM UTC] org.broadinstitute.hellbender.tools.copynumber.CreateReadCountPanelOfNormals done. Elapsed time: 0.05 minutes.
Runtime.totalMemory()=2167930880
java.lang.IllegalArgumentException: Annotated intervals do not match provided intervals.
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)
at org.broadinstitute.hellbender.tools.copynumber.arguments.CopyNumberArgumentValidationUtils.validateAnnotatedIntervals(CopyNumberArgumentValidationUtils.java:135)
at org.broadinstitute.hellbender.tools.copynumber.CreateReadCountPanelOfNormals.runPipeline(CreateReadCountPanelOfNormals.java:276)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
19/10/12 02:46:42 INFO ShutdownHookManager: Shutdown hook called
19/10/12 02:46:42 INFO ShutdownHookManager: Deleting directory /cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/tmp.aa0aa9b9/spark-5ab8ff4c-e99a-4371-9ffb-1bfa3530c835
Using GATK jar /cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/inputs/-1551391607/gatk-package-4.1.2.0-local.jar defined in environment variable GATK_LOCAL_JAR
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx6500m -jar /cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/inputs/-1551391607/gatk-package-4.1.2.0-local.jar CreateReadCountPanelOfNormals --input /cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/inputs/970456118/0991191801648da74a240847e8803df5_gdc_realn.counts.hdf5 --minimum-interval-median-percentile 10.0 --maximum-zeros-in-sample-percentage 5.0 --maximum-zeros-in-interval-percentage 5.0 --extreme-sample-median-percentile 2.5 --do-impute-zeros true --extreme-outlier-truncation-percentile 0.1 --number-of-eigensamples 20 --maximum-chunk-size 16777216 --annotated-intervals /cromwell-executions/CNVSomaticPanelWorkflow/c9f4d318-8463-47df-9289-47ab424f256b/call-CreateReadCountPanelOfNormals/inputs/-1667627880/intervals-wgs_calling_regions.hg38.preprocessed.annotated.tsv --output weg-pon.pon.hdf5
All my BAM files are hg38.
When I checked the naming convention of my BAMs, my reference file, and the interval file provided by @asmirnov, I found that my BAMs and reference file use chrN, while the interval file uses just the number N (without chr). With that file, the workflow stops, as we discussed before, at the first task (PreprocessIntervals) with this error:
interval file could not be parsed in any supported format.
interval_list has an invalid interval : 1:68993-70105 + .
The previous interval file (for WGS), however, uses the same convention as my BAMs and reference file (chrN), and with it the workflow stops at the last task (CreateReadCountPanelOfNormals) with the mismatch error, as we saw before.
Thanks @sarawasl. Indeed, I think you should be able to proceed with your analysis if you lift over the hg19 exon interval list that @asmirnov provided to hg38. The "interval file could not be parsed in any supported format" message emitted by PreprocessIntervals is expected behavior, since you are trying to use hg19 dictionary + intervals with an hg38 reference. Let us know if properly lifting over the interval list (using e.g. LiftOverIntervalList) does not resolve things.
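As a starting point for the liftover step, the command could be assembled roughly as in the Python sketch below. The argument spellings (`-I`, `-O`, `-SD`, `--CHAIN`, `--REJECT`) reflect my understanding of the Picard-style options exposed through the `gatk` wrapper and should be confirmed with `gatk LiftOverIntervalList --help`. All file names are hypothetical placeholders. The chain file is normally downloaded rather than created by hand, and its source contig names need to match those in the interval list being lifted.

```python
import shlex


def liftover_command(input_list, chain, target_dict, output_list, reject_list=None):
    """Build an argument list for lifting an interval list over to another reference build."""
    command = [
        "gatk", "LiftOverIntervalList",
        "-I", input_list,      # interval list on the source build
        "-O", output_list,     # lifted-over interval list
        "-SD", target_dict,    # sequence dictionary of the target build
        "--CHAIN", chain,      # chain file mapping source to target coordinates
    ]
    if reject_list is not None:
        command += ["--REJECT", reject_list]  # intervals that could not be lifted over
    return command


# Hypothetical file names, for illustration only.
command = liftover_command(
    input_list="exome.source_build.interval_list",
    chain="source_to_hg38.over.chain",
    target_dict="hg38.dict",
    output_list="exome.hg38.interval_list",
    reject_list="exome.rejected.interval_list",
)
print(" ".join(shlex.quote(part) for part in command))
```

The list form can be passed directly to `subprocess.run(command, check=True)` once the paths point at real files.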
Thanks also for your help in continuing to diagnose the possible reference mismatch from your original issue. It might be helpful to see the hg38 dictionaries from your BAM and reference just to make sure there aren't any subtle mismatches there. It would also be helpful to see the logs from AnnotateIntervals and CollectReadCounts to ensure that there isn't a WARN emitted, as I previously suspected. Again, the goal here is purely to improve the error messaging of the workflow and to make sure there isn't some unexpected behavior; this is tangential to getting your analysis running with the correct hg38 exon list, so we appreciate it!
Actually, I now see from your CreateReadCountPanelOfNormals log the following line:
WARN CreateReadCountPanelOfNormals - Sequence dictionary in annotated-intervals file does not match the master sequence dictionary.
This indicates the hg38 dictionary in the annotated-intervals file does not match the hg38 dictionary in the counts file. I suspect that there indeed may be a similar WARN in your CollectReadCounts log file.
@slee Many thanks.
Regarding LiftOverIntervalList, I just had a quick look and it needs the parameter `CHAIN=build.chain`, which is described as a file that guides the LiftOver process. Do I need to create this file, or where do I get it?
For the second part, I attached my reference dictionary, but how do I check whether it's the same as the one in my BAM file(s)?
I also attached the three logs (`PreprocessIntervals`, `AnnotateIntervals`, and `CollectCounts`).
Thanks @sarawasl. I do see that WARN in your CollectReadCounts log, as I expected. So I think @asmirnov's initial suspicion was indeed correct. Our reasoning for emitting WARNs rather than failing is touched upon in the GitHub issue I linked, so I think I probably will not change the error messaging.
The hg38 dictionary you attached seems to be sorted in lexicographical order, which is unusual and might cause problems with GATK tools. I'd take some time to make sure the dictionaries between your BAMs, reference files, and interval list are all identical before using the somatic CNV WDLs.
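A quick way to spot this kind of ordering problem is sketched below in Python. It assumes you have already extracted the contig names, in file order, from each dictionary (for instance the `SN:` values of the `@SQ` lines). The function names and the two example orderings are invented for illustration.

```python
def is_lexicographic(names):
    """True if the contig names are in plain string-sorted order (chr1, chr10, chr11, chr2, ...)."""
    names = list(names)
    return names == sorted(names)


def first_order_difference(first, second):
    """Compare the relative order of the contigs that both lists share."""
    shared = set(first) & set(second)
    first_shared = [name for name in first if name in shared]
    second_shared = [name for name in second if name in shared]
    for position, (a, b) in enumerate(zip(first_shared, second_shared)):
        if a != b:
            return position, a, b
    return None


# Illustrative orderings: string-sorted vs. the usual numeric chromosome order.
string_sorted = ["chr1", "chr10", "chr11", "chr2", "chr3"]
numeric_order = ["chr1", "chr2", "chr3", "chr10", "chr11"]

print(is_lexicographic(string_sorted))
print(is_lexicographic(numeric_order))
print(first_order_difference(string_sorted, numeric_order))
```

If `first_order_difference` returns anything other than `None` for two of your files, their dictionaries list the shared contigs in a different order even if the names and lengths agree.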
You should be able to search the forum for information about finding a liftover chain and checking the sequence dictionary in your BAMs. (I'd offer more detailed help, but I'm currently on paternity leave and just checking the forum in my spare time!)
Thanks @slee for your help.
OK, understood. I will try to sort out the dictionary issue that you noticed and also do the liftover.
No problem at all (you have helped a lot, and that reflects your generosity and your passion for your work).
I will get back to you soon, once I have solved all of this and tried the pipeline again.