PoN for CNV_Somatic_Pair_Workflow

I am looking to use the open-access PoN (gs://firecloud-tcga-open-access/tutorial/reference/open_access_pon_from_1000g.pon), but when I pass this file as input to the method CNV_Somatic_Pair_Workflow I get the following error:

A USER ERROR has occurred: Bad input: The panel of normals is out of date and incompatible. Please use a panel of normals that was created by CreateReadCountPanelOfNormals and is version 7.0.

Is there some preprocessing of the file that I should be doing?


Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Those are not resources we manage, but I suspect the TCGA PoNs are intended for short variant discovery (SNPs and indels), not for CNVs. You may need to create your own CNV PoN. Try asking in the Cancer Genome Analysis forum: https://gatkforums.broadinstitute.org/firecloud/categories/cancer-genome-analysis

  • jml96, Cambridge (Member)

    Hi,
    I adapted the WDL files for somatic copy number variant discovery with GATK4 to run on a local server.
    cnv_somatic_panel_workflow.wdl runs without problems, but when I run cnv_somatic_pair_workflow.wdl I get the following error:

    Using GATK jar .../gatk-4.1.2.0-37-g90fd39b.jar defined in environment variable GATK_LOCAL_JAR
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx12000m -jar ./gatk-4.1.2.0-37-g90fd39b.jar DenoiseReadCounts --input ...hdf5 --count-panel-of-normals ...hdf5 --standardized-copy-ratios ...tsv --denoised-copy-ratios ...tsv
    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=.../cromwell-executions/CNVSomaticPairWorkflow/56355d35-5ed6-4e13-90b8-dc2ecf0559c6/call-DenoiseReadCountsTumor/tmp.ad5f42d5
    15:07:11.912 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:.../gatk-4.1.2.0-37-g90fd39b.jar!/com/intel/gkl/native/libgkl_compression.so
    Jul 23, 2019 3:07:12 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    15:07:12.438 INFO DenoiseReadCounts - ------------------------------------------------------------
    15:07:12.439 INFO DenoiseReadCounts - The Genome Analysis Toolkit (GATK) v4.1.2.0-37-g90fd39b-SNAPSHOT
    15:07:12.439 INFO DenoiseReadCounts - For support and documentation go to https://software.broadinstitute.org/gatk/
    15:07:12.440 INFO DenoiseReadCounts - Executing as ... on Linux v3.10.0-957.12.2.el7.x86_64 amd64
    15:07:12.440 INFO DenoiseReadCounts - Java runtime: OpenJDK 64-Bit Server VM v12.0.2+10
    15:07:12.440 INFO DenoiseReadCounts - Start Date/Time: 23 July 2019 at 15:07:11 BST
    15:07:12.440 INFO DenoiseReadCounts - ------------------------------------------------------------
    15:07:12.440 INFO DenoiseReadCounts - ------------------------------------------------------------
    15:07:12.440 INFO DenoiseReadCounts - HTSJDK Version: 2.19.0
    15:07:12.440 INFO DenoiseReadCounts - Picard Version: 2.19.0
    15:07:12.441 INFO DenoiseReadCounts - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    15:07:12.441 INFO DenoiseReadCounts - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    15:07:12.441 INFO DenoiseReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    15:07:12.441 INFO DenoiseReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    15:07:12.441 INFO DenoiseReadCounts - Deflater: IntelDeflater
    15:07:12.441 INFO DenoiseReadCounts - Inflater: IntelInflater
    15:07:12.441 INFO DenoiseReadCounts - GCS max retries/reopens: 20
    15:07:12.441 INFO DenoiseReadCounts - Requester pays: disabled
    15:07:12.441 INFO DenoiseReadCounts - Initializing engine
    15:07:12.441 INFO DenoiseReadCounts - Done initializing engine
    log4j:WARN No appenders could be found for logger (org.broadinstitute.hdf5.HDF5Library).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    15:07:13.768 INFO DenoiseReadCounts - Reading read-counts file (....hdf5)...
    15:07:14.041 INFO DenoiseReadCounts - Shutting down engine
    [23 July 2019 at 15:07:14 BST] org.broadinstitute.hellbender.tools.copynumber.DenoiseReadCounts done. Elapsed time: 0.04 minutes.
    Runtime.totalMemory()=98566144


    A USER ERROR has occurred: Bad input: The panel of normals is out of date and incompatible. Please use a panel of normals that was created by CreateReadCountPanelOfNormals and is version 7.0.


    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

    The panel of normals HDF5 file was generated with the same workflow version.

    Thank you.

    João

  • bhanuGandham, Cambridge, MA (Member, Administrator, Broadie, Moderator)

    @jml96

    Was the PoN generated with the same GATK jar version?

  • jml96, Cambridge (Member)

    Yes.
    The cnv_somatic_panel_workflow.wdl and cnv_somatic_pair_workflow.wdl files were adapted from the ones in https://github.com/gatk-workflows/gatk4-somatic-cnvs. I am also using gatk-4.1.2.0-37-g90fd39b.jar in both workflows.
    Thank you.

    João

  • slee (Member, Broadie, Dev)

    @jml96, can you verify that you are passing the panel of normals HDF5 file created by cnv_somatic_panel_workflow.wdl as input to the --count-panel-of-normals? If you are passing an HDF5 file that does not have the fields expected for a PoN (including version number), then you will see the message that you did; I would guess that you might have accidentally passed the counts HDF5 file instead.

    However, it's impossible to determine this from your command line DenoiseReadCounts --input ...hdf5 --count-panel-of-normals ...hdf5 --standardized-copy-ratios ...tsv --denoised-copy-ratios ...tsv. Did you edit this to remove the filenames, or was this indeed the command line generated by your WDL run?

  • jml96, Cambridge (Member)

    Hi,
    Yes, I edited the paths to the directories. I am pointing --count-panel-of-normals to the output file generated by cnv_somatic_panel_workflow (.../cromwell-executions/CNVSomaticPanelWorkflow/44b79622-4965-4e91-b5cb-0e050ac1341c/call-CollectCounts/shard-0/execution).
    Thank you.

    Regards,
    João

  • slee (Member, Broadie, Dev), edited July 26

    @jml96 As I suspected, that is a counts HDF5 file. It's created by the WDL task CollectCounts, which runs the GATK tool CollectReadCounts on a single BAM and produces a corresponding HDF5 file that represents the counts in each genomic bin.

    What you want to pass instead is the panel HDF5 file produced by the WDL task CreateReadCountPanelOfNormals. The panel workflow uses all of the count files from your panel samples to construct this file.

    You might find it helpful to refer to the tutorial at https://software.broadinstitute.org/gatk/documentation/article?id=11682, especially if you are trying to adapt the workflows for your own purposes.
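
The mix-up diagnosed above (a CollectCounts output passed where the CreateReadCountPanelOfNormals output was expected) can be caught before launching the pair workflow. What follows is a minimal sketch, not part of GATK or the official WDLs. It assumes the file still sits inside a Cromwell execution directory whose path contains a `call-<TaskName>` component, as in the path quoted in this thread. The helper names `cromwell_task_name` and `looks_like_panel_output` are hypothetical, and the check only inspects the path string; it does not open the HDF5 file.

```python
from pathlib import PurePosixPath
from typing import Optional

# Task names as they appear in this thread.
PANEL_TASK = "CreateReadCountPanelOfNormals"
COUNTS_TASK = "CollectCounts"
CALL_PREFIX = "call-"


def cromwell_task_name(path: str) -> Optional[str]:
    """Return the task name from the last 'call-<TaskName>' path component.

    Returns None if the path has no such component (for example, if the
    file was copied out of the Cromwell execution directory).
    """
    task = None
    for part in PurePosixPath(path).parts:
        if part.startswith(CALL_PREFIX) and len(part) > len(CALL_PREFIX):
            task = part[len(CALL_PREFIX):]  # keep the innermost call directory
    return task


def looks_like_panel_output(path: str) -> bool:
    """True only if the path points into a CreateReadCountPanelOfNormals call."""
    return cromwell_task_name(path) == PANEL_TASK


if __name__ == "__main__":
    # Path as quoted in the thread (elisions left as-is).
    candidate = (
        ".../cromwell-executions/CNVSomaticPanelWorkflow/"
        "44b79622-4965-4e91-b5cb-0e050ac1341c/call-CollectCounts/shard-0/execution"
    )
    task = cromwell_task_name(candidate)
    if task == COUNTS_TASK:
        print("This is a per-sample counts file, not a panel of normals.")
    elif looks_like_panel_output(candidate):
        print("Path points at the panel task output.")
    else:
        print("Could not tell from the path alone.")
```

A path-based check is only a heuristic: a file that has been moved or renamed carries no task name, in which case the function returns None and the file has to be traced back to the task that produced it.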
