Description and examples of the steps in the CNV case and CNV PoN creation workflows

The CNV case and PoN workflows (description and examples) for earlier releases of GATK4.
For a newer tutorial using GATK4's v1.0.0.0-alpha1.2.3 release (Version:0288cff-SNAPSHOT from September 2016), see Article#9143 and this data bundle. If you have a question on the Somatic_CNV_handson tutorial, please post it as a new question using this form.
Requirements
- Java 1.8
- A functioning GATK4-protected jar (hellbender-protected.jar or gatk-protected.jar)
- HDF5 1.8.13
- The location of the HDF5-Java JNI Libraries Release 2.9 (2.11 for Macs).
Typical locations:
Ubuntu: /usr/lib/jni/
Mac: /Applications/HDFView.app/Contents/Resources/lib/
Broad internal servers: /broad/software/free/Linux/redhat_6_x86_64/pkgs/hdfview_2.9/HDFView/lib/linux/
- Reference genome (fasta files) with fai and dict files. This can be downloaded as part of the GATK resource bundle: http://www.broadinstitute.org/gatk/guide/article?id=1213
- PoN file (when running case samples only). This file should be created using the Create PoN workflow (see below).
- Target BED file that was used to create the PoN file. Format details can be found here. NOTE: For the CNV tools, you will need a fourth column for target name, which must be unique across rows.
1 12200 12275 target1
1 13505 13600 target2
1 31000 31500 target3
1 35138 35174 target4
....snip....
Before running the workflows, we recommend padding the target file by 250 bases with the PadTargets tool. Example:
java -jar gatk-protected.jar PadTargets --targets initial_target_file.bed --output initial_target_file.padded.bed --padding 250
This allows some off-target reads to be factored into the copy ratio estimates. Our internal evaluations have shown that this improves results.
If you are using the premade Queue scripts (see below), you can specify the padding there and the workflow will generate the padded targets automatically (i.e. there is no reason to run PadTargets explicitly if you are using the premade Queue scripts).
Case sample workflow
This workflow requires a PoN file generated by the Create PoN workflow.
If you do not have a PoN, please skip to the Create PoN workflow, below ....
Overview of steps
- Step 0. (recommended) Pad Targets (see example above)
- Step 1. Collect proportional coverage
- Step 2. Create coverage profile
- Step 3. Segment coverage profile
- Step 4. Plot coverage profile
- Step 5. Call segments
Step 1. Collect proportional coverage
Inputs
- bam file
- target bed file -- must be the same that was used for the PoN
- reference_sequence (required by GATK) -- fasta file with b37 reference.
Outputs
- Proportional coverage tsv file -- Mx5 matrix of proportional coverage, where M is the number of targets. The fifth column will be named for the sample in the bam file (found in the bam file's SM tag). If the file exists, it will be overwritten.
##fileFormat = tsv
##commandLine = org.broadinstitute.hellbender.tools.exome.ExomeReadCounts ...snip...
##title = Read counts per target and sample
CONTIG START END NAME SAMPLE1
1 12200 12275 target1 1.150e-05
1 13505 13600 target2 1.500e-05
1 31000 31500 target3 7.000e-05
....snip....
Invocation
java -Xmx8g -jar <path_to_hellbender_protected_jar> CalculateTargetCoverage -I <input_bam_file> -O <pcov_output_file_path> --targets <target_BED> -R <ref_genome> \
  -transform PCOV --targetInformationColumns FULL -groupBy SAMPLE -keepdups
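For illustration, a filled-in command might look like the following (file names are hypothetical; the padded target file comes from Step 0 and the reference fasta from the requirements above):
java -Xmx8g -jar gatk-protected.jar CalculateTargetCoverage -I SAMPLE1.bam -O SAMPLE1.pcov.tsv --targets initial_target_file.padded.bed -R human_g1k_v37.fasta \
  -transform PCOV --targetInformationColumns FULL -groupBy SAMPLE -keepdups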
Step 2. Create coverage profile
Inputs
- proportional coverage file from Step 1
- target BED file -- must be the same that was used for the PoN
- PoN file
- directory containing the HDF5 JNI native libraries
Outputs
- normalized coverage file (tsv) -- details each target with chromosome, start, end, and log copy ratio estimate
#fileFormat = tsv
#commandLine = ....snip....
#title = ....snip....
name contig start stop SAMPLE1
target1 1 12200 12275 -0.5958351605220968
target2 1 13505 13600 -0.2855054918109098
target3 1 31000 31500 -0.11450116047248263
....snip....
- pre-tangent-normalization coverage file (tsv) -- same as normalized coverage file (tsv) above, but copy ratio estimates are before the noise reduction step. The file format is the same as the normalized coverage file (tsv).
- fnt file (tsv) -- proportional coverage divided by the target factors contained in the PoN. The file format is the same as the proportional coverage in step 1.
- betaHats (tsv) -- used by developers and evaluators, typically, but the output location must be specified. These are the coefficients used in the projection of the case sample into the (reduced) PoN. This will be an Mx1 matrix where M is the number of targets.
Invocation
java -Djava.library.path=<hdf_jni_native_dir> -Xmx8g -jar <path_to_hellbender_protected_jar> NormalizeSomaticReadCounts -I <pcov_input_file_path> -T <target_BED> -pon <pon_file> \
  -O <output_target_cr_file> -FNO <output_target_fnt_file> -BHO <output_beta_hats_file> -PTNO <output_pre_tangent_normalization_cr_file>
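As an illustrative example (file names hypothetical; the JNI path shown is the typical Ubuntu location from the requirements above):
java -Djava.library.path=/usr/lib/jni/ -Xmx8g -jar gatk-protected.jar NormalizeSomaticReadCounts -I SAMPLE1.pcov.tsv -T initial_target_file.padded.bed -pon normals.pon \
  -O SAMPLE1.tn.tsv -FNO SAMPLE1.fnt.tsv -BHO SAMPLE1.betaHats.tsv -PTNO SAMPLE1.ptn.tsv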
Step 3. Segment coverage profile
Inputs
- normalized coverage file (tsv) -- from step 2.
- sample name
Outputs
- seg file (tsv) -- segment file (tsv) detailing contig, start, end, and copy ratio (segment_mean) for each detected segment. Note that this is a different format than python recapseg, since the segment mean no longer has log2 applied.
Sample Chromosome Start End Num_Probes Segment_Mean
SAMPLE1 1 12200 70000 18 0.841235
SAMPLE1 1 300600 1630000 337 1.23232323
....snip....
Invocation
java -Xmx8g -jar <path_to_hellbender_protected_jar> PerformSegmentation -S <sample_name> -T <normalized_coverage_file> -O <output_seg_file> -log
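For example, with the hypothetical file names used above:
java -Xmx8g -jar gatk-protected.jar PerformSegmentation -S SAMPLE1 -T SAMPLE1.tn.tsv -O SAMPLE1.seg -log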
Step 4. Plot coverage profile
Inputs
- normalized coverage file (tsv) -- from step 2.
- pre-normalized coverage file (tsv) -- from step 2.
- segmented coverage file (seg) -- from step 3.
- sample name, see above
Outputs
- beforeAfterTangentLimPlot (png) -- Output before/after tangent normalization plot up to copy-ratio 4
- beforeAfterTangentPlot (png) -- Output before/after tangent normalization plot
- fullGenomePlot (png) -- Full genome plot after tangent normalization
- preQc (txt) -- Median absolute differences of targets before normalization
- postQc (txt) -- Median absolute differences of targets after normalization
- dQc (txt) -- Difference in median absolute differences of targets before and after normalization
Invocation
java -Xmx8g -jar <path_to_hellbender_protected_jar> PlotSegmentedCopyRatio -S <sample_name> -T <normalized_coverage_file> -P <pre_normalized_coverage_file> -seg <segmented_coverage_file> -O <output_seg_file> -log
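For example, continuing with the hypothetical file names from the previous steps (the output location name is likewise illustrative):
java -Xmx8g -jar gatk-protected.jar PlotSegmentedCopyRatio -S SAMPLE1 -T SAMPLE1.tn.tsv -P SAMPLE1.ptn.tsv -seg SAMPLE1.seg -O SAMPLE1_plots -log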
Step 5. Call segments
Inputs
- normalized coverage file (tsv) -- from step 2.
- seg file (tsv) -- from step 3.
- sample name
Outputs
- called file (tsv) -- output is exactly the same as in seg file (step 3), except Segment_Call column is added. Calls are either "+", "0", or "-" (no quotes).
Sample Chromosome Start End Num_Probes Segment_Mean Segment_Call
SAMPLE1 1 12200 70000 18 0.841235 -
SAMPLE1 1 300600 1630000 337 1.23232323 0
....snip....
Invocation
java -Xmx8g -jar <path_to_hellbender_protected_jar> CallSegments -T <normalized_coverage_file> -S <seg_file> -O <output_called_seg_file> -sample <sample_name>
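For example, with the same hypothetical file names:
java -Xmx8g -jar gatk-protected.jar CallSegments -T SAMPLE1.tn.tsv -S SAMPLE1.seg -O SAMPLE1.called.seg -sample SAMPLE1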
Create PoN workflow
This workflow can take some time to run depending on how many samples are going into your PoN and the number of targets you are covering. Basic time estimates are found in the Overview of Steps.
Additional requirements
- Normal sample bam files to be used in the PoN. The index files (.bai) must be local to all of the associated bam files.
Overview of steps
- Step 1. Collect proportional coverage. (~20 minutes for mean 150x coverage and 150k targets, per sample)
- Step 2. Combine proportional coverage files (< 5 minutes for 150k targets and 300 samples)
- Step 3. Create the PoN file (~1.75 hours for 150k targets and 300 samples)
All time estimates are using the internal Broad infrastructure.
Step 1. Collect proportional coverage on each bam file
This is exactly the same as the case sample workflow, except that this needs to be run once for each input bam file, each with a different output file name. Otherwise, the inputs should be the same for each bam file.
Please see documentation above.
IMPORTANT NOTE: You must create a list of the proportional coverage files (i.e. output files) that you create in this step. One output file per line in a text file (see step 2)
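For instance, if all the per-sample outputs were written to one directory with a common suffix (path and suffix hypothetical), the list could be built with a simple shell command:
ls /path/to/pcov/*.pcov.tsv > pcov_files.txt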
Step 2. Merge proportional coverage files
This step merges the proportional coverage files into one large file with a separate column for each sample.
Inputs
- list of proportional coverage files generated (possibly manually) in step 1. This is a text file.
/path/to/pcov_file1.txt
/path/to/pcov_file2.txt
/path/to/pcov_file3.txt
....snip....
Outputs
- merged tsv of proportional coverage
CONTIG START END NAME SAMPLE1 SAMPLE2 SAMPLE3 ....snip....
1 12191 12227 target1 8.835E-6 1.451E-5 1.221E-5 ....snip....
1 12596 12721 target2 1.602E-5 1.534E-5 1.318E-5 ....snip....
....snip....
Invocation
java -Xmx8g -jar <path_to_hellbender_protected_jar> CombineReadCounts --inputList <text_file_list_of_proportional_coverage_files> \
  -O <output_merged_file> -MOF 200
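For example, using the hypothetical list file built in Step 1:
java -Xmx8g -jar gatk-protected.jar CombineReadCounts --inputList pcov_files.txt \
  -O merged_pcov.tsv -MOF 200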
Step 3. Create the PoN file
Inputs
- merged tsv of proportional coverage -- generated in step 2.
Outputs
- PoN file -- HDF5 format. This file can be used for running case samples sequenced with the same process.
Invocation
java -Xmx16g -Djava.library.path=<hdf_jni_native_dir> -jar <path_to_hellbender_protected_jar> CreatePanelOfNormals -I <merged_pcov_file> \
  -O <output_pon_file_full_path>
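For example, with the hypothetical merged file from Step 2 and the typical Ubuntu JNI location listed in the requirements:
java -Xmx16g -Djava.library.path=/usr/lib/jni/ -jar gatk-protected.jar CreatePanelOfNormals -I merged_pcov.tsv \
  -O normals.pon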
Comments
Will you support the CNV workflow while it's part of GATK4 alpha? thanks
Hi @noa, the short answer is yes. The longer answer is that right now we're still figuring out how to tackle supporting GATK4 tools in parallel to GATK3, but our support team is gearing up to start supporting the CNV tools very soon -- so don't hesitate to get started with the tools, and let us know if you run into any problems. We will do our best to help you apply the CNV tools to your work.
Can the CNV workflow be used for germline WES data?
@sdubayan Not this workflow. We will be releasing a GATK4 Germline CNV Capture (WES) workflow soon. We are aiming for the end of July or end of August 2016, though this is still tentative.
@Haiying7 I added the typical locations. However, these may not apply to your environment.
@Haiying7 I edited the post for #4
You can try
locate libjhdf5.so
Using your target file with the tool PadTargets I get an error message: what is the missing header for a BED file?
[[email protected] targets]$ java -jar ~/bin/hellbender-protected.jar PadTargets --targets test.bed --output targets/test_250padded.bed --padding 250
[07. Juli 2016 09:39:34 MESZ] org.broadinstitute.hellbender.tools.exome.PadTargets --targets test.bed --output targets/test_250padded.bed --padding 250 --help false --version false --verbosity INFO --QUIET false
[07. Juli 2016 09:39:34 MESZ] Executing as [email protected] at on Linux 4.4.9-300.fc23.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_92-b14; Version: Version:351addc-SNAPSHOT
09:39:34.292 INFO PadTargets - Defaults.BUFFER_SIZE : 131072
09:39:34.292 INFO PadTargets - Defaults.COMPRESSION_LEVEL : 5
09:39:34.292 INFO PadTargets - Defaults.CREATE_INDEX : false
09:39:34.292 INFO PadTargets - Defaults.CREATE_MD5 : false
09:39:34.292 INFO PadTargets - Defaults.CUSTOM_READER_FACTORY :
09:39:34.292 INFO PadTargets - Defaults.EBI_REFERENCE_SEVICE_URL_MASK : http://www.ebi.ac.uk/ena/cram/md5/%s
09:39:34.293 INFO PadTargets - Defaults.INTEL_DEFLATER_SHARED_LIBRARY_PATH : null
09:39:34.293 INFO PadTargets - Defaults.NON_ZERO_BUFFER_SIZE : 131072
09:39:34.293 INFO PadTargets - Defaults.REFERENCE_FASTA : null
09:39:34.293 INFO PadTargets - Defaults.TRY_USE_INTEL_DEFLATER : true
09:39:34.293 INFO PadTargets - Defaults.USE_ASYNC_IO : false
09:39:34.293 INFO PadTargets - Defaults.USE_ASYNC_IO_FOR_SAMTOOLS : false
09:39:34.293 INFO PadTargets - Defaults.USE_ASYNC_IO_FOR_TRIBBLE : false
09:39:34.293 INFO PadTargets - Defaults.USE_CRAM_REF_DOWNLOAD : false
09:39:34.294 INFO PadTargets - Deflater JdkDeflater
09:39:34.294 INFO PadTargets - Initializing engine
09:39:34.294 INFO PadTargets - Done initializing engine
09:39:34.307 INFO TargetTableReader - Reading targets from file '/home/klaus/CNV/targets/test.bed' ...
09:39:34.325 INFO PadTargets - Shutting down engine
[07. Juli 2016 09:39:34 MESZ] org.broadinstitute.hellbender.tools.exome.PadTargets done. Elapsed time: 0,00 minutes.
Runtime.totalMemory()=246415360
A USER ERROR has occurred: Bad input: format error in 'test.bed' at line 1: Bad header in file. Not all mandatory columns are present. Missing: 1 12200 12275 target1
@Haiying7 Has anyone installed hdfview?
@wagnerk I am assuming you are using the codebase in master, not the latest release (1.0.0.0-alpha1.2.1). If so, just run the tool ConvertBedToTargetFile and use its output as the input to PadTargets.
ConvertBedToTargetFile worked and the short test file was padded successfully with 250 bp.
When I tried my bed file I got an error:
12:31:29.271 INFO PadTargets - Done initializing engine
12:31:29.283 INFO TargetTableReader - Reading targets from file '/home/user/CNV/targets/hg38_TruSight_One_v1.1.bed.target' ...
12:31:29.692 INFO PadTargets - Shutting down engine
[08. Juli 2016 12:31:29 MESZ] org.broadinstitute.hellbender.tools.exome.PadTargets done. Elapsed time: 0,01 minutes.
Runtime.totalMemory()=253231104
java.lang.IllegalArgumentException: Invalid interval. Contig:chr1 start:146019875 end:146019703
at org.broadinstitute.hellbender.utils.SimpleInterval.validatePositions(SimpleInterval.java:61)
at org.broadinstitute.hellbender.utils.SimpleInterval.<init>(SimpleInterval.java:36)
at org.broadinstitute.hellbender.tools.exome.TargetPadder.padTargets(TargetPadder.java:44)
at org.broadinstitute.hellbender.tools.exome.PadTargets.doWork(PadTargets.java:60)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:102)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:155)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:174)
at org.broadinstitute.hellbender.Main.instanceMain(Main.java:70)
at org.broadinstitute.hellbender.Main.main(Main.java:85)
When I look into my file at the position flagged by the IllegalArgumentException (Invalid interval), the start position is calculated correctly but the stop position is wrong (first line):
chr1 146020125 146020242 HFE2.chr1.145414781.145414878
chr1 146019165 146019745 HFE2.chr1.145415278.145415838
chr1 146018067 146018711 HFE2.chr1.145416312.145416936
@wagnerk
Hi,
Can you please submit some test files so we can debug locally? Instructions are here.
Thanks,
Sheila
Hi Sheila,
I have uploaded the file hg38_TruSight_One_v1.1.bed.target to your server.
Klaus
If I have paired tumor samples (normal and tumor), would I need to create a PoN file? And how do I handle this?
I wonder:
1. Did you miss the "/" at the end of "-Djava.library.path=/home/kong/Haiying/lib/hdf-java-2.9/hdfview/HDFView/lib"?
2. Must the file merged.txt contain more than 2 samples?
Hi @LeeTL1220
When I ran Step 3 (Segment coverage profile), I always got the error below. Do you know what happened?
Command Line: Rscript -e tempLibDir = '/tmp/fanghu/Rlib.8000983193739407403';source('/tmp/fanghu/CBS.5651025200789749973.R'); --args --sample_name=J1-A --targets_file=/Step2/normalized_coverage.tsv --output_file=/Step3/segment.tsv --log2_input=TRUE --min_width=2 --alpha=0.01 --nperm=10000 --pmethod=hybrid --kmax=25 --nmin=200 --eta=0.05 --trim=0.025 --undosplits=none --undoprune=0.05 --undoSD=3
Stdout:
Stderr: Error in getopt(spec = spec, opt = args) : long flag "args" is invalid
Calls: source ... withVisible -> eval -> eval -> parse_args -> getopt
Execution halted
I think this may be helpful:
https://www.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8.17/bin/linux-centos7-x86_64-gcc485/
hdf5-1.8.17-linux-centos7-x86_64-gcc485-shared.tar.gz
Sorry, I sent the wrong one.
The one below would be OK:
https://www.hdfgroup.org/ftp/HDF5/hdf-java/current/bin/
HDFView-2.11-centos6-x64.tar
@fanghu0104 I believe your error is due to an incorrect version of the optparse and/or getopt packages. Please use the "install_R_packages.R" Rscript to install these. The quoted code below shows the portion relevant to your error:
Thank you, I changed the R version, R-3.1.3 or higher is OK.
@Haiying7 You can try running with
To see if that fixes it.
@Haiying7 The -TN parameter is simply the path to the file you would like to be created by this step of the workflow.
Step 1 for both of these workflows is the same.
I believe step 2 of the Create PoN workflow is where you specify the pairs.
@Haiying7 Apologies. You do not do anything with pairs for the Create PoN workflow. This is on normals only.
I think this workflow is not for paired samples. The PoN (panel of normals) is created from all the normal samples at once.
Hi @Haiying7,
The CreatePanelOfNormals tool performs a simple quality-control check of the input normal samples and removes those that fail the check (specifically, those that appear to contain large, arm-level events). By default, only those samples that pass the check are included in the final panel of normals that is output.
If you'd like to turn off this quality-control check and include these samples in the panel, you can use the -noQC option. However, this may affect the quality of the tangent normalization of case samples downstream.
@Haiying7
If you are potentially interested in differentiating somatic from germline CNVs then you can run the (PoN) excluded normal sample as a case sample and compare to the matched tumor sample. Keep in mind you should not run any normal samples against a PoN that includes those same normal samples.
@wagnerk
Hi Klaus,
Sorry for the delay. I am not able to reproduce your issue. Can you please send me your original file that you ran ConvertBedToTargetFile on? I am also confused why you are running PadTargets on your file if you already ran ConvertBedToTargetFile and got a padded file?
Thanks,
Sheila
@Haiying7 Normals run against a PoN that includes those same samples will have meaningless results. You are correct in your description:
When I ran Step 0, I always got the error below. Do you know what happened? Thank you!
$java -Xmx8G -jar /home/lizhenzhong/software/gatk4-latest/gatk-protected.jar PadTargets --targets /data/database/hg38_anno/hg38.refGene.merg.sort.filter.bed --output hg38.refGene.merg.sort.filter.padded.bed --padding 250
[August 17, 2016 11:42:19 AM CST] org.broadinstitute.hellbender.tools.exome.PadTargets --targets /data/database/hg38_anno/hg38.refGene.merg.sort.filter.bed --output hg38.refGene.merg.sort.filter.padded.bed --padding 250 --help false --version false --verbosity INFO --QUIET false
[August 17, 2016 11:42:19 AM CST] Executing as [email protected] on Linux 2.6.32-358.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_65-b17; Version: Version:version-unknown-SNAPSHOT
11:42:19.604 INFO PadTargets - Defaults.BUFFER_SIZE : 131072
11:42:19.606 INFO PadTargets - Defaults.COMPRESSION_LEVEL : 5
11:42:19.606 INFO PadTargets - Defaults.CREATE_INDEX : false
11:42:19.606 INFO PadTargets - Defaults.CREATE_MD5 : false
11:42:19.606 INFO PadTargets - Defaults.CUSTOM_READER_FACTORY :
11:42:19.606 INFO PadTargets - Defaults.EBI_REFERENCE_SEVICE_URL_MASK : http://www.ebi.ac.uk/ena/cram/md5/%s
11:42:19.606 INFO PadTargets - Defaults.INTEL_DEFLATER_SHARED_LIBRARY_PATH : null
11:42:19.606 INFO PadTargets - Defaults.NON_ZERO_BUFFER_SIZE : 131072
11:42:19.606 INFO PadTargets - Defaults.REFERENCE_FASTA : null
11:42:19.606 INFO PadTargets - Defaults.TRY_USE_INTEL_DEFLATER : true
11:42:19.606 INFO PadTargets - Defaults.USE_ASYNC_IO : false
11:42:19.606 INFO PadTargets - Defaults.USE_ASYNC_IO_FOR_SAMTOOLS : false
11:42:19.606 INFO PadTargets - Defaults.USE_ASYNC_IO_FOR_TRIBBLE : false
11:42:19.607 INFO PadTargets - Defaults.USE_CRAM_REF_DOWNLOAD : false
11:42:19.615 INFO PadTargets - Deflater JdkDeflater
11:42:19.615 INFO PadTargets - Initializing engine
11:42:19.615 INFO PadTargets - Done initializing engine
11:42:20.086 INFO FeatureManager - Using codec BEDCodec to read file /data/database/hg38_anno/hg38.refGene.merg.sort.filter.bed
11:42:20.086 INFO TargetUtils - Reading target intervals from exome file '/data/database/hg38_anno/hg38.refGene.merg.sort.filter.bed' ...
11:42:20.884 INFO PadTargets - Shutting down engine
[August 17, 2016 11:42:20 AM CST] org.broadinstitute.hellbender.tools.exome.PadTargets done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=444071936
java.lang.IllegalArgumentException: input intervals contain at least two overlapping intervals: [email protected] and [email protected]
at org.broadinstitute.hellbender.tools.exome.HashedListTargetCollection.checkForOverlaps(HashedListTargetCollection.java:79)
at org.broadinstitute.hellbender.tools.exome.HashedListTargetCollection.<init>(HashedListTargetCollection.java:63)
at org.broadinstitute.hellbender.tools.exome.TargetCollectionUtils$2.<init>(TargetCollectionUtils.java:66)
at org.broadinstitute.hellbender.tools.exome.TargetCollectionUtils.fromBEDFeatureList(TargetCollectionUtils.java:66)
at org.broadinstitute.hellbender.tools.exome.TargetCollectionUtils.fromBEDFeatureFile(TargetCollectionUtils.java:95)
at org.broadinstitute.hellbender.tools.exome.TargetUtils.readTargetFile(TargetUtils.java:42)
at org.broadinstitute.hellbender.tools.exome.PadTargets.doWork(PadTargets.java:54)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:102)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:155)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:174)
at org.broadinstitute.hellbender.Main.instanceMain(Main.java:69)
at org.broadinstitute.hellbender.Main.main(Main.java:84)
@Lizz It looks like your input bed file has overlapping intervals. This is not allowed as it will double-count reads. Please inspect your bed file and either remove overlapping targets or fix the start/end positions so that they do not overlap.
@aaronc, thank you! I got the bed file from Gencode v24 (exon regions) for my WXS data and filtered out the overlaps, but it still errors!
1. The error message looks like:
[August 22, 2016 11:00:07 AM CST] org.broadinstitute.hellbender.tools.exome.PadTargets done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=660078592
java.lang.IllegalStateException: more than one interval in the input list results in the same name (SAMD11); perhaps repeated: '[email protected]' and '[email protected]'.
at org.broadinstitute.hellbender.tools.exome.HashedListTargetCollection.composeIntervalsByName(HashedListTargetCollection.java:104)
at org.broadinstitute.hellbender.tools.exome.HashedListTargetCollection.<init>(HashedListTargetCollection.java:64)
chr1 924879 924879 SAMD11 . +
chr1 925149 925149 SAMD11 . +
chr1 925737 925737 SAMD11 . +
chr1 925921 925921 SAMD11 . +
This error seems to say that one gene name can map to only one region, so how should I prepare my exon bed file? Thank you!
Sorry about the wrongly formatted bed file; it actually looks like this (still an error):
chr1 924879 924899 SAMD11 . +
chr1 925129 925149 SAMD11 . +
@Lizz Hi Lizz! To fix your bed file, simply collapse any repeated regions, taking the first start position and the last end position:
becomes
Also you can drop the last column, so the bed file becomes
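One rough way to script this kind of collapsing (a sketch only; the file names are hypothetical, and it assumes a tab-delimited file whose first four columns are contig, start, end and name, merging rows that share the same contig and name):
awk 'BEGIN{FS=OFS="\t"}
     {
       # key each row by contig + target name
       k = $1 SUBSEP $4
       if (!(k in start)) { order[++n]=k; contig[k]=$1; name[k]=$4; start[k]=$2; end[k]=$3 }
       # keep the first (smallest) start and the last (largest) end seen for this key
       if ($2 < start[k]) start[k] = $2
       if ($3 > end[k])   end[k]   = $3
     }
     END {
       # print one collapsed row per contig/name pair, dropping any extra columns
       for (i = 1; i <= n; i++) { k = order[i]; print contig[k], start[k], end[k], name[k] }
     }' targets.bed > targets.collapsed.bed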
Hi,
Would you please help with a problem in making the PoN? I have made the "merged tsv of proportional coverage" successfully for 10 normal samples. It looks good and there are no outlier values (sample values sum to 1, min value 0, Q1 5.6e-05, Q3 1.02e-04, max value 2.60e-03). Then I am going to make the PoN by:
java -Xmx32g -Djava.library.path=/HDFView/lib/linux/ -jar gatk-protected.jar CreatePanelOfNormals -I Normal.bed.All.coverage.tsv -O Normal.bed.All.coverage.tsv.PON
Part of the message:
16/09/26 16:09:56 INFO Executor: Running task 2.0 in stage 18.0 (TID 588)
16:09:56.300 INFO CoveragePoNQCUtils - Suspicious contig: Normal1 chr1 (1278.033311668069 -- 0)
16:09:56.300 INFO CoveragePoNQCUtils - Suspicious contig: Normal7 chr1 (-55.040334768011704 -- 0)
16:09:56.300 INFO CoveragePoNQCUtils - Suspicious contig: Normal5 chr1 (-34.0587811785343 -- 0)
16:09:56.300 INFO CoveragePoNQCUtils - Suspicious contig: Normal2 chr1 (-57.683738382363025 -- 0)
16:09:56.303 INFO CoveragePoNQCUtils - Suspicious contig: Normal8 chr1 (-54.96817475110036 -- 0)
16:09:56.303 INFO CoveragePoNQCUtils - Suspicious contig: Normal10 chr1 (1844.685903112766 -- 0)
16:09:56.304 INFO CoveragePoNQCUtils - Suspicious contig: Normal6 chr1 (-59.99297062144822 -- 0)
16:09:56.304 INFO CoveragePoNQCUtils - Suspicious contig: Normal3 chr1 (239.89156882761088 -- 0)
16/09/26 16:09:56 INFO Executor: Finished task 0.0 in stage 18.0 (TID 586). 913 bytes result sent to driver
16/09/26 16:09:56 INFO Executor: Finished task 2.0 in stage 18.0 (TID 588). 912 bytes result sent to driver
16/09/26 16:09:56 INFO TaskSetManager: Finished task 0.0 in stage 18.0 (TID 586) in 30 ms on localhost (1/4)
16:09:56.306 INFO CoveragePoNQCUtils - Suspicious contig: Normal9 chr1 (-32.831939726923004 -- 0)
16/09/26 16:09:56 INFO Executor: Finished task 3.0 in stage 18.0 (TID 589). 921 bytes result sent to driver
16/09/26 16:09:56 INFO TaskSetManager: Finished task 2.0 in stage 18.0 (TID 588) in 29 ms on localhost (2/4)
16:09:56.307 INFO CoveragePoNQCUtils - Suspicious contig: Normal4 chr1 (1520.5663308083222 -- 0)
16/09/26 16:09:56 INFO Executor: Finished task 1.0 in stage 18.0 (TID 587). 921 bytes result sent to driver
16/09/26 16:09:56 INFO TaskSetManager: Finished task 3.0 in stage 18.0 (TID 589) in 30 ms on localhost (3/4)
16/09/26 16:09:56 INFO TaskSetManager: Finished task 1.0 in stage 18.0 (TID 587) in 32 ms on localhost (4/4)
16/09/26 16:09:56 INFO DAGScheduler: ResultStage 18 (collect at CoveragePoNQCUtils.java:111) finished in 0.033 s
16/09/26 16:09:56 INFO TaskSchedulerImpl: Removed TaskSet 18.0, whose tasks have all completed, from pool
16/09/26 16:09:56 INFO DAGScheduler: Job 16 finished: collect at CoveragePoNQCUtils.java:111, took 0.036466 s
16:09:56.311 INFO CreatePanelOfNormals - QC: Suspicious sample list created...
16:09:56.311 INFO CreatePanelOfNormals - Creating final PoN with 10 suspicious samples removed...
It seems that it removes all 10 of the normal samples as having a "Suspicious contig". I can only add --noQC to make it run. Does anyone know what's wrong?
Thanks!
Another question: in Collect proportional coverage, I saw the parameter "-keepdups". Does it mean we should keep duplicate reads in the bam? In the Best Practices, it seems we need to remove duplicates (https://software.broadinstitute.org/gatk/best-practices/CNV.php). Thanks!
@shilin I would run with --noQC and -keepdups. I believe the PoN QC is an issue with creating a PoN of this size. By default -keepdups is enabled because it improves results.
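For instance (file names hypothetical), the PoN step with the QC check disabled would look like:
java -Xmx16g -Djava.library.path=<hdf_jni_native_dir> -jar gatk-protected.jar CreatePanelOfNormals -I merged_pcov.tsv -O normals.pon -noQC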
Hi,
I am trying to generate the PoN and getting an error similar to those reported above, related to hdf5.
I saw some fixes in the repository made in the last few weeks, so I pulled the most recent version of gatk-protected and built the jar, but I still get the same error.
I'd appreciate any help, thanks.
This is the command line:
java -Xmx16g -Djava.library.path=/home/HDFView-2.13.0-Linux/HDFView/2.13.0/lib -jar /home/gatk4/gatk-protected.jar CreatePanelOfNormals -I /home/Noa/gatk4/mergedPcovFiles.output -O /home/Noa/gatk4/PoN.output
And this is the output with the error:
[September 29, 2016 9:31:59 AM EDT] org.broadinstitute.hellbender.tools.exome.CreatePanelOfNormals --input /home/Noa/gatk4/mergedPcovFiles.output --output /home/Noa/gatk4/PoN.output --minimumTargetFactorPercentileThreshold 25.0 --maximumColumnZerosPercentage 2.0 --maximumTargetZerosPercentage 5.0 --extremeColumnMedianCountPercentileThreshold 2.5 --truncatePercentileThreshold 0.1 --numberOfEigenSamples auto --noQC false --dryRun false --disableSpark false --sparkMaster local[*] --help false --version false --verbosity INFO --QUIET false
[September 29, 2016 9:31:59 AM EDT] Executing as [email protected] on Linux 2.6.32-279.14.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_51-b16; Version: Version:version-unknown-SNAPSHOT
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/09/29 09:32:00 INFO SparkContext: Running Spark version 1.5.0
16/09/29 09:32:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/09/29 09:32:02 INFO SecurityManager: Changing view acls to: henig
16/09/29 09:32:02 INFO SecurityManager: Changing modify acls to: henig
16/09/29 09:32:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(henig); users with modify permissions: Set(henig)
16/09/29 09:32:08 INFO Slf4jLogger: Slf4jLogger started
16/09/29 09:32:08 INFO Remoting: Starting remoting
16/09/29 09:32:08 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:47226]
16/09/29 09:32:08 INFO Utils: Successfully started service 'sparkDriver' on port 47226.
16/09/29 09:32:08 INFO SparkEnv: Registering MapOutputTracker
16/09/29 09:32:08 INFO SparkEnv: Registering BlockManagerMaster
16/09/29 09:32:08 INFO DiskBlockManager: Created local directory at /home/scratch/henig/blockmgr-6d77c841-821f-4198-9162-43b8a67a726d
16/09/29 09:32:08 INFO MemoryStore: MemoryStore started with capacity 7.7 GB
16/09/29 09:32:09 INFO HttpFileServer: HTTP File server directory is /home/scratch/henig/spark-518eec1a-b7f3-43ab-bee4-175f55fa02a9/httpd-e8228d6f-c0e8-4b40-a2a7-cac65e3f131d
16/09/29 09:32:09 INFO HttpServer: Starting HTTP Server
16/09/29 09:32:09 INFO Utils: Successfully started service 'HTTP file server' on port 46331.
16/09/29 09:32:09 INFO SparkEnv: Registering OutputCommitCoordinator
16/09/29 09:32:09 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/09/29 09:32:09 INFO SparkUI: Started SparkUI at http://10.1.255.226:4040
16/09/29 09:32:09 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/09/29 09:32:09 INFO Executor: Starting executor ID driver on host localhost
16/09/29 09:32:09 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56081.
16/09/29 09:32:09 INFO NettyBlockTransferService: Server created on 56081
16/09/29 09:32:09 INFO BlockManagerMaster: Trying to register BlockManager
16/09/29 09:32:09 INFO BlockManagerMasterEndpoint: Registering block manager localhost:56081 with 7.7 GB RAM, BlockManagerId(driver, localhost, 56081)
16/09/29 09:32:09 INFO BlockManagerMaster: Registered BlockManager
16/09/29 09:32:13 INFO SparkUI: Stopped Spark web UI at http://10.1.255.226:4040
16/09/29 09:32:13 INFO DAGScheduler: Stopping DAGScheduler
16/09/29 09:32:14 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/09/29 09:32:14 INFO MemoryStore: MemoryStore cleared
16/09/29 09:32:14 INFO BlockManager: BlockManager stopped
16/09/29 09:32:21 INFO BlockManagerMaster: BlockManagerMaster stopped
16/09/29 09:32:21 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/09/29 09:32:21 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/09/29 09:32:21 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/09/29 09:32:21 INFO SparkContext: Successfully stopped SparkContext
[September 29, 2016 9:32:21 AM EDT] org.broadinstitute.hellbender.tools.exome.CreatePanelOfNormals done. Elapsed time: 0.37 minutes.
Runtime.totalMemory()=1192755200
Exception in thread "main" java.lang.UnsatisfiedLinkError: ncsa.hdf.hdf5lib.H5.H5dont_atexit()I
at ncsa.hdf.hdf5lib.H5.H5dont_atexit(Native Method)
at ncsa.hdf.hdf5lib.H5.loadH5Lib(H5.java:365)
at ncsa.hdf.hdf5lib.H5.<clinit>(H5.java:274)
at ncsa.hdf.hdf5lib.HDF5Constants.<clinit>(HDF5Constants.java:28)
at org.broadinstitute.hellbender.utils.hdf5.HDF5File$OpenMode.<clinit>(HDF5File.java:505)
at org.broadinstitute.hellbender.utils.hdf5.HDF5PoNCreator.writeTargetFactorNormalizeReadCountsAndTargetFactors(HDF5PoNCreator.java:185)
at org.broadinstitute.hellbender.utils.hdf5.HDF5PoNCreator.createPoNGivenReadCountCollection(HDF5PoNCreator.java:118)
at org.broadinstitute.hellbender.utils.hdf5.HDF5PoNCreator.createPoN(HDF5PoNCreator.java:88)
at org.broadinstitute.hellbender.tools.exome.CreatePanelOfNormals.runPipeline(CreatePanelOfNormals.java:244)
at org.broadinstitute.hellbender.utils.SparkToggleCommandLineProgram.doWork(SparkToggleCommandLineProgram.java:39)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:102)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:155)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:174)
at org.broadinstitute.hellbender.Main.instanceMain(Main.java:69)
at org.broadinstitute.hellbender.Main.main(Main.java:84)
16/09/29 09:32:22 INFO ShutdownHookManager: Shutdown hook called
16/09/29 09:32:22 INFO ShutdownHookManager: Deleting directory /home/scratch/henig/spark-518eec1a-b7f3-43ab-bee4-175f55fa02a9
16/09/29 09:32:22 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
Thanks for your reply! Another question: I have successfully generated the results and there are some called segments. But in step 4, the background (X and Y axes, titles, background color, dashed lines for each chromosome, ...) was generated successfully, yet there were no dots/lines for CNVs in the figure. Is it a bug in the program?
Thanks!
Update for my previous message:
The java.lang.UnsatisfiedLinkError: ncsa.hdf.hdf5lib.H5.H5dont_atexit is solved but I have another exception coming up now:
These are the last lines out of 3000, the same command line as above:
java -Xmx16g -Djava.library.path=/home/HDFView-2.13.0-Linux/HDFView/2.13.0/lib -jar /home/gatk4/gatk-protected.jar CreatePanelOfNormals -I /home/Noa/gatk4/mergedPcovFiles.output -O /home/Noa/gatk4/PoN.output
End of output:
16/09/29 13:38:43 INFO DAGScheduler: Job 18 finished: collect at CoveragePoNQCUtils.java:111, took 0.049464 s
16/09/29 13:38:43 INFO SparkUI: Stopped Spark web UI at http://10.1.255.220:4040
16/09/29 13:38:43 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/09/29 13:38:44 INFO MemoryStore: MemoryStore cleared
16/09/29 13:38:44 INFO BlockManager: BlockManager stopped
16/09/29 13:38:44 INFO BlockManagerMaster: BlockManagerMaster stopped
16/09/29 13:38:44 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/09/29 13:38:44 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/09/29 13:38:44 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/09/29 13:38:44 INFO SparkContext: Successfully stopped SparkContext
[September 29, 2016 1:38:44 PM EDT] org.broadinstitute.hellbender.tools.exome.CreatePanelOfNormals done. Elapsed time: 0.25 minutes.
Runtime.totalMemory()=2780823552
java.lang.IllegalArgumentException: the number of columns to keep must be greater than 0
at org.broadinstitute.hellbender.tools.exome.ReadCountCollection.subsetColumns(ReadCountCollection.java:266)
at org.broadinstitute.hellbender.tools.pon.coverage.pca.HDF5PCACoveragePoNCreationUtils.create(HDF5PCACoveragePoNCreationUtils.java:91)
at org.broadinstitute.hellbender.tools.exome.CreatePanelOfNormals.runPipeline(CreatePanelOfNormals.java:264)
at org.broadinstitute.hellbender.utils.SparkToggleCommandLineProgram.doWork(SparkToggleCommandLineProgram.java:39)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:108)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:166)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:185)
at org.broadinstitute.hellbender.Main.instanceMain(Main.java:76)
at org.broadinstitute.hellbender.Main.main(Main.java:92)
16/09/29 13:38:44 INFO ShutdownHookManager: Shutdown hook called
16/09/29 13:38:44 INFO ShutdownHookManager: Deleting directory /home/scratch/henig/spark-998ca17d-cb96-4fe2-aa82-cab1ad0b666f
16/09/29 13:38:44 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
@aaronc or @Sheila I would be happy if you could help, thanks.
Noa
@shilin Can you confirm which version/release you are running? The latest release (alpha1.2.3) may resolve the plotting issue you are seeing. Also, note that the plotting only supports hg19.
@noa How many samples are in your PoN? In the error output, do you see any warnings about samples being dropped (see the post by @shilin)? It's possible that creating a PoN with a smallish number of samples (~tens) can give poor results for the quality-control checking routine, causing all samples to be dropped. Can you try running with -noQC?
Also note that with the latest release (alpha1.2.3), including "-Djava.library.path=/home/HDFView-2.13.0-Linux/HDFView/2.13.0/lib" should not be necessary. Also note that an improved method for QC checking will be released soon---the current method is relatively naive and tries to perform a heuristic check for large, arm-level events.
Hi, can I cite the CNV caller from GATK4 in a publication? If not, what do you recommend for calling somatic CNVs? Is XHMM fine?
Many thanks, Rahel
Other than this discussion post, there is also a technical whitepaper at https://github.com/broadinstitute/gatk-protected/blob/master/docs/CNVs/CNV-methods.pdf. However, note that only some sections are relevant for GATK CNV and that this document will continue to be updated. So if you'd like to cite it in some regard, you may want to link to the specific GitHub commit for the release you are using (e.g., https://github.com/broadinstitute/gatk-protected/blob/1.0.0.0-alpha1.2.3/docs/CNVs/CNV-methods.pdf would be the appropriate link if you were using alpha1.2.3).
There will be publications forthcoming for both CNV and ACNV, but these will not be ready for a few months and will be based on significantly updated versions of the tools.
PlotSegmentedCopyRatio plots have only the axes and chromosomes without points, any idea why?
@amjad,
You'll have to give us more information for us to begin answering your question.
@shlee It is the same problem seen by @shilin and answered by @slee
I tried the latest version (alpha1.2.3) without success and my reference genome is hg19.
Hi @amjad,
Thanks for the clarification. If this version of the tool supports only one reference genome, then that reference is GRCh37 and not hg19. One major difference is that GRCh37 contigs do NOT start with chr, whereas hg19 contigs do. The original documentation by LeeTL1220 shows example data from GRCh37. Some folks call this reference hg37 and others use it interchangeably with hg19, given it is based on the same assembly. Only plotting is constrained in this way, and all of your other results should be fine. I believe a newer version of the plotting tool will accommodate any reference. In the meantime, I see two quick workarounds. One is to remove the chr prefixes from the data you are trying to plot. If you do this, then be sure the data is sorted by the contig order that GRCh37 would be sorted by and only represents contigs present in GRCh37; that is, you'll have to remove data for extraneous contigs. The second is to visualize your proportional coverage by other means, e.g. R (or RStudio) or IGV. If you try IGV, then I believe you will have to center your normal coverage to either 0 or 1, whichever the CNV data is not centered upon. BTW, IGV's visualization is by heatmap coloring.
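For illustration only (file names hypothetical; this assumes the contig sits in the second tab-delimited column of the .tn.tsv, .ptn.tsv and .seg files, as in the examples earlier in this article), the chr prefixes could be stripped with something like:
awk 'BEGIN{FS=OFS="\t"} {sub(/^chr/, "", $2); print}' SAMPLE1.tn.tsv > SAMPLE1.b37.tn.tsv
awk 'BEGIN{FS=OFS="\t"} {sub(/^chr/, "", $2); print}' SAMPLE1.seg > SAMPLE1.b37.seg
Rows for contigs not present in GRCh37 would still need to be removed, and the remaining rows re-sorted into GRCh37 contig order, as noted above.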
Thank you @shlee. It worked after removing the chr characters and sorting the chromosomes
Hi @LeeTL1220! I need your help. While running CreatePanelOfNormals, some errors occurred. Here is the command I ran: 'java -Xmx16g -Djava.library.path=/Workspace/Software/HDFView/HDFView-2.13.0-Linux/HDFView/2.13.0/lib/ -jar /Workspace/Software/gatk4-latest/gatk-protected.jar CreatePanelOfNormals -I merge.txt -O PoN.tsv'. And the log file is as follows:
[January 9, 2017 6:11:58 PM CST] org.broadinstitute.hellbender.tools.exome.CreatePanelOfNormals --input merge.txt --output PoN.tsv --minimumTargetFactorPercentileThreshold 25.0 --maximumColumnZerosPercentage 2.0 --maximumTargetZerosPercentage 5.0 --extremeColumnMedianCountPercentileThreshold 2.5 --truncatePercentileThreshold 0.1 --numberOfEigenSamples auto --noQC false --dryRun false --disableSpark false --sparkMaster local[*] --help false --version false --verbosity INFO --QUIET false
[January 9, 2017 6:11:58 PM CST] Executing as [email protected] on Linux 3.10.0-514.2.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_111-b15; Version: Version:version-unknown-SNAPSHOT
18:11:58.749 INFO CreatePanelOfNormals - Defaults.BUFFER_SIZE : 131072
18:11:58.750 INFO CreatePanelOfNormals - Defaults.COMPRESSION_LEVEL : 5
18:11:58.750 INFO CreatePanelOfNormals - Defaults.CREATE_INDEX : false
18:11:58.750 INFO CreatePanelOfNormals - Defaults.CREATE_MD5 : false
18:11:58.750 INFO CreatePanelOfNormals - Defaults.CUSTOM_READER_FACTORY :
18:11:58.750 INFO CreatePanelOfNormals - Defaults.EBI_REFERENCE_SEVICE_URL_MASK : http://www.ebi.ac.uk/ena/cram/md5/%s
18:11:58.750 INFO CreatePanelOfNormals - Defaults.INTEL_DEFLATER_SHARED_LIBRARY_PATH : null
18:11:58.750 INFO CreatePanelOfNormals - Defaults.NON_ZERO_BUFFER_SIZE : 131072
18:11:58.751 INFO CreatePanelOfNormals - Defaults.REFERENCE_FASTA : null
18:11:58.751 INFO CreatePanelOfNormals - Defaults.TRY_USE_INTEL_DEFLATER : true
18:11:58.751 INFO CreatePanelOfNormals - Defaults.USE_ASYNC_IO : false
18:11:58.751 INFO CreatePanelOfNormals - Defaults.USE_ASYNC_IO_FOR_SAMTOOLS : false
18:11:58.751 INFO CreatePanelOfNormals - Defaults.USE_ASYNC_IO_FOR_TRIBBLE : false
18:11:58.751 INFO CreatePanelOfNormals - Defaults.USE_CRAM_REF_DOWNLOAD : false
18:11:58.752 INFO CreatePanelOfNormals - Deflater JdkDeflater
18:11:58.752 INFO CreatePanelOfNormals - Initializing engine
18:11:58.753 INFO CreatePanelOfNormals - Done initializing engine
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/01/09 18:11:59 INFO SparkContext: Running Spark version 1.5.0
17/01/09 18:11:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/01/09 18:11:59 INFO SecurityManager: Changing view acls to: yangjiatao
17/01/09 18:11:59 INFO SecurityManager: Changing modify acls to: yangjiatao
17/01/09 18:11:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yangjiatao); users with modify permissions: Set(yangjiatao)
17/01/09 18:11:59 INFO Slf4jLogger: Slf4jLogger started
17/01/09 18:11:59 INFO Remoting: Starting remoting
17/01/09 18:11:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:46229]
17/01/09 18:11:59 INFO Utils: Successfully started service 'sparkDriver' on port 46229.
17/01/09 18:11:59 INFO SparkEnv: Registering MapOutputTracker
17/01/09 18:11:59 INFO SparkEnv: Registering BlockManagerMaster
17/01/09 18:12:00 INFO DiskBlockManager: Created local directory at /tmp/yangjiatao/blockmgr-60a33e19-35f0-4969-8ba4-eb844485d298
17/01/09 18:12:00 INFO MemoryStore: MemoryStore started with capacity 7.7 GB
17/01/09 18:12:00 INFO HttpFileServer: HTTP File server directory is /tmp/yangjiatao/spark-9dd82bfd-6256-43a9-a5c0-b8b831b0bc21/httpd-927e1211-e7c3-42e8-b528-786f806ed19b
17/01/09 18:12:00 INFO HttpServer: Starting HTTP Server
17/01/09 18:12:00 INFO Utils: Successfully started service 'HTTP file server' on port 35578.
17/01/09 18:12:00 INFO SparkEnv: Registering OutputCommitCoordinator
17/01/09 18:12:00 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/01/09 18:12:00 INFO SparkUI: Started SparkUI at http://192.168.0.131:4040
17/01/09 18:12:00 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
17/01/09 18:12:00 INFO Executor: Starting executor ID driver on host localhost
17/01/09 18:12:00 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37751.
17/01/09 18:12:00 INFO NettyBlockTransferService: Server created on 37751
17/01/09 18:12:00 INFO BlockManagerMaster: Trying to register BlockManager
17/01/09 18:12:00 INFO BlockManagerMasterEndpoint: Registering block manager localhost:37751 with 7.7 GB RAM, BlockManagerId(driver, localhost, 37751)
17/01/09 18:12:00 INFO BlockManagerMaster: Registered BlockManager
18:12:01.177 INFO CreatePanelOfNormals - QC: Beginning creation of QC PoN...
18:12:01.235 INFO HDF5PoNCreator - All 283 targets are kept
17/01/09 18:12:01 INFO SparkUI: Stopped Spark web UI at http://192.168.0.131:4040
17/01/09 18:12:01 INFO DAGScheduler: Stopping DAGScheduler
17/01/09 18:12:01 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/01/09 18:12:01 INFO MemoryStore: MemoryStore cleared
17/01/09 18:12:01 INFO BlockManager: BlockManager stopped
17/01/09 18:12:01 INFO BlockManagerMaster: BlockManagerMaster stopped
17/01/09 18:12:01 INFO SparkContext: Successfully stopped SparkContext
18:12:01.673 INFO CreatePanelOfNormals - Shutting down engine
17/01/09 18:12:01 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
[January 9, 2017 6:12:01 PM CST] org.broadinstitute.hellbender.tools.exome.CreatePanelOfNormals done. Elapsed time: 0.05 minutes.
Runtime.totalMemory()=2264399872
Exception in thread "main" java.lang.UnsatisfiedLinkError: ncsa.hdf.hdf5lib.H5.H5dont_atexit()I
at ncsa.hdf.hdf5lib.H5.H5dont_atexit(Native Method)
at ncsa.hdf.hdf5lib.H5.loadH5Lib(H5.java:365)
at ncsa.hdf.hdf5lib.H5.<clinit>(H5.java:274)
at ncsa.hdf.hdf5lib.HDF5Constants.<clinit>(HDF5Constants.java:28)
at org.broadinstitute.hellbender.utils.hdf5.HDF5File$OpenMode.<clinit>(HDF5File.java:505)
at org.broadinstitute.hellbender.utils.hdf5.HDF5PoNCreator.writeTargetFactorNormalizeReadCountsAndTargetFactors(HDF5PoNCreator.java:185)
at org.broadinstitute.hellbender.utils.hdf5.HDF5PoNCreator.createPoNGivenReadCountCollection(HDF5PoNCreator.java:118)
at org.broadinstitute.hellbender.utils.hdf5.HDF5PoNCreator.createPoN(HDF5PoNCreator.java:88)
at org.broadinstitute.hellbender.tools.exome.CreatePanelOfNormals.runPipeline(CreatePanelOfNormals.java:244)
at org.broadinstitute.hellbender.utils.SparkToggleCommandLineProgram.doWork(SparkToggleCommandLineProgram.java:39)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:102)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:155)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:174)
at org.broadinstitute.hellbender.Main.instanceMain(Main.java:69)
at org.broadinstitute.hellbender.Main.main(Main.java:84)
17/01/09 18:12:01 INFO ShutdownHookManager: Shutdown hook called
17/01/09 18:12:01 INFO ShutdownHookManager: Deleting directory /tmp/yangjiatao/spark-9dd82bfd-6256-43a9-a5c0-b8b831b0bc21
@LeeTL1220 , I would be happy if you could help, thanks.
Hi @Yujian,
Could I ask which version of gatk-protected (i.e., commit hash or release number) you are using? For jars built from either the latest commit (d9fa681) or the latest release (alpha1.2.3), you no longer have to specify "-Djava.library.path=/Workspace/Software/HDFView/HDFView-2.13.0-Linux/HDFView/2.13.0/lib/". Using these jars, I was unable to reproduce the exception you encountered when running CreatePanelOfNormals.
For a newer tutorial using GATK4's v1.0.0.0-alpha1.2.3 release (Version:0288cff-SNAPSHOT from September 2016), see this Somatic_CNV worksheet and this data bundle. If you have a question on the Somatic_CNV_handson tutorial, please post it as a new question using this form.
As of March 14, 2017, I've made the tutorial worksheet a forum article. It is Article#9143.
Hi @amjad, I just have the same issue, but I am having trouble sorting the chromosomes. Can I ask you how you did it? It would really help me a lot.
Thank you!
Hi, I was wondering if you had a workflow for germline CNV/aCNV calling on WES? Thanks so much!
Hi @llau,
Germline CNV is under active development currently. Please stay tuned.
Hi there... I have a question about the number of eigensamples. By default, a Jolliffe's factor of 0.7 is used. Can someone tell me why this is necessary? Is it purely a computational issue, or is there an underlying reason behind this reduction step? Thanks in advance.
@zhipan Hi Zhipan,
A factor of 0.7 is somewhat arbitrary, but we cut the eigensamples that contribute a low amount of variance because they are more likely to contain rare germline CNVs, which can silence real events in case samples.
@aaronc Thanks a lot.
Hi all,
please help me with this problem:
http://gatkforums.broadinstitute.org/gatk/discussion/9005/error-while-running-gatk4-cnv-workflow-something-with-the-hdf5-lib#latest
Thanks a lot!
Jia
This problem has been solved. Thanks to @EADG !
But now I have another problem while plotting the results. I can get all the output files from the computing steps, and they all look fine. However, the data points are missing in the resulting PNG files; I can only see the axes and labels.
This is the command i used:
java -jar /home/jxue/softwares/GATK4_CNV/gatk4.jar PlotSegmentedCopyRatio -TN E08055T.tn.tsv -PTN E08055T.ptn.tsv -S E08055T.seg -O . -pre E08055 -LOG -schr
Thanks a lot in advance!
Jia
Hi @xuejia,
Please see some of the previous posts in this thread. Empty plots may be produced by previous releases if you are not using hg19---is this the case for your data? The latest release (alpha1.2.4) resolves this issue by taking a sequence-dictionary (.dict) file to determine the regions to be plotted, so you may want to try using that release instead.
Hi @Haiying7 ,
How did you solve the problem with "Exception in thread "main" java.lang.UnsatisfiedLinkError: ncsa.hdf.hdf5lib.H5.H5dont_atexit()I"?
Hi! Is it possible to run the CNV workflow for germline CNV calling on whole genome data? If so, is there a minimum number of samples needed to run it?
Lots of thanks!
There will be a separate workflow/tool (to be released shortly, timescale on the order of weeks) for calling germline CNVs from both WES and WGS. If you are interested in the details, a poster that the primary developer of the tool presented at AACR can be seen at http://genomicinfo.broadinstitute.org/acton/attachment/13431/f-0186/1/-/-/-/-/AACRPoster_MB.pdf?sid=TV2:isKk4hPeO; a recent talk can be viewed at https://www.broadinstitute.org/videos/scalable-bayesian-model-copy-number-variation-bayesian-pca.
Hi Slee, is there an approximate release date for this germline CNV tool? And is there a development version available to test in the meantime?
Hi @maelygauthier, this should be available shortly. Alternatively, you can build your own jar from the gatk-protected repo. The tool is experimental and called GermlineCNVCaller.
Thanks @shlee,
I built my own jar as recommended and am testing GermlineCNVCaller now. I found the annotation table and the transition prior table on your resource bundle page (ftp://ftp.broadinstitute.org/bundle/beta/GermlineCNVCaller/). I was just unsure about how to derive the required input (i.e. the combined read count collection URI). Do I need to use the CalculateTargetCoverage tool and feed in the interval bed file for all the targeted regions of my whole exome panel, or derive read counts at the chromosome level here?
./gatk-launch GermlineCNVCaller --contigAnnotationsTable ./grch37_contig_annotations.tsv --copyNumberTransitionPriorTable ./grch37_germline_CN_priors.tsv --jobType LEARN_AND_CALL --outputPath ./trial --input ?
Hi @maelygauthier,
We've recently merged the gatk-protected repo with the GATK4 repo and I have updated the documentation for GermlineCNVCaller in the newly merged repo with example commands: https://github.com/broadinstitute/gatk/blob/c58d750be88f2fddc3272a45bce447c477f68cbb/src/main/java/org/broadinstitute/hellbender/tools/coveragemodel/germline/GermlineCNVCaller.java
Be sure to use the latest jar from the gatk repo (in beta status), especially for experimental (alpha status) tools like GermlineCNVCaller that are being actively developed.
The input is described as:
The input can be the results of CorrectGCBias, SparkGenomeReadCounts or CalculateTargetCoverage. For CalculateTargetCoverage, you may have run across some example commands that use the --transform PCOV option for proportional coverage. Remember that the other option (the default) is to output RAW counts. We'll be writing up workflows, first in WDL format and then for the forum, on an ongoing basis for the new tools going forward.
Thanks @shlee for the update,
I used the latest jar from the gatk4 repo as recommended. And managed to derive the read count input file and sex genotype table. I just wanted to confirm whether Nd4j also needed to be installed if not using Spark.
script run
./gatk-launch GermlineCNVCaller --contigAnnotationsTable ../gatk4_Hellbender/grch37_contig_annotations.tsv --copyNumberTransitionPriorTable ../gatk4_Hellbender/grch37_germline_CN_priors.tsv --jobType LEARN_AND_CALL --outputPath ./TS1 --input ../gatk4_Hellbender/target_cov.tsv --targets ../gatk4_Hellbender/targets.txt --disableSpark true --sexGenotypeTable ../gatk4_Hellbender/TS1_genotype --rddCheckpointing false --biasCovariateSolverType LOCAL
I am getting the following error which seems to be linked with Nd4j:
Using GATK jar ~/localwork/playground/programs/gatk-protected/build/libs/gatk-protected-package-b4390fb-SNAPSHOT-local.jar
102-b14; Version: 4.alpha.2-1136-gc18e780-SNAPSHOT
16:55:21.931 INFO GermlineCNVCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
16:55:21.932 INFO GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:55:21.932 INFO GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:55:21.932 INFO GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:55:21.932 INFO GermlineCNVCaller - Deflater: IntelDeflater
16:55:21.932 INFO GermlineCNVCaller - Inflater: IntelInflater
16:55:21.932 INFO GermlineCNVCaller - Initializing engine
16:55:21.932 INFO GermlineCNVCaller - Done initializing engine
16:55:21.933 INFO GermlineCNVCaller - Spark disabled. sparkMaster option (local[*]) ignored.
16:55:23.448 INFO GermlineCNVCaller - Parsing the read counts table...
16:55:24.876 INFO GermlineCNVCaller - Parsing the sample sex genotypes table...
16:55:24.896 INFO GermlineCNVCaller - Parsing the germline contig ploidy annotation table...
16:55:24.906 INFO ContigGermlinePloidyAnnotationTableReader - Ploidy tags: SEX_XX, SEX_XY
16:55:25.056 INFO GermlineCNVCaller - Parsing the copy number transition prior table and initializing the caches...
16:55:28.634 INFO GermlineCNVCaller - Initializing the EM algorithm workspace...
16:55:32.861 INFO GermlineCNVCaller - Shutting down engine
[June 12, 2017 4:55:32 PM ACST] org.broadinstitute.hellbender.tools.coveragemodel.germline.GermlineCNVCaller done. Elapsed time: 0.18 minutes.
Runtime.totalMemory()=1364721664
org.broadinstitute.hellbender.exceptions.GATKException: Nd4j data type must be set to double for coverage modeller routines to function properly. This can be done by setting JVM system property "dtype" to "double". Can not continue.
Thanks
Issue · Github
by shlee
Hi Maely,
Nd4j is the linear algebra backend that we use in GATK4 and is already included in the jar file. You need to set the data type to double-precision floating point for Nd4j to behave properly. This is done by passing an additional JVM argument to gatk-launch: --javaOptions '-Ddtype=double'. We will be releasing WDL scripts for running GermlineCNVCaller with the official GATK4 beta release. The scripts will provide further example use cases of GermlineCNVCaller.
Best,
Mehrtash
Hi @maelygauthier,
I think you only need to set the Nd4j data type based on the following message from the stacktrace:
I think you can set this with the java options, e.g.:
Let me know if this works or not. I'll see if we can have this option set automatically for the tool going forward.
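Putting that together with the command from the earlier post (paths and file names taken from that post, so treat them as illustrative), the invocation would look something like:
./gatk-launch GermlineCNVCaller --javaOptions '-Ddtype=double' \
    --contigAnnotationsTable grch37_contig_annotations.tsv \
    --copyNumberTransitionPriorTable grch37_germline_CN_priors.tsv \
    --jobType LEARN_AND_CALL \
    --input target_cov.tsv \
    --targets targets.txt \
    --sexGenotypeTable TS1_genotype \
    --outputPath ./TS1 \
    --disableSpark true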
@shlee and @Mehrtash, thanks both for your speedy feedback, this fixed the issue. Thanks a lot.
It was a version problem; already solved.
Thanks!