(How to) Call somatic copy number variants using GATK4 CNV

Comments

  • SheilaSheila Broad InstituteMember, Broadie admin

    @zhengzha2000
    Hi,

    I will check with the team and get back to you.

    -Sheila

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    @zhengzha2000 Can you clarify where you saw a reference to the tool you're asking about? I'm not familiar with it and it's not referenced anywhere in this discussion. Are you sure of the spelling?

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @LanC,

    Are you saying the normals are rather different from each other or that the normals are different from your tumor cell line's parental strain?

    The somatic CNV PoN will remove outlier information and also try to average the information in the normals you give it. Your experimental design:

    call somatic CNV for Mouse tumor cell line data with matched normal strain samples. The genetic background of these normal strain samples actually are much different.

    should give you CNV events that differ between the tumor and PoN normals. You may have to adjust settings. I think if the normals are wildly different from each other for certain regions, these regions may become unusable.

    Could you please describe in more detail (or point us to some publications) the extent of the CNV differences in mouse strains? It would be great for our team to be aware of such intra-species differences so workflows can (ideally) have options that enable such research.

    We are actually in the process of developing the exact functionality you are asking for that will allow you to compare just a matched pair.

    Otherwise, we hope to have the repackaged somatic CNV workflow available by the end of the year at the earliest.

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭
    edited October 2017

    Hi @zhengzha2000,

    CallCNLoHAndSplits was in the GATK4-alpha release.

    We've moved on to the beta release and the tools and workflow differ. First, you run somatic CNV (as above), then run a separate ACNV workflow, which requires a matched Normal BAM.

    In the near future, we will be integrating these two workflows for more modularity and so the tool names will likely change yet again.

    In the meantime, the GATK4-beta tools of interest to you are VcfToIntervalList, GetHetCoverage OR GetBayesianHetCoverage, AllelicCNV and PlotACNVResults.

    I wrote up a mini-tutorial for the September 2017 Helsinki workshop that outlines example commands for the ACNV workflow. You can find a link to the workshop materials at https://software.broadinstitute.org/gatk/documentation/presentations. The folder you are interested in is dated 1709. The mini-tutorial worksheet is in the data bundle GATK4_AllelicCNV.tar.gz and has one glaring omission: the AllelicCNV command is missing the --useAllCopyRatioSegments parameter. Also, if you need to iterate over parameters, you can tweak options in AllelicCNV for faster runs, e.g.:

    --maxNumIterationsSimSeg 2 \
    --maxNumIterationsSNPSeg 2 \
    --numIterationsSimSegPerFit 2
    

    P.S. As I mention to @LanC above, you could try out the development version of the CNV workflow using sl_wgs_acnv. The commands are outlined in the WDL script.

  • @shlee:

    Thanks for your answer. I will watch for the next release and look into the presentation you mentioned.

    Zheng

  • johnhejohnhe Icahn School of Medicine Member

    Hi,

    I don't know if anyone else had this issue, but I had to update the getoptURL and optparseURL in the install_R_packages.R script linked from this webpage before PerformSegmentation would work for me. I thought I would share in case anyone else runs into the same problem. Apologies if this has been discussed before.

    Best,

    John

  • SheilaSheila Broad InstituteMember, Broadie admin

    @johnhe
    Hi John,

    Thanks for sharing.

    -Sheila

  • Hi @shlee

    Many thanks for your answer, and sorry for my late response. Yes, what I mean is that the genetic backgrounds of the normal mouse strains are very different from each other. I agree with you that “if the normals are wildly different from each other for certain regions, these regions may become unusable.”

    Sorry, I have not run germline CNV identification on these normal strains, so I have no accurate estimate of the CNV differences. However, I found one paper that investigated CNV in 351 mouse samples, including many of the classical laboratory strains that we use. I hope it is helpful.

    Locke, M. Elizabeth O., et al. "Genomic copy number variation in Mus musculus." BMC Genomics 16.1 (2015): 497.

    Many Thanks!

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @LanC,

    Thank you so much for the journal article. I am printing it right now to peruse. I think it will be of great interest to our developers as well. Thanks again and happy holidays.

  • micknudsenmicknudsen DenmarkMember ✭✭

    Are there any plans to update this tutorial? The tool CalculateTargetCoverage is missing in the final GATK4 release.

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @micknudsen,

    I'm just getting to the CNV tutorial now. I've just finished generating the tutorial data using the WDL scripts available in the broadinstitute/gatk repository with the new tools and am turning to writing about each step. The tools and their features have changed. For example, instead of the deprecated CalculateTargetCoverage, you will want to use CollectFragmentCounts (link to tooldoc) which is categorized under 'Coverage Analysis'.

    I believe the tooldocs themselves are fairly comprehensive in explaining the workflow. Together with the pipeline scripts (written in WDL), you should be able to piece together the workflow.

    If you are very eager to test out the new workflow, and absolutely need the tutorial data and example commands, I can make the data bundle and share it and also share a live draft of the tutorial as a Google Doc. In return, I would ask the favor of feedback. = )

  • micknudsenmicknudsen DenmarkMember ✭✭

    Hi @shlee,

    I am currently in the process of untangling the WDL-workflows, and things are going well so far. If I run into troubles, I may ask for your tutorial draft (and happily provide feedback, of course).

    Thanks,
    Michael

  • micknudsenmicknudsen DenmarkMember ✭✭

    Can I see your draft? :)

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Sure @micknudsen.

    https://docs.google.com/document/d/1AX-GDYb2HJB0I-wS_XJjQDfjeOaynu39dzG6SsaeSwo/edit?usp=sharing

    It is just a skeleton of commands and is woefully incomplete. I'm not sure how helpful it will be. I'll update this particular Google Doc draft sporadically.

    I also think an inputs JSON file for the WDL scripts you are studying might be helpful to you. Most of the optional parameters need not be specified when you are starting out, as you would be using the defaults. Since there are currently no example JSONs available (soon there should be one at https://github.com/gatk-workflows/gatk4-somatic-cnvs), I can share the ones I made to generate the tutorial data using GATK4.0.1.1. Here is the one for the cnv_somatic_pair_workflow:

    {
      "##_COMMENT1:": "WORKFLOW STEP OPTIONS",
      "CNVSomaticPairWorkflow.gatk_docker": "broadinstitute/gatk:4.0.1.1",
      "CNVSomaticPairWorkflow.gatk4_jar_override": "/home/shlee/gatk-4.0.1.1/gatk-package-4.0.1.1-local.jar",
      "CNVSomaticPairWorkflow.is_run_oncotator": "False",
      "CNVSomaticPairWorkflow.oncotator_docker": "broadinstitute/oncotator:1.9.3.0",
    
      "##_COMMENT2:": "DATA",
      "CNVSomaticPairWorkflow.ref_fasta": "/home/shlee/Documents/ref/hg38/GRCh38_full_analysis_set_plus_decoy_hla.fa",
      "CNVSomaticPairWorkflow.ref_fasta_fai": "/home/shlee/Documents/ref/hg38/GRCh38_full_analysis_set_plus_decoy_hla.fa.fai",
      "CNVSomaticPairWorkflow.ref_fasta_dict": "/home/shlee/Documents/ref/hg38/GRCh38_full_analysis_set_plus_decoy_hla.dict",
    
      "CNVSomaticPairWorkflow.read_count_pon": "/home/shlee/Documents/cnv_180207/cnvponC.pon.hdf5",
      "CNVSomaticPairWorkflow.intervals": "/home/shlee/Documents/cnv_180207/intervals/targets_C.interval_list",
    
      "CNVSomaticPairWorkflow.common_sites": "/home/shlee/Documents/cnv_180207/intervals/theta_biallelicsnps_agilentintervals.interval_list",
      "CNVSomaticPairWorkflow.tumor_bam": "/home/shlee/Documents/hcc/hcc1143_T_clean.bam",
      "CNVSomaticPairWorkflow.tumor_bam_idx": "/home/shlee/Documents/hcc/hcc1143_T_clean.bai",
      "CNVSomaticPairWorkflow.normal_bam": "/home/shlee/Documents/hcc/hcc1143_N_clean.bam",
      "CNVSomaticPairWorkflow.normal_bam_idx": "/home/shlee/Documents/hcc/hcc1143_N_clean.bai",
    
      "##_COMMENT3:": "ANALYSIS PARAMETERS",
    
      "##_(optional)_bin_length_default_of_1000_is_appropriate_for_WGS,_set_to_0_for_WES": "",
      "CNVSomaticPairWorkflow.bin_length": "0"
    }
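For anyone adapting an inputs JSON like the one above, here is a minimal Python sketch (not part of the workflow itself) that separates the `##` comment keys from the real workflow inputs and lists path-like values that are missing on the local filesystem. The function names `split_inputs` and `missing_paths` are hypothetical, and treating any value that starts with `/` as a local path is an assumption made purely for illustration.

```python
import json
import os


def split_inputs(inputs):
    """Split a WDL inputs dict into (comments, real_inputs).

    Assumption: keys starting with '##' are comments, as in the JSON above.
    """
    comments = {k: v for k, v in inputs.items() if k.startswith("##")}
    real = {k: v for k, v in inputs.items() if not k.startswith("##")}
    return comments, real


def missing_paths(real_inputs):
    """Return the keys whose value looks like an absolute path that does not exist."""
    return sorted(
        key
        for key, value in real_inputs.items()
        if isinstance(value, str)
        and value.startswith("/")
        and not os.path.exists(value)
    )


if __name__ == "__main__":
    # Toy excerpt; the tumor_bam path is deliberately made up.
    example = json.loads("""
    {
      "##_COMMENT1:": "WORKFLOW STEP OPTIONS",
      "CNVSomaticPairWorkflow.gatk_docker": "broadinstitute/gatk:4.0.1.1",
      "CNVSomaticPairWorkflow.tumor_bam": "/no/such/dir/tumor.bam",
      "CNVSomaticPairWorkflow.bin_length": "0"
    }
    """)
    comments, real = split_inputs(example)
    print(len(comments), "comment keys;", len(real), "workflow inputs")
    for key in missing_paths(real):
        print("missing file for", key)
```

A check like this only catches local path typos before submitting a run; it says nothing about whether the values are valid for the workflow.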
    
  • micknudsenmicknudsen DenmarkMember ✭✭
  • @shlee - thanks for the sample JSON file. I have a question about "CNVSomaticPairWorkflow.common_sites". Is there a tutorial that explains how to make the common sites file for the analysis? I followed the tutorial explained here. Unfortunately, the code produces the interval files but with no variants. Can you provide your file? Thanks.

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    @tushardave26, the common_sites file I prepped for use in the tutorial is appropriate for the exome capture data I am using, mapped to GRCh38. You'll want to make your own for the type of coverage you expect and the reference you are using. You can start with the Mutect2 germline af resource available in the GATK bundle and subset common biallelic variants with SelectVariants. You can additionally apply an intervals list to subset the variants further.

  • pdupdu Member

    Hi,

    I'm trying to use GATK-4.0.1.2 to call SCNVs using in vitro murine exome data. Since our exome data is from in vitro cultured cells, I expect the noise level to be much lower than in in vivo tumors. Therefore, I have not used a panel of normals in my analysis and have instead opted to compare only to a matched normal tissue sample.

    I have been using the workflow in the google doc that @shlee posted above:
    1. CollectFragmentCounts for tumor and normal
    2. CollectAllelicCounts for tumor and normal
    3. DenoiseReadCounts for CollectFragmentCounts output
    4. ModelSegments with tumor allelic counts, tumor denoised copy ratios, and normal allelic counts

    I'm running into some trouble using ModelSegments. The program runs to completion, but in the error logs, it appears that the models are not a good fit for the data.

    When it tries to fit the allele-fraction model, the model log likelihood hovers around -430000, without much change after several iterations. Eventually, the modeler exits because the maximum-likelihood estimate approaches 0.5, and the program recommends that I change the parameters for filtering homozygous sites. Here is an example error output for the allele-fraction modeler:

    11:37:59.920 INFO  MultidimensionalModeller - Fitting allele-fraction model...
    11:49:33.812 INFO  AlleleFractionInitializer - Initializing allele-fraction model, iterating until log likelihood converges to within 0.500000...
    11:49:34.098 INFO  AlleleFractionInitializer - Iteration 1, model log likelihood = -430641.392238...
    11:49:34.111 INFO  AlleleFractionInitializer - AlleleFractionGlobalParameters{meanBias=1.038076, biasVariance=0.337363, outlierProbability=0.147882}
    11:49:34.276 INFO  AlleleFractionInitializer - Iteration 2, model log likelihood = -430490.985227...
    11:49:34.276 INFO  AlleleFractionInitializer - AlleleFractionGlobalParameters{meanBias=1.256389, biasVariance=0.210133, outlierProbability=0.147882}
    11:49:34.404 INFO  AlleleFractionInitializer - Iteration 3, model log likelihood = -430476.106619...
    11:49:34.404 INFO  AlleleFractionInitializer - AlleleFractionGlobalParameters{meanBias=1.305350, biasVariance=0.243663, outlierProbability=0.147882}
    11:49:34.535 INFO  AlleleFractionInitializer - Iteration 4, model log likelihood = -430466.256679...
    11:49:34.535 INFO  AlleleFractionInitializer - AlleleFractionGlobalParameters{meanBias=1.347862, biasVariance=0.271174, outlierProbability=0.147882}
    11:49:34.668 INFO  AlleleFractionInitializer - Iteration 5, model log likelihood = -430458.344358...
    11:49:34.669 INFO  AlleleFractionInitializer - AlleleFractionGlobalParameters{meanBias=1.383601, biasVariance=0.301706, outlierProbability=0.147882}
    11:49:34.796 INFO  AlleleFractionInitializer - Iteration 6, model log likelihood = -430450.787700...
    11:49:34.796 INFO  AlleleFractionInitializer - AlleleFractionGlobalParameters{meanBias=1.420590, biasVariance=0.332392, outlierProbability=0.147882}
    11:49:34.922 INFO  AlleleFractionInitializer - Iteration 7, model log likelihood = -430443.858371...
    11:49:34.922 INFO  AlleleFractionInitializer - AlleleFractionGlobalParameters{meanBias=1.458092, biasVariance=0.363497, outlierProbability=0.147882}
    11:49:35.051 INFO  AlleleFractionInitializer - Iteration 8, model log likelihood = -430437.980551...
    11:49:35.051 INFO  AlleleFractionInitializer - AlleleFractionGlobalParameters{meanBias=1.494063, biasVariance=0.393534, outlierProbability=0.147882}
    11:49:35.181 INFO  AlleleFractionInitializer - Iteration 9, model log likelihood = -430432.981572...
    11:49:35.181 INFO  AlleleFractionInitializer - AlleleFractionGlobalParameters{meanBias=1.527987, biasVariance=0.423246, outlierProbability=0.147882}
    11:49:35.313 INFO  AlleleFractionInitializer - Iteration 10, model log likelihood = -430428.734241...
    11:49:35.313 INFO  AlleleFractionInitializer - AlleleFractionGlobalParameters{meanBias=1.559930, biasVariance=0.452381, outlierProbability=0.147882}
    11:49:35.449 INFO  AlleleFractionInitializer - Iteration 11, model log likelihood = -430425.055409...
    11:49:35.449 INFO  AlleleFractionInitializer - AlleleFractionGlobalParameters{meanBias=1.589443, biasVariance=0.480186, outlierProbability=0.145378}
    11:49:35.581 INFO  AlleleFractionInitializer - Iteration 12, model log likelihood = -430421.815571...
    11:49:35.581 INFO  AlleleFractionInitializer - AlleleFractionGlobalParameters{meanBias=1.616958, biasVariance=0.497862, outlierProbability=0.132941}
    11:49:35.710 INFO  AlleleFractionInitializer - Iteration 13, model log likelihood = -430421.147517...
    11:49:35.710 INFO  AlleleFractionInitializer - AlleleFractionGlobalParameters{meanBias=1.635304, biasVariance=0.497862, outlierProbability=0.126075}
    11:49:35.841 INFO  AlleleFractionInitializer - Iteration 14, model log likelihood = -430421.034548...
    11:49:35.841 INFO  AlleleFractionInitializer - AlleleFractionGlobalParameters{meanBias=1.642114, biasVariance=0.497862, outlierProbability=0.128001}
    11:49:35.842 WARN  AlleleFractionInitializer - The maximum-likelihood estimate for the global parameter AF_reference_bias_variance (0.497862) was near its boundary (0.500000), the model is likely not a good fit to the data!  Consider changing parameters for filtering homozygous sites.
    

    Does this mean that I should raise --genotyping-homozygous-log-ratio-threshold to include more heterozygous sites called? Or is there another problem that I should be aware of?

    Thanks,
    Peter
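For readers who want to inspect a log like the one above without reading every line, here is a minimal Python sketch that pulls out the per-iteration log likelihoods and any "near its boundary" warnings. The regular expressions are written against the log lines quoted in this post only, so the exact message format in other GATK versions is an assumption; `summarize_log` is a hypothetical helper name.

```python
import re

# Patterns matched against the log text quoted in the post above.
ITERATION_RE = re.compile(r"Iteration (\d+), model log likelihood = (-?\d+\.\d+)")
BOUNDARY_RE = re.compile(
    r"global parameter (\w+) \((-?\d+\.\d+)\) was near its boundary \((-?\d+\.\d+)\)"
)


def summarize_log(lines):
    """Return (likelihoods, warnings) parsed from ModelSegments-style log lines.

    likelihoods: list of (iteration, log_likelihood)
    warnings:    list of (parameter_name, estimate, boundary)
    """
    likelihoods = []
    warnings = []
    for line in lines:
        match = ITERATION_RE.search(line)
        if match:
            likelihoods.append((int(match.group(1)), float(match.group(2))))
            continue
        match = BOUNDARY_RE.search(line)
        if match:
            warnings.append(
                (match.group(1), float(match.group(2)), float(match.group(3)))
            )
    return likelihoods, warnings


if __name__ == "__main__":
    sample = [
        "11:49:35.710 INFO  AlleleFractionInitializer - Iteration 13, model log likelihood = -430421.147517...",
        "11:49:35.841 INFO  AlleleFractionInitializer - Iteration 14, model log likelihood = -430421.034548...",
        "11:49:35.842 WARN  AlleleFractionInitializer - The maximum-likelihood estimate for the global parameter "
        "AF_reference_bias_variance (0.497862) was near its boundary (0.500000), the model is likely not a good fit to the data!",
    ]
    likelihoods, warnings = summarize_log(sample)
    # Change between the last two iterations; the initializer stops once this is within 0.5.
    print("last improvement:", likelihoods[-1][1] - likelihoods[-2][1])
    for name, estimate, boundary in warnings:
        print(name, "estimate", estimate, "boundary", boundary)
```

In the quoted log the likelihood changes by less than the 0.5 convergence tolerance at the end, so the initializer did converge; the warning is about where the estimate landed, not about a failure to converge.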

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @pdu,

    Can you please share with us the ModelSegments command that is causing the WARN? In the meanwhile, I will consult our developer.

  • pdupdu Member

    @shlee Thanks for getting back to me so quickly.

    I think the WARN is caused by MultidimensionalModeller using AlleleFractionInitializer - unless you are asking for something else.

    The parameters for running the ModelSegments are the same as the ones in the google doc. The initial heterozygous site detection, MultidimensionalKernelSegmenter, and GibbsSampler commands run fine.

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    @pdu, while we wait for our developer, can you try running your command without the allelic input as shown below:

    gatk --java-options "-Xmx10000m" ModelSegments \
        --denoised-copy-ratios hcc1143_T_clean.denoisedCR.tsv \
        --output out \
        --output-prefix hcc1143_T_clean
    

    Thanks.

  • pdupdu Member

    @shlee, I have run ModelSegments with only the denoised copy ratio .tsv file, and the output did not throw any WARNs. I'm trying to generate plots using PlotModeledSegments, but I am not sure how to generate a .dict file of the contigs I want to plot. Can you point me to a tool that I can use to make the .dict file?

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭
    edited March 2018

    @pdu, you can use the dictionary from the reference set you used to originally align the data. You can limit the plotting to the larger contigs using the --minimum-contig-length argument. For example, --minimum-contig-length 46709983 will limit GRCh38 contigs to 24 chromosomes.

    If you need to generate the dictionary from the FASTA reference, use CreateSequenceDictionary.
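To preview which contigs a given length cutoff would keep, here is a minimal Python sketch that reads the `@SQ` lines of a sequence dictionary and applies a threshold. It is only an illustration of the filtering idea: `contigs_at_least` is a hypothetical helper, the cutoff is assumed to be inclusive (consistent with the 46709983 example above keeping chr21), and the tiny dictionary below is a toy, with `chrUn_example` being a made-up contig.

```python
def contigs_at_least(dict_lines, minimum_length):
    """Return (name, length) for every @SQ record with LN >= minimum_length."""
    kept = []
    for line in dict_lines:
        if not line.startswith("@SQ"):
            continue  # skip @HD and any other header records
        # Each field after the record type looks like TAG:value.
        fields = dict(
            field.split(":", 1) for field in line.rstrip("\n").split("\t")[1:]
        )
        length = int(fields["LN"])
        if length >= minimum_length:
            kept.append((fields["SN"], length))
    return kept


if __name__ == "__main__":
    toy_dict = [
        "@HD\tVN:1.5",
        "@SQ\tSN:chr1\tLN:248956422",
        "@SQ\tSN:chr21\tLN:46709983",
        "@SQ\tSN:chrM\tLN:16569",
        "@SQ\tSN:chrUn_example\tLN:2274",  # hypothetical small contig
    ]
    for name, length in contigs_at_least(toy_dict, 46709983):
        print(name, length)
```

With a real GRCh38 dictionary, reading the file line by line and passing the lines to the same function should list the contigs that would be plotted.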

  • pdupdu Member

    @shlee, I plotted the modeled segments generated by ModelSegments with only the denoised copy ratios as input and got the following plot.

    Looking back, I think I did not denoise correctly. The command I used was

    gatk DenoiseReadCounts \
              -I counts.tsv \
              --standardized-copy-ratios sample.standardizedCR.tsv \
              --denoised-copy-ratios sample.denoisedCR.tsv
    

    How can I perform the normalization shown in the middle panel of section 7 in this tutorial?

  • pdupdu Member

    @shlee, I denoised using a PoN created by inputting my normal file into CreateReadCountPanelOfNormals, and now my plots look much better.

    Please let me know when you hear back from the developers about using the allelic inputs.

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @pdu,

    I'm glad that tidbit that you can denoise against the normal-only was helpful to you.

    Now that you are actually denoising against a control, I think your ModelSegments (which takes in denoised data) should run without error. There are considerations for the sites at which to collect allelic counts that I am looking into currently. Do you have a specific question on how to collect allelic counts or was the error your primary concern? If the latter, please see how the denoised data performs.

  • tedtoaltedtoal Member

    I'm curious about tumor purity/contamination by normal. I would think that would be a big factor in accurately calling CNV, yet I've seen no mention of it in the GATK CNV documentation. Why is that?

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @tedtoal,

    The complexity that tumor purity/heterogeneity brings is the reason why the workflow calls copy ratios and not absolute copy numbers.
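To illustrate why purity makes absolute copy numbers ambiguous, here is a small Python sketch of the usual mixture arithmetic. It is a simplified textbook model, not how the GATK tools compute anything: it assumes the contaminating normal cells are diploid and that the ratio is taken relative to two copies, with no correction for overall tumor ploidy. `expected_copy_ratio` is a hypothetical helper name.

```python
def expected_copy_ratio(tumor_copy_number, purity, normal_copy_number=2):
    """Expected copy ratio of a segment in a tumor/normal mixture.

    Simplifying assumptions: normal cells carry normal_copy_number copies, and
    the ratio is relative to that same number (no tumor-ploidy correction).
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mixed = purity * tumor_copy_number + (1.0 - purity) * normal_copy_number
    return mixed / normal_copy_number


if __name__ == "__main__":
    # A pure tumor with 3 copies and a 50%-pure tumor with 4 copies
    # give the same expected copy ratio under this model.
    print(expected_copy_ratio(3, purity=1.0))
    print(expected_copy_ratio(4, purity=0.5))
```

Both calls give 1.5, so a copy ratio alone cannot distinguish the two scenarios without an independent purity estimate.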

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭
    edited March 2018

    Hi @tushardave26,

    I've prepared two slightly different sites-only germline SNP VCFs based on gnomAD (http://gnomad.broadinstitute.org/) and placed them on our FTP server:

    gnomad_grch38_snps_sitesonly.vcf.gz

    af-gnomad_grch38_snps.vcf.gz

    These are provided as-is for use with the CNV workflow, with no guarantees. The af-gnomad_grch38_snps.vcf.gz version allows subsetting based on population allele frequencies.

    I subset all SNPs-only sites from the Mutect2 resource file described in footnotes [3] and [4] of Tutorial#11136. When I say all SNPs, I mean I include non-biallelic SNP sites. The subsetting excludes any site with mixed type variants, e.g. SNP and indel, as you would not want to use such sites for allelic CNV. To minimize file size, I removed all extraneous information such that records have information in only four or five columns, like so:

    chr1    143931334   .   A   G,T .   .   .
    

    and

    chr1    143931334   .   A   G,T .   .   AC=15,7;AF=0.0003961,0.0001756
    

    If you are analyzing exome capture data, you might consider subsetting these records to your padded intervals regions to further minimize file size. You can do this by additionally passing the padded intervals to the CollectAllelicCounts command, e.g. -L population.vcf.gz -L padded.interval_list --interval-set-rule INTERSECTION. In the end, the workflow uses only those variants that overlap with copy ratio data.

    Finally, each resource file contains 240,555,518 records, for the same number of unique sites.

    P.S. I amended my description on 3/9/2018, to include a version of the resource containing population allele frequencies.

  • sleeslee Member, Broadie, Dev ✭✭✭

    Hi @pdu,

    With those properly denoised copy ratios, my guess is that you will not run into the warning you previously encountered. Most likely your noisy copy ratios were affecting the joint copy-ratio--allele-fraction segmentation, which was in turn affecting the fit of the allele-fraction model. Can you try running again (and perhaps plotting with PlotModeledSegments) to see if your results look reasonable now?

  • pdupdu Member
    edited March 2018

    Hi @slee,

    I tried running ModelSegments again with the new denoised copy ratios, and I'm getting the same error output with AlleleFractionInitializer:

    06:26:17.961 INFO  AlleleFractionInitializer - Iteration 7, model log likelihood = -369270.577908...
    06:26:17.961 INFO  AlleleFractionInitializer - AlleleFractionGlobalParameters{meanBias=1.615908, biasVariance=0.498201, outlierProbability=0.148022}
    06:26:17.961 WARN  AlleleFractionInitializer - The maximum-likelihood estimate for the global parameter AF_reference_bias_variance (0.498201) was near its boundary (0.500000), the model is likely not a good fit to the data!  Consider changing parameters for filtering homozygous sites.
    06:26:17.961 WARN  AlleleFractionInitializer - The maximum-likelihood estimate for the global parameter AF_outlier_probability (0.148022) was near its boundary (0.150000), the model is likely not a good fit to the data!  Consider changing parameters for filtering homozygous sites.
    

    I tried plotting the resulting modeled segments, and I get the following plot:

  • sleeslee Member, Broadie, Dev ✭✭✭

    @pdu, it looks like you do not have that much allele-fraction data to work with. How large is your list of common sites? What is your average depth of coverage, and how many het sites do you end up with in your *hets.tsv file? (There is a ModelSegments parameter minimum-total-allele-count that filters out sites based on their total count, since we typically need total counts above ~20-30 to get good binomial statistics. So if your depth is low, you may not have enough sufficiently covered sites to do a good allele-fraction analysis.)
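As a rough illustration of the "good binomial statistics" point, here is a minimal Python sketch of the textbook binomial standard error of an observed allele fraction at a het site. This is not the ModelSegments allele-fraction model, only the standard sqrt(p(1-p)/n) formula, and `allele_fraction_standard_error` is a hypothetical helper name.

```python
import math


def allele_fraction_standard_error(total_count, true_fraction=0.5):
    """Binomial standard error of an allele fraction estimated from total_count reads."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return math.sqrt(true_fraction * (1.0 - true_fraction) / total_count)


if __name__ == "__main__":
    for depth in (10, 25, 100):
        se = allele_fraction_standard_error(depth)
        print(f"total count {depth}: standard error ~ {se:.3f}")
```

At a total count of 10 the standard error of a balanced het site is about 0.16, at 25 it is 0.10, and at 100 it is 0.05, which gives a feel for why sites with low total counts carry little information about allelic imbalance.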

  • pdupdu Member

    @slee,

    I am not sure where to find my list of common sites, but I have around 1850 het sites in my hets.tsv file. Based on Picard HS metrics, I have an average bait coverage of 100, and an average target coverage of ~65.

  • sleeslee Member, Broadie, Dev ✭✭✭
    edited March 2018

    @pdu,

    OK, your coverage seems fine, but that is not very many sites. Typically we can expect around 20k in an exome.

    The common sites list is what you provided to CollectAllelicCounts via -L. This should be any input compatible with -L (e.g., BED file, Picard/GATK interval list, VCF, etc.) that specifies sites of common variation---typically using an allele-frequency cutoff of around 10% is sufficient. It's possible that you were using a version of this list that was prepared for the tutorial and did not contain very many sites.

    @shlee is currently putting together a version of the gnomAD list she posted above that will also include the allele frequencies, which you can then filter to produce a common-sites list with your desired cutoff.
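Once a resource with allele frequencies is in hand, the filtering described above could look roughly like the following Python sketch. It keeps biallelic SNP records whose `AF` INFO value meets a cutoff, using the record layout shown earlier in this thread. Restricting to biallelic sites, the function name `common_biallelic_snp_sites`, and the two single-ALT example records are assumptions for illustration; in practice a tool such as SelectVariants is the more robust route.

```python
def common_biallelic_snp_sites(vcf_lines, minimum_af=0.10):
    """Return (chrom, pos) for biallelic SNP records with INFO AF >= minimum_af."""
    sites = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, info = (
            fields[0], int(fields[1]), fields[3], fields[4], fields[7]
        )
        # Keep only single-base REF with exactly one single-base ALT.
        if "," in alt or len(ref) != 1 or len(alt) != 1:
            continue
        allele_frequency = None
        for entry in info.split(";"):
            if entry.startswith("AF="):
                allele_frequency = float(entry[3:])
        if allele_frequency is not None and allele_frequency >= minimum_af:
            sites.append((chrom, pos))
    return sites


if __name__ == "__main__":
    records = [
        # Record format from the post above (multiallelic, so it is skipped here).
        "chr1\t143931334\t.\tA\tG,T\t.\t.\tAC=15,7;AF=0.0003961,0.0001756",
        # Made-up records for illustration only.
        "chr1\t1000\t.\tA\tG\t.\t.\tAC=500;AF=0.25",
        "chr1\t2000\t.\tC\tT\t.\t.\tAC=2;AF=0.001",
    ]
    print(common_biallelic_snp_sites(records, minimum_af=0.10))
```

Only the made-up chr1:1000 record passes the 10% cutoff in this toy input.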

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    A version of the resource containing population allele frequencies is now available and described above.

  • amjaddamjadd FinlandMember ✭✭

    Is the 39 blood PON available to download somewhere?

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @amjadd,

    It's not clear what PoN you are referring to. Also, we highly recommend you create your own PoN to match to your own case sample.

  • ChatchawitChatchawit Member
    edited May 2018

    I suggest informing readers about the best-so-far version on the first page. The important information is the GATK version and the corresponding document (steps) to accomplish this pipeline. I think the best-so-far version is probably GATK 4.0.0.0 and your document in Google Drive. I found that "CollectFragmentCounts" is missing in GATK 4.0.4.0 but can be found in GATK 4.0.0.0. Thank you.

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @Chatchawit,

    Are you referring to the recent CNV tutorials using v4.0.1.1? These have been under review since May 2 and are posted at:

    Unfortunately, our reviewers have been too busy to make even a single comment as of yet! Thanks for suggesting these be posted to the top of this now outdated tutorial. I will do so.

  • BegaliBegali GermanyMember ✭✭
    edited August 2018

    Hi @shlee
    Hi @Geraldine_VdAuwera

    I have a question: can I call copy number variants without a matched normal BAM file, which I do not have?
    I ran the first steps, PreprocessIntervals and CollectReadCounts, for my 33 cfDNA samples,
    based on the info here: https://software.broadinstitute.org/gatk/documentation/article?id=11682
    But when I ran CreateReadCountPanelOfNormals,
    it gave me an error that the samples do not match, for all of them without exception.

    How can I generate cnv.pon then?

    with best regards

  • SheilaSheila Broad InstituteMember, Broadie admin

    @Begali
    Hi,

    Can you post the exact command you ran and the log output with error message?

    Thanks,
    Sheila

  • BegaliBegali GermanyMember ✭✭

    @Sheila
    hi

    A)

    first:
    java -jar gatk-package-4.0.6.0-local.jar PreprocessIntervals -L annonated_gene109_17.vcf -R hg38.fa --bin-length 0 --interval-merging-rule OVERLAPPING_ONLY -O interval109_17_list
    2nd:
    java -jar gatk-package-4.0.6.0-local.jar CollectFragmentCounts -I recal109_17.bam -L targets_C.preprocessed.interval_list --interval-merging-rule OVERLAPPING_ONLY -O tumor109_17.counts.hdf5

    B) here at least three samples should apply
    gatk --java-options "-Xmx6500m" CreateReadCountPanelOfNormals \
    -I tumor109_17.counts.hdf5 \
    -I tumor30_17I.counts.hdf5 …….. --minimum-interval-median-percentile 5.0 -O cnvZ.pon.hdf5

    error message
    java.lang.IllegalArgumentException: Intervals for read-counts file tumor30_17I.counts.hdf5 do not match those in other read-counts files.
    at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)
    at org.broadinstitute.hellbender.tools.copynumber.CreateReadCountPanelOfNormals.constructReadCountMatrix(CreateReadCountPanelOfNormals.java:328)
    at org.broadinstitute.hellbender.tools.copynumber.CreateReadCountPanelOfNormals.runPipeline(CreateReadCountPanelOfNormals.java:287)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:30)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:182)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:201)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
    18/08/23 09:43:17 INFO ShutdownHookManager: Shutdown hook called
    18/08/23 09:43:17 INFO ShutdownHookManager: Deleting directory /tmp/pathology/spark-9712c6b4-ebdd-479b-ae45-4cf9994ff635

    How can I fix the "do not match" error? Also, all samples are cyst fluid from different patients.

    with best regards

  • SheilaSheila Broad InstituteMember, Broadie admin

    @Begali
    Hi,

    Did you use the exact same interval list when running on each of the samples?

    -Sheila
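One way to check this before building the PoN is to compare the interval lists that were used for counting each sample. The following is a minimal Python sketch that parses Picard-style interval_list text and reports intervals present in one list but not the other. The helper names `read_intervals` and `interval_differences` are hypothetical, and the toy intervals are made up; it compares interval lists rather than the HDF5 count files themselves.

```python
def read_intervals(interval_list_lines):
    """Parse Picard-style interval_list lines into (contig, start, end) tuples.

    Header lines starting with '@' and blank lines are skipped.
    """
    intervals = []
    for line in interval_list_lines:
        if line.startswith("@") or not line.strip():
            continue
        contig, start, end = line.rstrip("\n").split("\t")[:3]
        intervals.append((contig, int(start), int(end)))
    return intervals


def interval_differences(first, second):
    """Return (only_in_first, only_in_second) as sorted lists of intervals."""
    a, b = set(first), set(second)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    # Toy interval lists for two samples; coordinates are made up.
    sample_a = [
        "@HD\tVN:1.5",
        "chr1\t100\t200\t+\t.",
        "chr1\t300\t400\t+\t.",
    ]
    sample_b = [
        "@HD\tVN:1.5",
        "chr1\t100\t200\t+\t.",
        "chr1\t350\t400\t+\t.",
    ]
    only_a, only_b = interval_differences(
        read_intervals(sample_a), read_intervals(sample_b)
    )
    print("only in first: ", only_a)
    print("only in second:", only_b)
```

If either list of differences is non-empty, the samples were counted over different intervals, which would be consistent with the "do not match" error quoted above. It may also be worth checking that the list written by PreprocessIntervals is the same file passed to the counting step with -L.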

  • BegaliBegali GermanyMember ✭✭

    @Sheila

    May I clarify some issues? I would appreciate your hints on them.
    The samples are from different patients and are tumor only; normal BAM files are not available at all.
    Can I call somatic or germline CNVs using only tumor samples, i.e. only tumor BAM files?

    Those BAM files are also from different diseases, and all FASTQ files were prepared in the wet lab using NGS.

    with best regards
