Description and examples of the steps in the ACNV case workflow

LeeTL1220LeeTL1220 Arlington, MAMember, Broadie, Dev ✭✭✭
edited September 2016 in GATK 4 Beta

Once you have run GATK CNV, you can run ACNV for revised segments based on both the target-coverage profile and the ref/alt counts at heterozygous SNPs. ACNV will report estimates for the posterior probabilities for copy ratio and minor-allele fraction in each segment.

The ACNV case workflow (description and examples)

Requirements

  1. Java 1.8
  2. A functioning GATK4-protected jar (hellbender-protected.jar or gatk-protected.jar)
  3. Reference genome (fasta files) with fai and dict files. This can be downloaded as part of the GATK resource bundle: http://www.broadinstitute.org/gatk/guide/article?id=1213
  4. Samples must be paired. You will need both a case sample (typically, a tumor) and a control sample (typically, a blood normal). We are working on alleviating this requirement.
  5. A list of common heterozygous SNP sites. Currently, this needs to be in the Picard interval-list format. See http://gatkforums.broadinstitute.org/gatk/discussion/7812/creating-a-list-of-common-snps-for-use-with-getbayesianhetcoverage
  6. A completed run of GATK CNV for the case sample.
Overview of steps
  1. Identify heterozygous SNPs in the normal and aggregate read counts at these sites in the tumor.
  2. Segment the case sample (based on both the read counts from step 1 and input from GATK CNV) and estimate copy ratio and minor-allele fraction in each segment.
  3. Call copy-neutral loss-of-heterozygosity and balanced segments. This step will also create files that can be used as input for ABSOLUTE (Broad-internal versions only) and TITAN.
Step 1. Het Pulldown

** These instructions describe one method for Het Pulldown for matched samples. For more options, including tumor-only, please see: http://gatkforums.broadinstitute.org/gatk/discussion/7719/overview-of-getbayesianhetcoverage-for-heterozygous-snp-calling **

Inputs
  • control_bam -- BAM file for control sample (normal).
  • case_bam -- BAM file for case sample (tumor).
  • reference_sequence -- FASTA file for b37 reference.
  • snp_file -- Picard interval list of common SNP sites at which to test for heterozygosity in the control sample .
Outputs
  • normal_het_pulldown -- TSV file with M entries containing ref/alt counts, ref/alt bases, etc., where M is the number of hets called in the control sample.
  • tumor_het_pulldown -- TSV file with M entries containing ref/alt counts, ref/alt bases, etc. for sites in the case sample that were called as het in the control sample, where M is the number of hets called in the control sample.

Format for both output files:

CONTIG  POSITION        REF_COUNT       ALT_COUNT       REF_NUCLEOTIDE  ALT_NUCLEOTIDE  READ_DEPTH
1       809876  5       16      A       G       21
1       881627  23      12      G       A       35
1       882033  9       10      G       A       19
1       900505  26      24      G       C       50
....snip....
Invocation
java -jar <path_to_gatk_protected_jar> GetBayesianHetCoverage --reference <reference_sequence>
    --snpIntervals <snp_file> --tumor <case_bam> --tumorHets <tumor_het_pulldown> --normal <control_bam>
    --normalHets <normal_het_pulldown> --hetCallingStringency 30
Step 2. Allelic CNV
Inputs
  • tumor_het_pulldown -- Generated in step 1.
  • coverage_profile -- Tangent-normalized coverage TSV file obtained in the GATK CNV case workflow.
  • called_segments -- Called-segments TSV file obtained in the GATK CNV case workflow.
  • output_prefix -- Path and file prefix for creating the output files. For example, /home/lichtens/my_acnv_output/sample1
Outputs
  • acnv_segments -- TSV file with name ending with -sim-final.seg containing posterior summary statistics for log_2 copy ratio and minor-allele fraction in each segment. Using the above output_prefix, /home/lichtens/my_acnv_output/sample1-sim-final.seg
  • acnv_cr_parameters -- TSV file with name ending with -sim-final.cr.param containing posterior summary statistics for global parameters of the copy-ratio model. Using the above output_prefix, /home/lichtens/my_acnv_output/sample1-sim-final.cr.param
  • acnv_af_parameters -- TSV file with name ending with -sim-final.af.param containing posterior summary statistics for global parameters of the allele-fraction model. Using the above output_prefix, /home/lichtens/my_acnv_output/sample1-sim-final.af.param

Other files containing intermediate results of the calculation are also generated.

Invocation
 java -Xmx8g -jar <path_to_gatk_protected_jar> AllelicCNV  --tumorHets <tumor_het_pulldown>
    --tangentNormalized <coverage_profile> --segments <called_segments> --outputPrefix <output_prefix>
Step 3. Call CNLoH and Balanced Segments

** WARNING: This tool is experimental and exists primarily for internal Broad use. **

Inputs
  • tumor_het_pulldown -- Generated in step 1.
  • acnv_segments -- Generated in step 2 (*-sim-final.seg).
  • coverage_profile -- Tangent-normalized coverage TSV file obtained in the GATK CNV case workflow
  • output_dir -- Directory for creating the output files. For example, /home/lichtens/my_acnv_cnlohcalls_output/
Outputs
  • GATK-CNV-formatted seg file -- TSV file ending with -sim-final.cnv.seg. This file is formatted identically as the output of GATK CNV. Note that this implies that the allelic fraction values are not captured in this file.
  • AllelicCapSeg-formatted seg file -- TSV file ending with -sim-final.acs.seg. This file is formatted identically as the output of Broad CGA AllelicCapSeg. Note that this file can be used as input to Broad-internal versions of ABSOLUTE.
  • TITAN-compatible het file --TSV file ending with -sim-final.titan.het.tsv. This file can be used as the input to TITAN for the het read counts.
  • TITAN-compatible copy-ratio file -- TSV file ending with -sim-final.titan.tn.tsv. This file can be used as the input to TITAN for the per-target copy-ratio estimates.
Invocation
 java -Xmx8g -jar <path_to_gatk_protected_jar> CallCNLoHAndSplits  --tumorHets <tumor_het_pulldown>
    --segments <acnv_segments> --tangentNormalized <coverage_profile> --outputDir <output_dir>
    --rhoThreshold 0.2 --numIterations 10  --sparkMaster local[*]  
Post edited by slee on

Comments

  • LizzLizz ChinaMember

    Hi, if i have the paired (tumor/normal) sample, how to use the normalHets? In the step2, could I use the normalHets file for the arguments --allelicPanelOfNormals,-aPON:File (Input file for allelic-bias panel of normals. Default value: null.)?

  • sleeslee Member, Broadie, Dev ✭✭

    Hi @Lizz,

    If you use GetHetCoverage or GetBayesianHetCoverage to generate het pulldowns for a large number of normals, you can use these as input to the tool CreateAllelicPanelOfNormals to generate an allelic PoN to use with ACNV. However, the allelic PoN is still under development and the code for this tool was only recently merged into master; thus, if you are using one of the tagged releases, it will not be available.

    For the time being, I'd recommend running without the allelic PoN. In this case, the normal het pulldown is not typically used after step 1. However, if you are interested in treating the normal as a tumor case sample, you could pass it as the argument of tumorHet in step 2.

  • LizzLizz ChinaMember

    thank u!@slee
    I have already accomplished the ACNV pipline, and I get the allelicCNV-sim-final.cnv.seg in the step3, not the step2.
    and there are still some other diffrences from this description:
    1. In step 2, the outputs is : *-MAF.seg, *-no-small.seg, *-sim-final.seg, *-sim-initial.seg, but not the -sim-final.cnv.seg file.
    2. In step 3, the outputs is : *-sim-final.acs.seg, *-sim-final.cnb_called.seg, *-sim-final.cnv.seg, *-sim-final.titan.het.tsv, *-sim-final.titan.tn.tsv, not including the file -sim-final.seg.
    3. in step 2 and step3, there are have the same file, just for example: /home/lichtens/my_acnv_output/sample1-sim-final.seg , is it resonable?

    what's more, I still have 2 quenstions about the file *-sim-final.acs.seg as input to ABSOLUTE:
    1. when I run ABSOLTE the cmdline as follows:

  • LizzLizz ChinaMember

    sorry about the horrible network!
    what's more, I still have 2 quenstions about the file *-sim-final.acs.seg as input to ABSOLUTE:
    1. when I run ABSOLTE the cmdline as follows:(R-3.1.3

    library(numDeriv)
    library(ABSOLUTE)
    RunAbsolute("FQ1.allelicCNV-sim-final.acs.seg", 0, 0.02, 0.95, 10, "PRIMARY_DISEASE", "Illumina_WES", "FQ1", "./", 1500, 0, 0, "allelic", maf.fn=NULL, min.mut.af=NULL, output.fn.base="ofb", verbose=TRUE)

    and the error as :
    Error: bad restore file magic number (file may be corrupted) -- no data loaded
    In addition: Warning message:
    file ‘FQ1.allelicCNV-sim-final.acs.seg’ has magic number 'Chrom'
    Use of save versions prior to 2 is deprecated

    the file format is not right? do you have the example file (*-sim-final.acs.seg), and how do you run the ABSOLUTE command? and the format of my -sim-final.acs.seg file is :
    Chromosome Start.bp End.bp n_probes length n_hets f tau sigma.tau mu.minor sigma.minor mu.major sigma.major SegLabelCNLOH
    chr1 924630 1298560 11 373930 44 0.5 3.895360886524925 0.017691433016881752 0.961756993262878 0 0.961756993262878 0 2
    chr1 1298561 1645376 13 346815 28 0.5 3.8816952962393625 0.030571261954410678 0.956686874399459 0 0.956686874399459 0 2
    chr1 1645377 2355508 9 710131 17 0.5 3.9980144929179966 0.022250610170735652 0.9992837019032069 0 0.9992837019032069 0 2

  • sleeslee Member, Broadie, Dev ✭✭
    edited September 2016

    Hi @lizz,

    Glad to hear you were able to run ACNV, and thanks for pointing out the differences in this description (which has fallen a little out of date, as we are still actively developing CNV and ACNV---I will be sure to update it soon, though!) It sounds like the run is completing correctly.

    I am not quite sure why the input to ABSOLUTE is not working correctly. Which version of ABSOLUTE are you running? I will double check with some members in Broad CGA (who have been successfully using ACNV as input to ABSOLUTE) to see how they are running things and will get back to you!

    Post edited by slee on
  • LizzLizz ChinaMember

    Hi, @slee ,thank you!
    I forgot an important thing that the version of reference genome is hg38, not hg19. And I couldn't plot the pictures(.png), but the INFO is SUCCESS, why?
    the environment that I run ABSOLUTE:
    1. ABSOLUTE-1.0.6 (http://www.broadinstitute.org/cancer/cga/absolute_download)
    2. R-3.1.3 with the numDeriv / ggplot2 / Rcpp / foreach / DNAcopy / gslibs package , not recommended version of R-2.12.0
    3. the input file is descrip above

  • sleeslee Member, Broadie, Dev ✭✭
    edited September 2016

    Hi @lizz,

    Thanks for the info---I'll look into things. It's possible that the issue is due to the versions of R you are using.

    Also, it may be the case that using hg38 is causing the error in plotting, as plotting only supports hg19 at the moment. That being said, we will try to change the code so that an exception is thrown in this case, instead of a SUCCESS message being returned!

    Post edited by slee on
  • sleeslee Member, Broadie, Dev ✭✭

    Hi @lizz,

    Actually, I'm not sure if the publicly available version of ABSOLUTE-1.0.6 is able to take in either AllelicCapSeg or ACNV output---it looks like it may only take in HAPSEG output in allelic mode.

    The conversion of ACNV output to an ABSOLUTE-compatible AllelicCapSeg file was enabled for Broad CGA users, who are using a version of ABSOLUTE that is (unfortunately) not available to the public at the moment. We will update the post to reflect this---apologies for the confusion!

  • amjadamjad HelsinkiMember

    how can I disable spark in AllelicCNV?

  • shleeshlee CambridgeMember, Broadie, Moderator admin
    edited January 2017

    @Lizz,

    GenePattern's documentation of ABSOLUTE v1.0.6 recommends R2.15 with the following packages:

    • numDeriv_2012.9-1
    • getopt_1.17
    • optparse_0.9.5

    Please be exact in the version of R and packages you use.

  • amjadamjad HelsinkiMember

    @amjad said:
    how can I disable spark in AllelicCNV?

    The problem was that I was running too many samples in parallel and spark apparently ran out of ports. It works now after reducing the number of running jobs.

  • amjadamjad HelsinkiMember

    I am getting an error in AllelicCNV only in few samples:

    java.lang.IllegalArgumentException: Initial point in slice sampler is not within specified range.
    at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:609)
    at org.broadinstitute.hellbender.utils.mcmc.SliceSampler.sample(SliceSampler.java:71)
    at org.broadinstitute.hellbender.tools.exome.copyratio.CopyRatioSamplers$VarianceSampler.sample(CopyRatioSamplers.java:54)
    at org.broadinstitute.hellbender.tools.exome.copyratio.CopyRatioSamplers$VarianceSampler.sample(CopyRatioSamplers.java:29)
    at org.broadinstitute.hellbender.utils.mcmc.ParameterizedModel.doGibbsUpdate(ParameterizedModel.java:134)
    at org.broadinstitute.hellbender.utils.mcmc.ParameterizedModel.update(ParameterizedModel.java:124)
    at org.broadinstitute.hellbender.utils.mcmc.GibbsSampler.runMCMC(GibbsSampler.java:76)
    at org.broadinstitute.hellbender.tools.exome.copyratio.CopyRatioModeller.fitMCMC(CopyRatioModeller.java:96)
    at org.broadinstitute.hellbender.tools.exome.ACNVModeller.fitModel(ACNVModeller.java:145)
    at org.broadinstitute.hellbender.tools.exome.ACNVModeller.(ACNVModeller.java:106)
    at org.broadinstitute.hellbender.tools.exome.AllelicCNV.runPipeline(AllelicCNV.java:280)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:38)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:108)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:166)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:185)
    at org.broadinstitute.hellbender.Main.instanceMain(Main.java:76)
    at org.broadinstitute.hellbender.Main.main(Main.java:92)

  • sleeslee Member, Broadie, Dev ✭✭

    Hi @amjad,

    That appears to be the result of an edge case in the slice sampling---should be a quick fix. Can I ask which version of gatk-protected (i.e., commit hash or release number) you are using?

  • FPBarthelFPBarthel HoustonMember
    edited October 17

    Hi @LeeTL1220 I successfully used your recent GATK4 workflow (part of 4.0.10.1 release) to generate acs.seg file for some cancer genome data. However using ABSOLUTE 1.0.6, I am unable to import these files using RunAbsolute(), which requires and R object as input. Am currently trying to figure out how to convert the acs.seg file into a siutable R object format. Do you have any pointers on how to do so?

  • LeeTL1220LeeTL1220 Arlington, MAMember, Broadie, Dev ✭✭✭

    @FPBarthel Unfortunately, ABSOLUTE is maintained by a different group. I have always had them set it up for me to run and I don't have access to the source code/configuration. I currently run it through FireCloud (when I run it).

  • FPBarthelFPBarthel HoustonMember

    Thanks @LeeTL1220! After a bit more digging on Google, Github and message boards I came across a somewhat more recent version of ABSOLUTE here which does seem to support ACS input. It's been giving a lot of errors but I've been able to address some of these with small code fixes and that seems to be fruitful. Is there anyone I can reach out to from the ABSOLUTE group you mention who may be able to give some additional insights?

Sign In or Register to comment.