Description and examples of the steps in the ACNV case workflow
Once you have run GATK CNV, you can run ACNV to obtain revised segments based on both the target-coverage profile and the ref/alt counts at heterozygous SNPs. ACNV reports posterior summary statistics for copy ratio and minor-allele fraction in each segment.
Requirements
- Java 1.8
- A functioning GATK4-protected jar (hellbender-protected.jar or gatk-protected.jar)
- Reference genome (fasta files) with fai and dict files. This can be downloaded as part of the GATK resource bundle: http://www.broadinstitute.org/gatk/guide/article?id=1213
- Samples must be paired. You will need both a case sample (typically, a tumor) and a control sample (typically, a blood normal). We are working on alleviating this requirement.
- A list of common heterozygous SNP sites. Currently, this needs to be in the Picard interval-list format. See http://gatkforums.broadinstitute.org/gatk/discussion/7812/creating-a-list-of-common-snps-for-use-with-getbayesianhetcoverage
- A completed run of GATK CNV for the case sample.
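The reference requirement can be checked before starting. The following is a minimal sketch, assuming the common convention that the index is named <fasta>.fai and that the sequence dictionary replaces the FASTA extension with .dict. The function name check_reference is hypothetical, and your file layout may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a reference FASTA has its companion files.
# Assumes the index is <fasta>.fai and the dictionary is <basename>.dict,
# a common convention rather than something this workflow mandates.
check_reference() {
    local fasta="$1"
    local dict="${fasta%.*}.dict"   # e.g. ref.fasta -> ref.dict
    local missing=0
    local f
    for f in "$fasta" "$fasta.fai" "$dict"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_reference /path/to/reference.fasta && echo "reference OK"` prints the confirmation only when all three files exist and are non-empty.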
Overview of steps
1. Identify heterozygous SNPs in the normal and aggregate read counts at these sites in the tumor.
2. Segment the case sample (based on both the read counts from step 1 and input from GATK CNV) and estimate copy ratio and minor-allele fraction in each segment.
3. Call copy-neutral loss-of-heterozygosity and balanced segments. This step will also create files that can be used as input for ABSOLUTE (Broad-internal versions only) and TITAN.
Step 1. Het Pulldown
** These instructions describe one method for Het Pulldown for matched samples. For more options, including tumor-only, please see: http://gatkforums.broadinstitute.org/gatk/discussion/7719/overview-of-getbayesianhetcoverage-for-heterozygous-snp-calling **
Inputs
- control_bam -- BAM file for control sample (normal).
- case_bam -- BAM file for case sample (tumor).
- reference_sequence -- FASTA file for b37 reference.
- snp_file -- Picard interval list of common SNP sites at which to test for heterozygosity in the control sample.
Outputs
- normal_het_pulldown -- TSV file with M entries containing ref/alt counts, ref/alt bases, etc., where M is the number of hets called in the control sample.
- tumor_het_pulldown -- TSV file with M entries containing ref/alt counts, ref/alt bases, etc. for sites in the case sample that were called as het in the control sample, where M is the number of hets called in the control sample.
Format for both output files:
CONTIG  POSITION  REF_COUNT  ALT_COUNT  REF_NUCLEOTIDE  ALT_NUCLEOTIDE  READ_DEPTH
1       809876    5          16         A               G               21
1       881627    23         12         G               A               35
1       882033    9          10         G               A               19
1       900505    26         24         G               C               50
....snip....
java -jar <path_to_gatk_protected_jar> GetBayesianHetCoverage --reference <reference_sequence> --snpIntervals <snp_file> --tumor <case_bam> --tumorHets <tumor_het_pulldown> --normal <control_bam> --normalHets <normal_het_pulldown> --hetCallingStringency 30
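The pulldown format shown above lends itself to a quick sanity check before moving on to step 2. The sketch below assumes the file is tab-separated with a single header row and the column order shown; the function name summarize_hets is hypothetical. It warns when READ_DEPTH differs from REF_COUNT + ALT_COUNT and prints a naive per-site minor-allele fraction (the smaller count divided by the depth), which is only a rough diagnostic and not the segment-level estimate that ACNV produces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a het pulldown TSV.
# Assumes tab-separated columns in the order shown above and one header row.
summarize_hets() {
    awk -F'\t' '
        $1 == "CONTIG" { next }                  # skip the header row
        NF >= 7 {
            ref = $3 + 0; alt = $4 + 0; depth = $7 + 0
            if (ref + alt != depth) {
                printf "depth mismatch at %s:%s\n", $1, $2 > "/dev/stderr"
            }
            minor = (ref < alt) ? ref : alt      # smaller of the two counts
            if (depth > 0) {
                printf "%s\t%s\t%.3f\n", $1, $2, minor / depth
            }
        }
    ' "$1"
}
```

Running `summarize_hets <tumor_het_pulldown>` prints one line per site with contig, position, and fraction. For the first example row (5 ref reads, 16 alt reads, depth 21), the fraction is 5/21, or about 0.238.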
Step 2. Allelic CNV
Inputs
- tumor_het_pulldown -- Generated in step 1.
- coverage_profile -- Tangent-normalized coverage TSV file obtained in the GATK CNV case workflow.
- called_segments -- Called-segments TSV file obtained in the GATK CNV case workflow.
- output_prefix -- Path and file prefix for creating the output files. For example, /home/lichtens/my_acnv_output/sample1
Outputs
- acnv_segments -- TSV file with name ending with -sim-final.seg, containing posterior summary statistics for log_2 copy ratio and minor-allele fraction in each segment. With the output_prefix above, this is /home/lichtens/my_acnv_output/sample1-sim-final.seg
- acnv_cr_parameters -- TSV file with name ending with -sim-final.cr.param, containing posterior summary statistics for global parameters of the copy-ratio model. With the output_prefix above, this is /home/lichtens/my_acnv_output/sample1-sim-final.cr.param
- acnv_af_parameters -- TSV file with name ending with -sim-final.af.param, containing posterior summary statistics for global parameters of the allele-fraction model. With the output_prefix above, this is /home/lichtens/my_acnv_output/sample1-sim-final.af.param
Other files containing intermediate results of the calculation are also generated.
java -Xmx8g -jar <path_to_gatk_protected_jar> AllelicCNV --tumorHets <tumor_het_pulldown> --tangentNormalized <coverage_profile> --segments <called_segments> --outputPrefix <output_prefix>
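Because the result files are named by appending fixed suffixes to output_prefix, their paths can be derived rather than typed by hand. This is a small sketch using only the three suffixes documented above; the function name acnv_outputs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the main AllelicCNV result files for a prefix.
# The suffixes are the ones documented above; the function name is made up.
acnv_outputs() {
    local prefix="$1"
    local suffix
    for suffix in seg cr.param af.param; do
        echo "${prefix}-sim-final.${suffix}"
    done
}
```

With the example prefix, `acnv_outputs /home/lichtens/my_acnv_output/sample1` prints the three paths listed above, one per line. The first of these is the acnv_segments file that step 3 takes as input.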
Step 3. Call CNLoH and Balanced Segments
** WARNING: This tool is experimental and exists primarily for internal Broad use. **
Inputs
- tumor_het_pulldown -- Generated in step 1.
- acnv_segments -- Generated in step 2 (*-sim-final.seg).
- coverage_profile -- Tangent-normalized coverage TSV file obtained in the GATK CNV case workflow.
- output_dir -- Directory for creating the output files. For example, /home/lichtens/my_acnv_cnlohcalls_output/
Outputs
- GATK-CNV-formatted seg file -- TSV file ending with -sim-final.cnv.seg. This file is formatted identically to the output of GATK CNV, which means that the allelic-fraction values are not captured in this file.
- AllelicCapSeg-formatted seg file -- TSV file ending with -sim-final.acs.seg. This file is formatted identically to the output of Broad CGA AllelicCapSeg and can be used as input to Broad-internal versions of ABSOLUTE.
- TITAN-compatible het file -- TSV file ending with -sim-final.titan.het.tsv. This file can be used as the input to TITAN for the het read counts.
- TITAN-compatible copy-ratio file -- TSV file ending with -sim-final.titan.tn.tsv. This file can be used as the input to TITAN for the per-target copy-ratio estimates.
java -Xmx8g -jar <path_to_gatk_protected_jar> CallCNLoHAndSplits --tumorHets <tumor_het_pulldown> --segments <acnv_segments> --tangentNormalized <coverage_profile> --outputDir <output_dir> --rhoThreshold 0.2 --numIterations 10 --sparkMaster 'local[*]'
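The three steps can be chained so that the outputs of one feed the next. The sketch below is one possible arrangement, not an official wrapper: it reuses the flags and values from the invocations in this document, while the function names, the argument order, and the het pulldown file names are hypothetical. By default it only prints the commands; setting RUN=1 executes them.

```shell
#!/usr/bin/env bash
# Sketch: print (or, with RUN=1, execute) the three ACNV commands in order.
# Flags and values are copied from the invocations in this document; the
# function names, argument order, and het pulldown file names are hypothetical.
run_step() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"            # execute the command
    else
        echo "$*"       # dry run: print the command for inspection only
    fi
}

acnv_workflow() {
    local jar="$1" ref="$2" snps="$3" case_bam="$4" control_bam="$5"
    local coverage="$6" called_segs="$7" prefix="$8" out_dir="$9"
    local tumor_hets="${prefix}.tumor.hets.tsv"
    local normal_hets="${prefix}.normal.hets.tsv"

    # Step 1: het pulldown in the control, read counts at those sites in the case
    run_step java -jar "$jar" GetBayesianHetCoverage \
        --reference "$ref" --snpIntervals "$snps" \
        --tumor "$case_bam" --tumorHets "$tumor_hets" \
        --normal "$control_bam" --normalHets "$normal_hets" \
        --hetCallingStringency 30 || return 1

    # Step 2: ACNV, using the GATK CNV coverage profile and called segments
    run_step java -Xmx8g -jar "$jar" AllelicCNV \
        --tumorHets "$tumor_hets" --tangentNormalized "$coverage" \
        --segments "$called_segs" --outputPrefix "$prefix" || return 1

    # Step 3: CNLoH and balanced-segment calls on the step 2 segments
    run_step java -Xmx8g -jar "$jar" CallCNLoHAndSplits \
        --tumorHets "$tumor_hets" --segments "${prefix}-sim-final.seg" \
        --tangentNormalized "$coverage" --outputDir "$out_dir" \
        --rhoThreshold 0.2 --numIterations 10 --sparkMaster 'local[*]'
}
```

Calling acnv_workflow with the nine paths (jar, reference, SNP list, case BAM, control BAM, coverage profile, called segments, output prefix, output directory) prints the three commands in order. Each step returns early on failure, so with RUN=1 a failed pulldown does not trigger the later steps.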