Description and examples of the steps in the ACNV case workflow
Once you have run GATK CNV, you can run ACNV to obtain revised segments based on both the target-coverage profile and the ref/alt counts at heterozygous SNPs. ACNV reports posterior estimates of copy ratio and minor-allele fraction for each segment.
Requirements
- Java 1.8
- A functioning GATK4-protected jar (hellbender-protected.jar or gatk-protected.jar)
- Reference genome (fasta files) with fai and dict files. This can be downloaded as part of the GATK resource bundle: http://www.broadinstitute.org/gatk/guide/article?id=1213
- Samples must be paired. You will need both a case sample (typically, a tumor) and a control sample (typically, a blood normal). We are working on alleviating this requirement.
- A list of common heterozygous SNP sites. Currently, this needs to be in the Picard interval-list format. See http://gatkforums.broadinstitute.org/gatk/discussion/7812/creating-a-list-of-common-snps-for-use-with-getbayesianhetcoverage
- A completed run of GATK CNV for the case sample.
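Before starting, it can help to confirm that the reference index and dictionary sit next to the FASTA. The following is a minimal sketch, not part of GATK: the function name check_reference is made up for illustration, and it assumes the usual naming convention of <name>.fasta.fai for the index and <name>.dict for the dictionary.

```shell
#!/usr/bin/env bash
# Sketch: report whether the .fai index and .dict dictionary exist next to a reference FASTA.
# Assumption: ref.fasta is accompanied by ref.fasta.fai and ref.dict (common samtools/Picard naming).

check_reference() {
    local fasta="$1"
    local fai="${fasta}.fai"
    local dict="${fasta%.*}.dict"   # strip the last extension, then append .dict
    local missing=0
    local f
    for f in "$fasta" "$fai" "$dict"; do
        if [ -s "$f" ]; then        # -s: file exists and is not empty
            echo "found   $f"
        else
            echo "missing $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_reference /path/to/ref.fasta prints one line per file and returns a non-zero status if any of the three is absent or empty.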
Overview of steps
- Identify heterozygous SNPs in the normal and aggregate read counts at these sites in the tumor.
- Segment the case sample (based on both the read counts from step 1 and input from GATK CNV) and estimate copy ratio and minor-allele fraction in each segment.
- Call copy-neutral loss-of-heterozygosity and balanced segments. This step will also create files that can be used as input for ABSOLUTE (Broad-internal versions only) and TITAN.
Step 1. Het Pulldown
** These instructions describe one method for Het Pulldown for matched samples. For more options, including tumor-only, please see: http://gatkforums.broadinstitute.org/gatk/discussion/7719/overview-of-getbayesianhetcoverage-for-heterozygous-snp-calling **
Inputs
- control_bam -- BAM file for control sample (normal).
- case_bam -- BAM file for case sample (tumor).
- reference_sequence -- FASTA file for b37 reference.
- snp_file -- Picard interval list of common SNP sites at which to test for heterozygosity in the control sample.
Outputs
- normal_het_pulldown -- TSV file with M entries containing ref/alt counts, ref/alt bases, etc., where M is the number of hets called in the control sample.
- tumor_het_pulldown -- TSV file with M entries containing ref/alt counts, ref/alt bases, etc. for the sites in the case sample that were called het in the control sample.
Format for both output files:
CONTIG	POSITION	REF_COUNT	ALT_COUNT	REF_NUCLEOTIDE	ALT_NUCLEOTIDE	READ_DEPTH
1	809876	5	16	A	G	21
1	881627	23	12	G	A	35
1	882033	9	10	G	A	19
1	900505	26	24	G	C	50
....snip....
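As a quick sanity check on a pulldown file, the alt-allele fraction at each site can be computed from the columns above. This is only an illustrative sketch: the function name alt_fraction is hypothetical, and it assumes a tab-separated file whose first line is the header shown above, with no comment lines before it.

```shell
#!/usr/bin/env bash
# Sketch: append ALT_COUNT / READ_DEPTH to every row of a het pulldown TSV.
# Assumes columns in the order shown above: ALT_COUNT is column 4, READ_DEPTH is column 7.

alt_fraction() {
    awk 'BEGIN { FS = OFS = "\t" }
         NR == 1 { print $0, "ALT_FRACTION"; next }        # extend the header line
         $7 > 0  { printf "%s\t%.4f\n", $0, $4 / $7 }      # rows with zero depth are skipped
        ' "$1"
}
```

For the first example row (16 alt reads out of a depth of 21), alt_fraction <tumor_het_pulldown> appends 0.7619. In the control sample these values should cluster near 0.5, since the sites were called as heterozygous there.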
Invocation
java -jar <path_to_gatk_protected_jar> GetBayesianHetCoverage \
    --reference <reference_sequence> \
    --snpIntervals <snp_file> \
    --tumor <case_bam> \
    --tumorHets <tumor_het_pulldown> \
    --normal <control_bam> \
    --normalHets <normal_het_pulldown> \
    --hetCallingStringency 30
Step 2. Allelic CNV
Inputs
- tumor_het_pulldown -- Generated in step 1.
- coverage_profile -- Tangent-normalized coverage TSV file obtained in the GATK CNV case workflow.
- called_segments -- Called-segments TSV file obtained in the GATK CNV case workflow.
- output_prefix -- Path and file prefix for creating the output files. For example, /home/lichtens/my_acnv_output/sample1
Outputs
- acnv_segments -- TSV file with name ending with -sim-final.seg, containing posterior summary statistics for log_2 copy ratio and minor-allele fraction in each segment. With the output_prefix above, this is /home/lichtens/my_acnv_output/sample1-sim-final.seg
- acnv_cr_parameters -- TSV file with name ending with -sim-final.cr.param, containing posterior summary statistics for global parameters of the copy-ratio model. With the output_prefix above, this is /home/lichtens/my_acnv_output/sample1-sim-final.cr.param
- acnv_af_parameters -- TSV file with name ending with -sim-final.af.param, containing posterior summary statistics for global parameters of the allele-fraction model. With the output_prefix above, this is /home/lichtens/my_acnv_output/sample1-sim-final.af.param
Other files containing intermediate results of the calculation are also generated.
Invocation
java -Xmx8g -jar <path_to_gatk_protected_jar> AllelicCNV \
    --tumorHets <tumor_het_pulldown> \
    --tangentNormalized <coverage_profile> \
    --segments <called_segments> \
    --outputPrefix <output_prefix>
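Because the three main outputs are named by appending a fixed suffix to output_prefix, their paths can be derived from the prefix and checked after the run. The helper names below (acnv_outputs, missing_acnv_outputs) are hypothetical and only illustrate the naming rule described above.

```shell
#!/usr/bin/env bash
# Sketch: derive the main AllelicCNV output paths from the output prefix
# and list any that are absent or empty after the run.

acnv_outputs() {
    local prefix="$1"
    local suffix
    for suffix in seg cr.param af.param; do
        echo "${prefix}-sim-final.${suffix}"
    done
}

missing_acnv_outputs() {
    local path
    acnv_outputs "$1" | while IFS= read -r path; do
        [ -s "$path" ] || echo "$path"   # print only files that are missing or empty
    done
}
```

For example, acnv_outputs /home/lichtens/my_acnv_output/sample1 prints the three paths listed above, and missing_acnv_outputs prints nothing when all three files exist and are non-empty.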
Step 3. Call CNLoH and Balanced Segments
** WARNING: This tool is experimental and exists primarily for internal Broad use. **
Inputs
- tumor_het_pulldown -- Generated in step 1.
- acnv_segments -- Generated in step 2 (*-sim-final.seg).
- coverage_profile -- Tangent-normalized coverage TSV file obtained in the GATK CNV case workflow.
- output_dir -- Directory for creating the output files. For example, /home/lichtens/my_acnv_cnlohcalls_output/
Outputs
- GATK-CNV-formatted seg file -- TSV file ending with -sim-final.cnv.seg. This file is formatted identically to the output of GATK CNV, which means that the allelic-fraction values are not captured in this file.
- AllelicCapSeg-formatted seg file -- TSV file ending with -sim-final.acs.seg. This file is formatted identically to the output of Broad CGA AllelicCapSeg and can be used as input to Broad-internal versions of ABSOLUTE.
- TITAN-compatible het file -- TSV file ending with -sim-final.titan.het.tsv. This file can be used as the input to TITAN for the het read counts.
- TITAN-compatible copy-ratio file -- TSV file ending with -sim-final.titan.tn.tsv. This file can be used as the input to TITAN for the per-target copy-ratio estimates.
Invocation
java -Xmx8g -jar <path_to_gatk_protected_jar> CallCNLoHAndSplits \
    --tumorHets <tumor_het_pulldown> \
    --segments <acnv_segments> \
    --tangentNormalized <coverage_profile> \
    --outputDir <output_dir> \
    --rhoThreshold 0.2 \
    --numIterations 10 \
    --sparkMaster 'local[*]'
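The three steps can be chained in a small driver script so that each step's output feeds the next. The sketch below is one possible arrangement, not an official wrapper: the function names, environment variable names, default paths and het-pulldown file names are all hypothetical placeholders, while the tool names and flags are copied from the commands above. By default it only prints the commands (DRY_RUN=1).

```shell
#!/usr/bin/env bash
# Sketch of a driver for the three ACNV steps.
# All variable names and default paths are hypothetical placeholders.

run() {
    # Print the command when DRY_RUN is 1 (the default); execute it otherwise.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

acnv_pipeline() {
    local jar="${GATK_JAR:-gatk-protected.jar}"
    local prefix="${OUTPUT_PREFIX:-acnv_out/sample1}"
    local coverage="${COVERAGE_PROFILE:-sample1.tn.tsv}"
    local tumor_hets="${prefix}.tumor.hets.tsv"     # hypothetical file name
    local normal_hets="${prefix}.normal.hets.tsv"   # hypothetical file name

    # Step 1: het pulldown on the matched case/control pair.
    run java -jar "$jar" GetBayesianHetCoverage \
        --reference "${REFERENCE:-ref.fasta}" \
        --snpIntervals "${SNP_FILE:-common_snps.interval_list}" \
        --tumor "${CASE_BAM:-case.bam}" --tumorHets "$tumor_hets" \
        --normal "${CONTROL_BAM:-control.bam}" --normalHets "$normal_hets" \
        --hetCallingStringency 30 || return 1

    # Step 2: allelic CNV, using the tumor hets plus the GATK CNV results.
    run java -Xmx8g -jar "$jar" AllelicCNV \
        --tumorHets "$tumor_hets" --tangentNormalized "$coverage" \
        --segments "${CALLED_SEGMENTS:-sample1.called.seg}" \
        --outputPrefix "$prefix" || return 1

    # Step 3: CNLoH and balanced-segment calling on the step 2 segments.
    run java -Xmx8g -jar "$jar" CallCNLoHAndSplits \
        --tumorHets "$tumor_hets" --segments "${prefix}-sim-final.seg" \
        --tangentNormalized "$coverage" \
        --outputDir "${OUTPUT_DIR:-acnv_out/cnloh}" \
        --rhoThreshold 0.2 --numIterations 10 --sparkMaster 'local[*]'
}
```

Running acnv_pipeline with the default DRY_RUN=1 prints three lines, one per step, which makes it easy to review the wiring before setting DRY_RUN=0 to execute. Note that step 3 takes its --segments input from the -sim-final.seg file written by step 2.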