
Overview of GetBayesianHetCoverage for heterozygous SNP calling

LeeTL1220, Arlington, MA (Member, Broadie, Dev ✭✭✭)

The GetBayesianHetCoverage tool has two main run modes: (1) tumor-only and (2) matched normal-tumor. Its inputs and outputs are similar to those of the GetHetCoverage tool, so it can be swapped in for GetHetCoverage in the workflow. The output file, however, contains extra columns (see below).

Tumor-only mode:

Command line:

java -jar gatk-protected.jar GetBayesianHetCoverage \
    --reference <reference genome FASTA file> \
    --snpIntervals <common SNPs IntervalList file> \
    --tumor <tumor BAM file> \
    --tumorHets <output file path for tumor Het SNP coverage>

Useful optional arguments:

    --hetCallingStringency <Het SNP calling stringency>
    --minimumAbnormalFraction <estimate of the minimum fraction of cells with CNV events>
    --maximumAbnormalFraction <estimate of the maximum fraction of cells with CNV events>
    --maximumCopyNumber <estimate of the maximum copy number for cells with CNV events>
    --quadratureOrder <grid size for numerical integrations>

Output:

One tab-separated file containing Het SNPs called from the tumor BAM file, including Ref/Alt alleles, their counts, and the pileup size at each genomic position.

Matched normal-tumor mode:

Command line:

java -jar gatk-protected.jar GetBayesianHetCoverage \
    --reference <reference genome FASTA file> \
    --snpIntervals <common SNPs IntervalList file> \
    --tumor <tumor BAM file> \
    --tumorHets <output file path for tumor Het SNP events> \
    --normal <normal BAM file> \
    --normalHets <output file path for normal Het SNP events>

Useful optional arguments:

    --hetCallingStringency <Het SNP calling stringency>

Output:

Two tab-separated files containing Het SNPs called from the normal and tumor BAM files, respectively. The normal output file contains Ref/Alt alleles, their counts, the pileup size, and log odds of being a Het SNP at each genomic position. The tumor output file contains statistics collected from the tumor BAM file on the Het SNPs called from the normal BAM file: Ref/Alt alleles, their counts in tumor BAM, and the pileup size at each genomic position.
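To make "log odds of being a Het SNP" more concrete, here is a minimal sketch under a deliberately simplified model. It is not the tool's actual likelihood (the tool exposes --quadratureOrder for numerical integrations, so its model is evidently more involved). The function name, the fixed 0.5 alt fraction for a het site, and the per-base error rate are all assumptions made for illustration.

```python
import math

def het_log_odds(ref_count, alt_count, error_rate=0.01):
    """Log odds of a heterozygous site versus a homozygous one (toy model).

    Assumptions (not taken from the tool): reads are independent, a het site
    yields alt reads with probability 0.5, and a homozygous site shows the
    "wrong" allele with probability error_rate.
    """
    n = ref_count + alt_count
    if n == 0:
        raise ValueError("no reads at this site")
    # The binomial coefficient is shared by every hypothesis and cancels out.
    log_het = n * math.log(0.5)
    log_hom_ref = alt_count * math.log(error_rate) + ref_count * math.log(1 - error_rate)
    log_hom_alt = ref_count * math.log(error_rate) + alt_count * math.log(1 - error_rate)
    # Equal weight on the two homozygous genotypes; average the likelihoods
    # in log space to avoid underflow.
    m = max(log_hom_ref, log_hom_alt)
    log_hom = m + math.log(0.5 * (math.exp(log_hom_ref - m) + math.exp(log_hom_alt - m)))
    return log_het - log_hom
```

Under this toy model a balanced pileup (for example 10 ref and 10 alt reads) gives positive log odds, while a pileup with only ref reads gives negative log odds. A stringency setting would then correspond to how large the log odds must be before a site is called.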

Note: the tool may also be run in normal-only mode, but that mode is not intended for use with the ACNV workflow. It may be used as a standalone Het SNP caller with functionality somewhat similar to MuTect and UnifiedGenotyper. Note that it only calls Het SNPs from a given list of candidate sites (see below).

Normal-only mode:

Command line:

java -jar gatk-protected.jar GetBayesianHetCoverage \
    --reference <reference genome FASTA file> \
    --snpIntervals <common SNPs IntervalList file> \
    --normal <normal BAM file> \
    --normalHets <output file path for normal Het SNP events>

Useful optional arguments:

    --hetCallingStringency <Het SNP calling stringency>

Output:

One tab-separated file containing Het SNPs called from the normal BAM file, including Ref/Alt alleles, their counts, the pileup size, and log odds of being a Het SNP at each genomic position.
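As a rough illustration of working with these tab-separated outputs, the sketch below reads a het file and computes the alt-allele fraction per site. The column names (CONTIG, POSITION, REF_COUNT, ALT_COUNT) are assumptions, not confirmed by this post; check the header of your own output file and pass the real names. The example rows are made up.

```python
import csv
import io

def read_het_counts(handle, ref_col="REF_COUNT", alt_col="ALT_COUNT"):
    """Yield (row, alt_fraction) for each site in a tab-separated het file.

    The default column names are assumptions for illustration only.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        ref, alt = int(row[ref_col]), int(row[alt_col])
        total = ref + alt
        if total == 0:
            continue  # no informative reads at this site
        yield row, alt / total

# Tiny in-memory example with made-up values (not real tool output).
example = io.StringIO(
    "CONTIG\tPOSITION\tREF_COUNT\tALT_COUNT\n"
    "1\t1000\t12\t8\n"
    "1\t2000\t0\t0\n"
    "2\t3000\t5\t15\n"
)
fractions = [round(fraction, 2) for _, fraction in read_het_counts(example)]
print(fractions)
```

For a real file, open it with `open(path, newline="")` and pass the handle in place of the in-memory example.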

Other useful optional arguments (can be used in all modes):

    --minimumMappingQuality <minimum PHRED-scaled mapping quality>
    --minimumBaseQuality <minimum PHRED-scaled base quality>

Comments

  • Lizz, China (Member)

    Hi, I'm confused about the parameter:
    --snpIntervals

    How do I get this? From the SNPs of my case/normal samples?

  • slee (Member, Broadie, Dev ✭✭)

    Hi @Lizz,

    This argument should specify the filename for an IntervalList of common SNP sites. The GetBayesianHetCoverage tool will test only these common sites for heterozygosity in your normal sample; then, it will return the ref/alt counts in your case sample at those sites passing the test. See the related post http://gatkforums.broadinstitute.org/gatk/discussion/7812/creating-a-list-of-common-snps-for-use-with-getbayesianhetcoverage#latest for an example of how one might create such a list of common sites.

  • Lizz, China (Member)

    @slee, thank you!
    I have two more questions:
    If I use WES data, should the common SNP sites be within the exome?

  • Lizz, China (Member)

    @Lizz said:
    Thank you, @slee!
    I have two more questions:

    1. If I use WES data, should the common SNP sites be within the exome?
    2. I have 56 tumor/normal samples. Should I use the normal samples to prepare the list of common SNPs, or can I use the 1000 Genomes WGS VCF to prepare it?

  • slee (Member, Broadie, Dev ✭✭)

    Hi @Lizz,

    @Lizz said:
    1. If I use WES data, should the common SNP sites be within the exome?

    The sites in the list of common SNPs do not have to be limited to the exome. The GetHetCoverage/GetBayesianHetCoverage tools will test for heterozygosity only those common sites that are sufficiently covered in the BAM. Sites without coverage are essentially ignored and do not add significantly to the runtime.

    @Lizz said:
    2. I have 56 tumor/normal samples. Should I use the normal samples to prepare the list of common SNPs, or can I use the 1000 Genomes WGS VCF to prepare it?

    If you already have het calls from another tool for your normal samples, you don't really need to be running GetHetCoverage/GetBayesianHetCoverage in the first place! The idea is that these tools provide a quick and dirty way to call germline hets (by looking only at known, common SNP sites, and not trying to call novel SNPs). They also collect the corresponding ref/alt counts in the tumor.

    So the larger the list of common SNPs (i.e., the lower the allele-frequency threshold), the more hets you will recover in your samples and the better your ACNV results will be. However, this comes at the expense of runtime: you may be needlessly examining a lot of covered sites that are variant in only a small fraction of the population.

    In practice, we find that using a common SNP list constructed from the 1000G reference panel with an allele-frequency threshold of 10% (as described in that post) gives reasonable results, typically recovering around 20k to 25k hets in an exome. The number recovered also depends on the parameters used for GetHetCoverage/GetBayesianHetCoverage; stricter tests for heterozygosity will return fewer hets.
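The allele-frequency thresholding discussed above can be sketched as a simple filter over VCF records. This is not the procedure from the linked post; it is a minimal illustration that assumes each record carries a single AF value in its INFO column and that "common" means AF at or above the threshold. The function name and the example records are hypothetical.

```python
def common_snp_sites(vcf_lines, min_af=0.10):
    """Yield (contig, position) for biallelic SNPs with INFO AF >= min_af.

    Assumes tab-separated VCF records with an "AF=<float>" INFO entry.
    Records without a usable single AF value are skipped.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # malformed or truncated record
        contig, pos, _id, ref, alt = fields[:5]
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue  # skip indels, multiallelic sites, and missing ALT
        af = None
        for entry in fields[7].split(";"):
            if entry.startswith("AF="):
                try:
                    af = float(entry[3:])
                except ValueError:
                    af = None  # e.g. comma-separated values
                break
        if af is not None and af >= min_af:
            yield contig, int(pos)

# Made-up records for illustration only.
records = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\tPASS\tAC=5;AF=0.25",
    "1\t200\t.\tC\tT\t.\tPASS\tAF=0.01",
    "1\t300\t.\tAT\tA\t.\tPASS\tAF=0.50",
    "2\t400\t.\tG\tA,T\t.\tPASS\tAF=0.30,0.20",
]
sites = list(common_snp_sites(records))
print(sites)
```

Lowering `min_af` lengthens the site list, which is the runtime trade-off described above. Turning the resulting sites into the IntervalList file that --snpIntervals expects is a separate step not shown here.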

  • diogopell, Sao Paulo (Member)

    Hi,

    I was trying to run the matched tumor and normal mode, but I'm getting an error with the parser.

    java -jar gatk-4.beta.5/gatk-package-4.beta.5-local.jar GetBayesianHetCoverage \
    --reference ucsc.hg19.fasta \
    --snpIntervals dbsnp_138.hg19.interval_list \
    --tumor 01T.recal.bam \
    --tumorHets 01T.het \
    --normal 01N.recal.bam \
    --normalHets 01N.het;

    It prints the usage
    "USAGE: GetBayesianHetCoverage [arguments]
    ...
    "
    And then this error:


    A USER ERROR has occurred: Invalid argument '01T.het'.


    org.broadinstitute.barclay.argparser.CommandLineException: Invalid argument '04T.07.het'.
    at org.broadinstitute.barclay.argparser.CommandLineArgumentParser.setPositionalArgument(CommandLineArgumentParser.java:591)
    at org.broadinstitute.barclay.argparser.CommandLineArgumentParser.parseArguments(CommandLineArgumentParser.java:423)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.parseArgs(CommandLineProgram.java:217)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
    at org.broadinstitute.hellbender.Main.main(Main.java:233)

    I have tried different tumorHets names, such as 'a', and I get ": Invalid argument 'a'."
    I also tried using gatk-4.beta.6

    My java -version is
    java version "1.8.0_131"
    Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

    Can anyone help me with this?
    Thanks,
    Diogo

  • shlee, Cambridge (Member, Administrator, Broadie, Moderator)

    Hi @diogopell,

    Please upgrade to the GATK4 somatic CNV workflow. The workflow is still in beta but has been developed further than the gatk-4.beta counterparts. Tutorials for the new workflows are at https://gatkforums.broadinstitute.org/dsde/discussion/11682 and https://gatkforums.broadinstitute.org/dsde/discussion/11683.
