DetermineGermlineContigPloidy

Hi,
How do I get the file for "--contig-ploidy-priors a_valid_ploidy_priors_table.tsv"? Is there a tool that produces a_valid_ploidy_priors_table.tsv?

Issue · Github: number 3073, filed by Sheila, state closed, closed by chandrans

Answers

  • Sheila · Broad Institute · Member, Broadie admin

    @liubo
    Hi,

    Let me check with the team and get back to you.

    -Sheila

  • Sheila · Broad Institute · Member, Broadie admin

    @liubo
    Hi,

    It seems you should be able to manually generate this table relatively easily. See the tool doc for more information. You will need to run CollectFragmentCounts beforehand.

    -Sheila

  • liubo · Member

    I used it. CollectFragmentCounts only generates an HDF5 or TSV count file, not a_valid_ploidy_priors_table.tsv.

  • slee · Member, Broadie, Dev ✭✭✭

    @lakhujanivijay CollectFragmentCounts has been replaced by CollectReadCounts since that post was written. Note the link to the other thread above if you are interested in the priors-table file; you don't need to collect counts to construct this file.

  • lakhujanivijay · India · Member

    Hi @slee

    Thank you for the response.

    Vijay

  • lakhujanivijay · India · Member

    Hi @slee

    I am still struggling with how to construct the "priors-table" file. For the tutorial, you have provided a pre-built file.

  • Begali · Germany · Member ✭✭

    Hi
    @slee
    @Sheila

    I am still trying to find out how to construct contig_ploidy_priors.tsv for my data so that I can apply this: https://software.broadinstitute.org/gatk/documentation/article?id=11684
    As far as I can tell, CollectReadCounts only produces HDF5 output, and if I add the parameters
    --format TSV --output sample.tsv

    the output has the columns
    CONTIG START END COUNT
    rather than
    CONTIG_NAME PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3

    So how can I create a correct contig_ploidy_priors.tsv for my data (a merged.bam file containing multiple samples)?

    With best regards

  • slee · Member, Broadie, Dev ✭✭✭

    @lakhujanivijay @Begali You should construct this file manually. You can use the file provided for the tutorial as a starting point. You may need to change contig names (e.g., if you are using a different reference) or adjust the values for the priors to be more stringent, depending on your data. Ideally, you'd have a set of truth samples with known per-contig ploidy that you can use to spot-check.

    @Begali if you are starting from a multisample BAM, you will want to be sure that you are running the CollectReadCounts tool in a way that produces read counts for individual samples; this may require some preprocessing steps.
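
Editorial sketch: one way to write such a table by hand is a few lines of Python. The column names are the ones quoted earlier in this thread. The prior values, the contig names (`chr1` to `chr22`, `chrX`, `chrY`) and the helper names are hypothetical placeholders, not the tutorial's values, and the check that each row sums to 1 is an assumption about what a valid table looks like.

```python
import csv
import io

# Column names as quoted earlier in this thread.
HEADER = ["CONTIG_NAME", "PLOIDY_PRIOR_0", "PLOIDY_PRIOR_1",
          "PLOIDY_PRIOR_2", "PLOIDY_PRIOR_3"]

# Hypothetical priors (NOT the tutorial's values). Replace with your own beliefs.
AUTOSOME = [0.01, 0.01, 0.97, 0.01]   # almost certainly diploid
CHR_X = [0.01, 0.49, 0.49, 0.01]      # one or two copies
CHR_Y = [0.495, 0.495, 0.005, 0.005]  # zero or one copy


def build_priors(autosomes, x_name="chrX", y_name="chrY"):
    """Return (contig, priors) rows; assumes each row should sum to 1."""
    rows = [(name, AUTOSOME) for name in autosomes]
    rows.append((x_name, CHR_X))
    rows.append((y_name, CHR_Y))
    for name, priors in rows:
        if abs(sum(priors) - 1.0) > 1e-6:
            raise ValueError(f"priors for {name} do not sum to 1")
    return rows


def write_priors_tsv(rows, handle):
    """Write the header and one tab-separated line per contig."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for name, priors in rows:
        writer.writerow([name] + [f"{p:g}" for p in priors])


# Contig names must match your reference (e.g. "1" instead of "chr1").
contigs = [f"chr{i}" for i in range(1, 23)]
rows = build_priors(contigs)

buf = io.StringIO()
write_priors_tsv(rows, buf)
priors_tsv = buf.getvalue()
```

To save the table, pass a file opened with `open("contig_ploidy_priors.tsv", "w", newline="")` to `write_priors_tsv` instead of the in-memory buffer.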

  • Begali · Germany · Member ✭✭

    Hi @slee

    Thanks for your reply, but my question is how to construct this file manually. Based on what?
    Thanks in advance

  • slee · Member, Broadie, Dev ✭✭✭

    @Begali The probabilities in this file should reflect your prior belief for the copy-number state of each contig, given the prevalence of aneuploidies and sex genotypes in the population. For example, the table used in the tutorial indicates that we believe there is a small chance for the copy-number of chr20 to be either 1 or 3, but it is most likely 2.

    We use these priors in conjunction with the likelihood of our observed data (i.e., the total read count per contig) to determine the posterior probability of the per-contig copy number in the usual Bayesian manner. As always, high-quality data (which is well explained by the likelihood model) will weaken the influence of the prior on the final result. However, if your data quality is low, you may want to impose stronger priors to regularize away the possibility of getting spurious results (e.g., unrealistic sex genotypes).
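
Editorial sketch: the interplay of prior and likelihood described above can be seen in a toy calculation. This is plain Bayes' rule over four ploidy states, not the model the tool actually fits; the prior and both likelihood vectors are made-up numbers chosen only for illustration.

```python
def posterior(prior, likelihood):
    """Bayes' rule over discrete states: posterior is proportional to prior * likelihood."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("prior and likelihood have no overlapping support")
    return [u / total for u in unnormalized]


# Hypothetical prior over ploidy 0, 1, 2, 3 for one contig.
prior = [0.01, 0.01, 0.97, 0.01]

# Made-up likelihoods of the observed read count under each ploidy.
sharp_likelihood = [1e-9, 1e-6, 1e-3, 1.0]   # clean data, strongly favors ploidy 3
vague_likelihood = [0.2, 0.25, 0.25, 0.3]    # noisy data, barely favors ploidy 3

post_sharp = posterior(prior, sharp_likelihood)
post_vague = posterior(prior, vague_likelihood)

# Most probable ploidy in each case.
best_sharp = post_sharp.index(max(post_sharp))
best_vague = post_vague.index(max(post_vague))
```

With the sharp likelihood the data override the prior and ploidy 3 comes out on top; with the vague likelihood the prior dominates and ploidy 2 remains the most probable state.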

    Ideally, you would run the tool on a "training" set of samples where the truth is known, tuning the priors or other parameters to recover the correct result if necessary. Once this tuning procedure is complete, you can proceed to use the same priors and parameters on subsequent samples. However, if PARs and other problematic regions are appropriately masked (as mentioned in the tutorial), usually the results of this tool are reasonable without any tuning required.
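
Editorial sketch: the spot check against a truth set could be as simple as the comparison below. How the called ploidies are read from the tool's output is not shown; the dictionaries, sample names and the `ploidy_concordance` helper are all hypothetical.

```python
def ploidy_concordance(called, truth):
    """Compare called ploidy with known ploidy for each (sample, contig) in truth.

    Returns the fraction of matches and a list of (key, expected, called) mismatches.
    """
    mismatches = []
    for key, expected in truth.items():
        got = called.get(key)  # None if the call is missing
        if got != expected:
            mismatches.append((key, expected, got))
    total = len(truth)
    rate = (total - len(mismatches)) / total if total else float("nan")
    return rate, mismatches


# Hypothetical truth set: known sex-chromosome ploidy for two samples.
truth = {
    ("sampleA", "chrX"): 2, ("sampleA", "chrY"): 0,
    ("sampleB", "chrX"): 1, ("sampleB", "chrY"): 1,
}
# Hypothetical calls obtained with one set of priors.
called = {
    ("sampleA", "chrX"): 2, ("sampleA", "chrY"): 0,
    ("sampleB", "chrX"): 1, ("sampleB", "chrY"): 2,
}

rate, mismatches = ploidy_concordance(called, truth)
```

If the mismatch list is not empty, that is the cue to adjust the priors (or mask problematic regions) and rerun on the training samples before applying the same settings to new samples.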
