Broad website contact form: Two things to confirm before running Genome STRiP

zzqzzq ChinaMember

Hi Bob,

I have pasted my questions here. It will be helpful for users who have the same queries.

Thanks for the information. According to https://gatkforums.broadinstitute.org/gatk/discussion/1492/genome-mask-files, @Geraldine_VdAuwera explained that, after running ComputeGenomeMask, a base is assigned a 0 if an N-base sequence centered on this read is unique within the reference genome. I hope you can help us with a final check. Thank you very much.

Best wishes,
Zhuqing

Hi,

It is probably better to submit questions on the GATK forum.

The masks all use 1 for a position to keep, 0 for a position to drop (like bitwise AND).

For the CN2 mask, you want to keep positions that are more likely to be non-variable in most individuals (so you set the sex chromosomes to zero, along with known repeats, CNVs, etc.).

For the alignability mask, reliably alignable positions should be marked as 1 after running ComputeGenomeMask.
If you look at the human masks, I believe they should follow this same pattern.

-Bob

Question/Comment: For non-human genomes, we need to prepare the alignability mask and CN2 mask files before running Genome STRiP. For the alignability mask, I will use ComputeGenomeMask; for the CN2 mask, I will exclude the sex chromosomes, unplaced contigs, and repeat annotations from RepeatMask (all these regions should be masked with a 0). Am I right?

Another thing I want to confirm is that in the alignability mask FASTA file, positions are marked with a 0 if they are reliably alignable and a 1 if they are not, whereas in the CN2 mask FASTA file, positions are marked with a 0 if they are likely to be copy number polymorphic and a 1 if they are unlikely to be. Am I right?

Thank you.

Best Answer

  • bhandsaker
    Accepted Answer

    My email response to Zhuqing below was incorrect. The documentation from 2012 is still correct with respect to how the bases are marked.

    In the various genome masks, bases marked with a "1" value are masked out (not used), and bases with a "0" value are included. Thus, for the alignability masks (svmasks), the uniquely alignable bases are indicated with "0" and the non-unique bases with "1". For the other masks, for example the gcmask (formerly called the cn2 mask), bases in the more well-behaved parts of the genome are marked as "0", other bases as "1", etc.

    Sorry about the confusion.
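
To make the convention in the accepted answer concrete, here is a minimal Python sketch of a per-contig mask in which "0" means the base is included and "1" means it is masked out. This is only an illustration and not part of Genome STRiP: the function names (`build_mask`, `combine_masks`, `usable_fraction`), the 0-based half-open interval format, and the idea of combining two masks are all assumptions made for the example.

```python
# Illustrative sketch only; these helpers are hypothetical, not Genome STRiP code.
# Convention (from the accepted answer): "0" = base is included, "1" = masked out.

def build_mask(contig_length, excluded_intervals):
    """Return a mask string for one contig.

    excluded_intervals: iterable of (start, end) pairs, assumed 0-based and
    half-open, e.g. repeat annotations or known CNV regions to mask out.
    """
    mask = bytearray(b"0" * contig_length)  # start with every base included
    for start, end in excluded_intervals:
        start = max(0, start)
        end = min(contig_length, end)
        if start < end:
            mask[start:end] = b"1" * (end - start)  # mark region as masked out
    return mask.decode("ascii")


def combine_masks(mask_a, mask_b):
    """A base stays usable ("0") only if neither mask marks it "1"."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must cover the same contig length")
    return "".join(
        "1" if a == "1" or b == "1" else "0" for a, b in zip(mask_a, mask_b)
    )


def usable_fraction(mask):
    """Fraction of bases that are included (marked "0")."""
    return mask.count("0") / len(mask)


if __name__ == "__main__":
    repeats = build_mask(10, [(2, 5)])          # mask out a hypothetical repeat
    other = build_mask(10, [(0, 1), (9, 10)])   # mask out both contig ends
    combined = combine_masks(repeats, other)
    print(repeats, other, combined, usable_fraction(combined))
```

Under this convention, masking out an entire contig (for example a sex chromosome in a CN2-style mask) would mean writing "1" for every base of that contig.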
