
Broad website contact form: Two things to confirm before running Genome STRiP

zzq China Member

Hi Bob,

I have pasted my questions here. It will be helpful for users who have the same queries.

Thanks for your information. In https://gatkforums.broadinstitute.org/gatk/discussion/1492/genome-mask-files, @Geraldine_VdAuwera explains that, after running ComputeGenomeMask, a base is assigned a 0 if an N-base sequence centered on that base is unique within the reference genome. We hope you can help us with a final check. Thank you very much.

Best wishes,


It is probably better to submit questions on the GATK forum.

The masks all use 1 for a position to keep, 0 for a position to drop (like bitwise AND).

For the CN2 mask, you want to keep positions that are more likely to be non-variable in most individuals (so you set the sex chromosomes to zero, along with known repeats, CNVs, etc.).

For the alignability mask, reliably alignable positions should be marked as 1 after running ComputeGenomeMask.
If you look at the human masks, I believe they should follow this same pattern.


Question/Comment: For non-human genomes, we should prepare the alignability mask and CN2 mask files before running Genome STRiP. For the alignability mask I will use ComputeGenomeMask; for the CN2 mask I will exclude the sex chromosomes, unplaced contigs, and repeat annotations from RepeatMasker (all these regions should be masked with a 0). Am I right?

Another thing I want to confirm with you is that in the alignability mask FASTA file, positions are marked with a 0 if they are reliably alignable and with a 1 if they are not. In the CN2 mask FASTA file, however, positions are marked with a 0 if they are likely to be copy-number polymorphic and with a 1 if they are unlikely to be. Am I right?

Thank you.


Best Answer

  • bhandsaker ✭✭✭✭
    Accepted Answer

    My email response to Zhuqing below was incorrect. The documentation from 2012 is still correct with respect to how the bases are marked.

    In the various genome masks, bases marked with a "1" value are masked out (not used) and bases with a "0" value are included. Thus, for the alignability masks (svmasks), the uniquely alignable bases are indicated with "0" and the non-unique bases with "1". For the other masks, for example the gcmask (formerly called the cn2 mask), bases in the more well-behaved parts of the genome are marked as "0" and other bases as "1".
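
    To make the convention concrete, here is a minimal Python sketch, not part of Genome STRiP. The function names (build_mask, included_fraction) and the toy intervals are hypothetical and only illustrate the 0 = included, 1 = masked out rule on a short string; it does not read or write real mask files.

```python
def build_mask(length, excluded):
    """Return a mask string: '0' = base is used, '1' = base is masked out.

    `excluded` is a list of half-open, 0-based (start, end) intervals
    (an assumption made for this sketch, not a Genome STRiP format).
    """
    mask = ["0"] * length
    for start, end in excluded:
        # Clamp each interval to the contig so out-of-range input is ignored.
        for i in range(max(start, 0), min(end, length)):
            mask[i] = "1"
    return "".join(mask)


def included_fraction(mask):
    """Fraction of bases marked '0', i.e. bases that remain in the analysis."""
    return mask.count("0") / len(mask) if mask else 0.0


# Toy 20-base contig with two hypothetical repeat intervals masked out.
toy_mask = build_mask(20, [(2, 5), (10, 14)])
print(toy_mask)
print(included_fraction(toy_mask))
```

    A quick sanity check on any mask you build is the fraction of "0" bases: if nearly the whole genome comes out as "1", the 0/1 sense has probably been inverted.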

    Sorry about the confusion.

