Broad website contact form: Two things to confirm before running Genome STRiP
I have pasted my questions here. It will be helpful for users who have the same queries.
Thanks for the information. On this page, https://gatkforums.broadinstitute.org/gatk/discussion/1492/genome-mask-files, @Geraldine_VdAuwera wrote that, after running ComputeGenomeMask, a base is assigned a 0 if an N-base sequence centered on this read is unique within the reference genome. I hope you can help us with a final check. Thank you very much.
It is probably better to submit questions on the GATK forum.
The masks all use 1 for a position to keep, 0 for a position to drop (like bitwise AND).
For the CN2 mask, you want to keep positions that are more likely to be non-variable in most individuals (so you set the sex chromosomes to zero, along with known repeats, CNVs, etc.).
For the alignability mask, reliably alignable positions should be marked as 1 after running ComputeGenomeMask.
If you look at the human masks, I believe they should follow this same pattern.
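To make the convention concrete, here is a minimal Python sketch of building a CN2-style mask in which 1 means keep and 0 means drop. It is only an illustration, not part of Genome STRiP: the function names (`build_cn2_mask`, `and_masks`), the in-memory string representation, and the 0-based half-open (BED-style) intervals are all assumptions, and writing the result out as a mask FASTA file is left out.

```python
def build_cn2_mask(contig_lengths, excluded_contigs, excluded_intervals):
    """Return {contig: mask string} where '1' = keep and '0' = drop.

    contig_lengths     : dict of contig name -> length in bases
    excluded_contigs   : set of contigs to zero out entirely
                         (e.g. sex chromosomes, unplaced contigs)
    excluded_intervals : dict of contig name -> list of (start, end),
                         assumed 0-based and half-open (BED-style),
                         e.g. repeat annotations or known CNVs
    """
    masks = {}
    for contig, length in contig_lengths.items():
        if contig in excluded_contigs:
            # Whole contig is dropped.
            masks[contig] = "0" * length
            continue
        bits = bytearray(b"1" * length)  # start with everything kept
        for start, end in excluded_intervals.get(contig, []):
            # Clip the interval to the contig before zeroing it.
            start, end = max(0, start), min(length, end)
            if start < end:
                bits[start:end] = b"0" * (end - start)
        masks[contig] = bits.decode("ascii")
    return masks


def and_masks(a, b):
    """Combine two masks position by position: keep only where both keep."""
    if len(a) != len(b):
        raise ValueError("masks must have the same length")
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))


# Toy example with hypothetical contigs and intervals.
masks = build_cn2_mask(
    contig_lengths={"chr1": 10, "chrX": 5},
    excluded_contigs={"chrX"},
    excluded_intervals={"chr1": [(2, 5)]},
)
# masks["chr1"] is "1100011111" and masks["chrX"] is "00000"
```

Because both masks use 1 for keep, a position survives only if every mask keeps it, which is what `and_masks` shows on a per-position basis.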
Question/Comment: For non-human genomes, we need to prepare the alignability mask and CN2 mask files before running Genome STRiP. For the alignability mask I will use ComputeGenomeMask; for the CN2 mask I will exclude the sex chromosomes, unplaced contigs, and repeat annotations from RepeatMasker (all of these regions will be masked with a 0). Am I right?
The other thing I want to confirm is the encoding. In the alignability mask FASTA file, positions are masked with a 0 if they are reliably alignable and 1 if they are not. In the CN2 mask FASTA file, however, positions are masked with a 0 if they are likely to be copy-number polymorphic and 1 if they are unlikely to be. Am I right?