Genomestrip CNV QC plots
I'm trying to figure out how best to filter my CNVs (from the CNV genotyping pipeline, then with the R script).
I've attached graphs of the QC metrics for each event across the Drosophila genome.
For GS-ClusterSeparation, is a high value good or bad?
Due to the breeding design, I'm not worried about the lack of homozygous variants (218 half-sibs, 1 reference-line mother).
There's a weird peak for CN-Qual at chr3L:9Mb, I've attached a picture from IGV showing the genotype calls in the region. (Turquoise=wt, Green=het, Red=hom mut.) It almost looks like one event, in five individuals - or something really weird - or perhaps more likely a poorly-sequenced region. I shall try to post the bam file images soon.
My plan is:
1. Remove all the deletions (as these should be detected by GS-deletion pipeline)
2. Remove events where alt=0, or where all individuals have the same genotype (not sure why there are so many; I guess the calling is oversensitive).
3. Keep all the common CNVs.
4. Filter rare CNVs stringently by QC metrics.
5. Combine all passing CNVs with the deletions from the other pipeline, and feed them into GS-genotyper.
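To make steps 1 to 4 concrete, here is a minimal Python sketch of the filtering logic. It assumes each site has already been parsed out of the VCF into a simple record. The `CnvSite` class, its field names, the "DEL" label, the 5% cutoff for "common", and the QC thresholds in the example are all placeholders of mine, not Genome STRiP names or recommended values. The rare-CNV QC test is passed in as a function, since which direction of GS-ClusterSeparation counts as good is still an open question here. Step 5 (merging with the deletion calls and re-genotyping) is not covered.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CnvSite:
    site_id: str
    category: str            # hypothetical labels: "DEL", "DUP" or "MIXED"
    copy_numbers: List[int]  # one copy-number call per individual
    cluster_sep: float       # stand-in for the GS-ClusterSeparation metric
    cn_qual: float           # stand-in for the CN-Qual metric


def non_ref_fraction(site: CnvSite, ref_cn: int = 2) -> float:
    """Fraction of individuals whose copy number differs from ref_cn."""
    if not site.copy_numbers:
        return 0.0
    carriers = sum(1 for cn in site.copy_numbers if cn != ref_cn)
    return carriers / len(site.copy_numbers)


def filter_cnvs(
    sites: List[CnvSite],
    rare_qc: Callable[[CnvSite], bool],
    common_freq: float = 0.05,  # placeholder cutoff for "common"
    ref_cn: int = 2,            # assumes diploid autosomal sites
) -> List[CnvSite]:
    kept = []
    for site in sites:
        # Step 1: drop deletions (left to the deletion pipeline).
        if site.category == "DEL":
            continue
        # Step 2: drop sites where every individual has the same call
        # (this also covers sites with no non-reference calls).
        if len(set(site.copy_numbers)) <= 1:
            continue
        # Step 3: keep common CNVs without further QC.
        if non_ref_fraction(site, ref_cn) >= common_freq:
            kept.append(site)
        # Step 4: keep rare CNVs only if they pass the stringent QC test.
        elif rare_qc(site):
            kept.append(site)
    return kept


# Toy data; thresholds below are arbitrary placeholders, not recommendations.
sites = [
    CnvSite("del1", "DEL", [2, 1, 2, 2], 5.0, 50.0),
    CnvSite("mono", "DUP", [2, 2, 2, 2], 5.0, 50.0),
    CnvSite("common", "DUP", [2, 3, 3, 2], 1.0, 5.0),
    CnvSite("rare_ok", "DUP", [2] * 39 + [3], 6.0, 40.0),
    CnvSite("rare_bad", "DUP", [2] * 39 + [3], 1.0, 5.0),
]
kept = filter_cnvs(
    sites, rare_qc=lambda s: s.cluster_sep >= 5.0 and s.cn_qual >= 20.0
)
print([s.site_id for s in kept])
```

Passing the QC test in as a function keeps the common/rare split separate from the choice of metrics and thresholds, so those can be changed once the QC plots are better understood.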
What do you reckon? Apologies for being a bit vague.