Is it possible to use CNNScoreVariants with PacBio reads?
We have developed an SNV calling method for long reads (https://github.com/pjedge/longshot). The false-positive variants produced by our method tend to occur in certain sequence contexts, and they often carry signals that could be used together to filter them (including some based on assembled haplotype consistency). It would be nice to combine these signals (reference sequence context as well as annotations in our VCF) to filter variants using a supervised learning approach.

I am interested in using CNNVariantWriteTensors, CNNVariantTrain, and CNNScoreVariants for this task, but I'm not sure that it's even possible. Are there design considerations that fundamentally make these tools incompatible with non-Illumina sequencing technologies?

Further, our output VCF lacks most of the annotations specified in the GATK Best Practices, and many of those annotations are geared toward Illumina reads. If I simply ran my data through VariantAnnotator to fill them in, I suspect many of them would not be good features for PacBio reads. We would be especially interested in leveraging custom annotations that are long-read specific. Would it be possible for us to define our own annotation set to use with these tools?
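
To make the idea concrete, here is a minimal sketch of the kind of per-variant feature vector we have in mind: a one-hot encoding of the reference sequence context concatenated with numeric INFO annotations from the VCF. This is only an illustration, not how the CNN tools represent their tensors internally. The function names and the `HYP_`-prefixed annotation keys are hypothetical, and the reference window is assumed to have been extracted already.

```python
# Illustrative sketch only: combines reference context with VCF INFO annotations.
# All function names and HYP_* annotation keys are hypothetical, not GATK or Longshot names.

BASES = "ACGT"


def one_hot_context(context):
    """One-hot encode a reference window; non-ACGT bases (e.g. N) become all zeros."""
    encoded = []
    for base in context.upper():
        encoded.extend(1.0 if base == b else 0.0 for b in BASES)
    return encoded


def parse_info(info_field):
    """Parse a VCF INFO column ('K1=V1;K2=V2;FLAG') into a dict of strings."""
    info = {}
    if info_field == ".":
        return info
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, _, value = entry.partition("=")
        info[key] = value
    return info


def annotation_vector(info, keys, missing=0.0):
    """Look up each key as a float; absent, flag-only, or multi-valued entries get `missing`."""
    values = []
    for key in keys:
        try:
            values.append(float(info[key]))
        except (KeyError, ValueError):
            values.append(missing)
    return values


def variant_features(vcf_line, ref_context, keys):
    """Build one feature vector from a tab-separated VCF record and its reference window."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the eighth VCF column
    return one_hot_context(ref_context) + annotation_vector(info, keys)


if __name__ == "__main__":
    record = "chr1\t100\t.\tA\tG\t50\tPASS\tDP=30;HYP_HAP_CONSISTENCY=0.9"
    keys = ["DP", "HYP_HAP_CONSISTENCY", "HYP_NOT_PRESENT"]
    print(variant_features(record, "ACG", keys))
```

A vector like this could be fed to any supervised classifier; the open question is whether the CNN tools can accept a user-defined annotation list in place of the default set.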