Is it possible to use GATK on a couple dozen target sequences but thousands of samples?
I amplified ~50 target genes in plants; half of them are chloroplast markers (haploid) and the other half are nuclear markers (diploid). I am wondering whether the GATK pipelines can be used to process this kind of data. I used a multi-FASTA file of these genes (each gene as a separate sequence) as my reference to create BAM alignments by mapping the reads to it. Is this a good approach, or should I concatenate the target genes into a single reference sequence? Additionally, the chloroplast markers have important variation in homopolymers, whereas the nuclear markers have no indels. Should I split these data for the variant-calling steps? I would appreciate any feedback.
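One possible way to set up the split asked about above, sketched under assumptions: keep the multi-FASTA reference as-is (one contig per gene) and write two plain-text interval files, one listing the chloroplast contigs and one listing the nuclear contigs. The `cp_` name prefix used to recognize chloroplast markers is a hypothetical convention, not something from the data described here; adjust `is_chloroplast()` to match your actual FASTA headers.

```python
"""Sketch: split the contigs of a multi-FASTA reference (one contig per
target gene) into chloroplast and nuclear groups and write each group as a
plain-text interval file with one contig name per line.

ASSUMPTION: chloroplast markers are recognized by a hypothetical "cp_"
name prefix; replace is_chloroplast() with your own rule.
"""
from pathlib import Path


def contig_names(fasta_text):
    """Return contig names: the first whitespace-delimited token after '>'."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip malformed, empty headers
                names.append(fields[0])
    return names


def is_chloroplast(name):
    # Hypothetical naming convention; change to match your headers.
    return name.startswith("cp_")


def split_targets(names):
    """Return (chloroplast_contigs, nuclear_contigs), preserving order."""
    chloroplast = [n for n in names if is_chloroplast(n)]
    nuclear = [n for n in names if not is_chloroplast(n)]
    return chloroplast, nuclear


def write_interval_list(path, names):
    # One whole contig per line, e.g. "chloroplast.list".
    Path(path).write_text("".join(f"{n}\n" for n in names))


if __name__ == "__main__":
    fasta = ">cp_matK plastid\nACGT\n>nuc_PHYC\nGGCC\n>cp_rbcL\nTTAA\n"
    cp, nuc = split_targets(contig_names(fasta))
    print(cp)   # ['cp_matK', 'cp_rbcL']
    print(nuc)  # ['nuc_PHYC']
```

Each file could then be passed to a separate HaplotypeCaller run with `-L`, using `-ploidy 1` for the chloroplast list and the default ploidy of 2 for the nuclear list; please check these arguments against the documentation for your GATK version. Keeping one contig per gene, rather than concatenating the genes, avoids reads spanning artificial junctions between unrelated genes.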