Is it possible to use GATK on a couple dozen target sequences but thousands of samples?
I amplified ~50 target genes in plants; half of them are chloroplast markers (haploid) and the other half are nuclear markers (diploid). I am wondering whether the GATK pipelines can process this kind of data. I used a list of these genes in FASTA format as my reference sequences and mapped the reads to it to create a BAM alignment. I am curious whether this is a good approach or whether I should concatenate the target genes instead. Additionally, the chloroplast markers have important variation in homopolymers, whereas the nuclear markers do not have indels. Should I split these data for the variant-calling steps? I would appreciate any feedback.