
Is it possible to use GATK with a couple dozen target sequences but thousands of samples?


I amplified ~50 target genes in plants; half of them are chloroplast markers (haploid) and the other half are nuclear markers (diploid). I am wondering whether the GATK pipelines can process this kind of data. I used a list of these genes in FASTA format as my reference sequences to create a BAM alignment (map to reference). Is this a good approach, or should I concatenate the list of target genes? Additionally, the chloroplast markers have important variation in homopolymers, whereas the nuclear markers have no indels. Should I split the data for the variant-calling steps? I would appreciate any feedback.
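
To make the "split" part of the question concrete, here is a minimal sketch of how the reference records might be partitioned into chloroplast and nuclear groups by name. It assumes the FASTA headers carry a distinguishing prefix; the `cp_` prefix, the function name `split_contigs_by_genome`, and the example record names are all hypothetical and not taken from the actual data.

```python
from typing import Dict, Iterable, List


def split_contigs_by_genome(
    fasta_lines: Iterable[str],
    chloroplast_prefix: str = "cp_",  # hypothetical naming convention
) -> Dict[str, List[str]]:
    """Partition FASTA record names into chloroplast and nuclear groups."""
    groups: Dict[str, List[str]] = {"chloroplast": [], "nuclear": []}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        # The record name is the first whitespace-delimited token after ">".
        fields = line[1:].split()
        if not fields:
            continue  # skip malformed headers with no name
        name = fields[0]
        key = "chloroplast" if name.startswith(chloroplast_prefix) else "nuclear"
        groups[key].append(name)
    return groups


# Made-up records for illustration only.
example_fasta = [
    ">cp_matK chloroplast marker",
    "ATGGAAGAA",
    ">cp_rbcL",
    "ATGTCACCA",
    ">nuc_ITS nuclear marker",
    "TCGAAACCT",
]
groups = split_contigs_by_genome(example_fasta)
for genome, names in groups.items():
    print(genome, names)
```

Each group, written one name per line to a file, could then be used as an interval list so that the two marker sets are called separately, for example with HaplotypeCaller's `-L` and `--sample-ploidy` arguments. Whether that is the right strategy for this data set is exactly what the question asks.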


Best Answer


  • ryanmid Member

    Thank you for your feedback!
