I am currently evaluating different methods to select tagSNPs for a gene. Is it possible to identify tagSNPs for a gene with GATK by scanning a given list of SNPs?
Thank you very much.
If I understand correctly, you want to test samples for variation at specific sites? If so, yes, you can do that with GATK. The exact workflow to follow depends on how your data was generated. Is it whole genome data, exome data, or gene panel data?
Thank you for your reply.
I want to use publicly available data about SNPs at one site in order to identify tagSNPs as markers for two possible haplotypes. The workflow I had in mind was to download a list of SNPs for this gene, convert the list into the formats GATK needs, and then use GATK to determine the linkage disequilibrium between the SNPs and, from that, the tagSNPs.
@biobio Is it perhaps an option for you to use PLINK1.9, if you are dealing with just one site? If you want to use haplotype input instead of estimating LD with the EM algorithm from unphased genotypes (and your data is in the SHAPEIT2 .haps format), then I have a Python script for calculating LD:
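For illustration only (this is not the script referred to above): a minimal sketch of how pairwise LD could be computed from phased haplotypes. It assumes each SNP is biallelic and has already been read into a list of 0/1 alleles, one entry per haplotype, for example from the haplotype columns of a .haps file (parsing is not shown). The function name `pairwise_ld` is hypothetical.

```python
from typing import Sequence, Tuple


def pairwise_ld(hap_a: Sequence[int], hap_b: Sequence[int]) -> Tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic SNPs from phased haplotypes.

    hap_a[i] and hap_b[i] are the 0/1 alleles carried by haplotype i
    at SNP A and SNP B respectively (assumed input layout).
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("both SNPs need the same, non-zero number of haplotypes")

    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    # frequency of haplotypes carrying allele 1 at both SNPs
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("LD is undefined when either SNP is monomorphic")
    r2 = d * d / denom

    # D' scales D by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    return d, d_prime, r2


if __name__ == "__main__":
    # Toy data: four haplotypes, two perfectly correlated SNPs
    print(pairwise_ld([0, 0, 1, 1], [0, 0, 1, 1]))
```

A SNP pair with high r² (a threshold such as 0.8 is a common choice, though that is a convention rather than a rule) would be a candidate for one SNP tagging the other.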
Oh I see, thanks for clarifying -- that's not something you would use GATK for. Then @tommycarstensen's recommendation of using PLINK is more appropriate.