SNP filtering help

jojo2016 (Dublin, Member)

I used the GATK Best Practices guidelines to call SNPs in a strain of a yeast species for which only one strain has been sequenced. I then ran the same steps on the raw data that was used to generate the reference and noticed a few SNPs in that output. If this data was used to generate the reference, I assume it should produce no SNPs? Given these results, I want to filter my SNP set to make it more accurate. There is no known SNP database for this organism, but could I use the calls I got from the reference data to filter my new SNP data? Any help would be greatly appreciated, as this is my first bioinformatics project. Thank you.
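To make the filtering idea concrete, here is a minimal sketch of what is meant, not a GATK-endorsed procedure. It assumes both call sets are plain-text VCFs and that a SNP called from the reference's own reads is a likely artifact, so any sample call with the same chromosome, position, REF and ALT is dropped. The function names and the example records are hypothetical and only for illustration.

```python
def site_key(vcf_line):
    """Return (CHROM, POS, REF, ALT) for one tab-separated VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    return (fields[0], int(fields[1]), fields[3], fields[4])


def artifact_sites(reference_vcf_lines):
    """Collect the sites called from the reads that built the reference."""
    return {
        site_key(line)
        for line in reference_vcf_lines
        if line.strip() and not line.startswith("#")
    }


def filter_sample(sample_vcf_lines, artifacts):
    """Keep header lines and every sample call not seen in the artifact set."""
    kept = []
    for line in sample_vcf_lines:
        if line.startswith("#") or not line.strip():
            kept.append(line)  # headers and blank lines pass through unchanged
        elif site_key(line) not in artifacts:
            kept.append(line)
    return kept


# Made-up records, only to show the behaviour.
header = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
reference_calls = header + ["chrI\t1500\t.\tA\tG\t50\tPASS\t.\n"]
sample_calls = header + [
    "chrI\t1500\t.\tA\tG\t60\tPASS\t.\n",  # also called from the reference reads
    "chrI\t2300\t.\tC\tT\t99\tPASS\t.\n",  # unique to the new strain
]

filtered = filter_sample(sample_calls, artifact_sites(reference_calls))
```

Matching on REF and ALT as well as position is a deliberate choice here: a different allele at the same position in the new strain would still be kept.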



  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)


    Can you please clarify what you mean by you "used the GATK Best Practice guidelines"? Where did you get the reference from? Was it assembled de novo from the reads you are using? Are you saying you should get no SNPs at all? If so, can you post some example SNPs you got in the final VCF?


