GATK validation with Sanger sequencing

Hi,

1) Does GATK have a tool for validating SNP calls when Sanger reads are available for the same individuals over parts of the genome?

2) All of GATK's false-positive calls occur at variant sites, i.e., sites that are variant (0/1) in one individual are the ones incorrectly called as SNPs (0/1) in other individuals in which they are not variant (0/0).

Based on the parts of the genome that we Sanger sequenced, as much as 5% of the SNP calls at "variant sites" were false positives. Does this number make sense, or is my dataset or callset horribly wrong? Any information about false-positive rates is welcome.

(My dataset: ~10x average coverage; 50 kb region Sanger sequenced)
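A minimal sketch of how a figure like the 5% above could be computed, assuming the GATK and Sanger genotypes have already been parsed into dictionaries keyed by (sample, position). That key layout and the function name are hypothetical, not a GATK format or tool. The rate is the fraction of non-reference GATK calls that Sanger genotypes as 0/0, counting only sites covered by both.

```python
from typing import Dict, Tuple

# Hypothetical key layout: (sample_id, position). Genotypes are VCF-style strings.
Site = Tuple[str, int]


def false_positive_rate(ngs_calls: Dict[Site, str],
                        sanger: Dict[Site, str]) -> Tuple[int, int, float]:
    """Return (false positives, non-reference calls compared, FP rate).

    A false positive is a non-reference NGS call (e.g. "0/1") that the
    Sanger genotype at the same sample and position says is "0/0".
    """
    called = 0
    false_pos = 0
    for key, gt in ngs_calls.items():
        truth = sanger.get(key)
        # Skip sites without Sanger coverage, reference calls and no-calls.
        if truth is None or gt in ("0/0", "./."):
            continue
        called += 1
        if truth == "0/0":
            false_pos += 1
    rate = false_pos / called if called else float("nan")
    return false_pos, called, rate
```

For example, if GATK calls 0/1 in samples s1 and s2 at position 100 but Sanger shows s2 as 0/0, the function returns `(1, 2, 0.5)`.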

Thanks,

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi there,

    1) GenotypeAndValidate should work on Sanger data.

    2) Some parts of the genome are harder to call, and also harder to validate. In "easy" parts of the genome you should get a true-positive (TP) rate of ~99%, but in harder parts a rate of 95% is not unreasonable. It's also very important to remember that even Sanger sequencing has error modes (especially in those harder regions), so you'll need to put error bars on the validation rates. Finally, 10x coverage is still considered low coverage, so it's not surprising that there are genotyping errors (as opposed to the sites themselves being wrong). In other words, GATK is correctly finding that a site is polymorphic, but because of the low coverage it isn't genotyping all of the samples correctly. This is expected.
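One common way to put error bars on a validation rate is a binomial confidence interval. The sketch below uses the Wilson score interval; that choice is ours, not something GenotypeAndValidate is known to report. It reflects only the sampling uncertainty from the number of validated sites, not Sanger's own error modes, and the counts in the usage example are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%).

    successes: number of calls confirmed by Sanger
    n: number of calls that could be checked against Sanger
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)
```

For example, `wilson_interval(95, 100)` gives roughly (0.888, 0.978): 95 confirmed calls out of 100 is compatible with a true validation rate anywhere from about 89% to 98%, which is why a few percent difference between regions can be hard to interpret on a 50 kb target.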
