SNP/INDEL calling in bisulfite data

marakeby (Posts: 3, Member)

I have bisulfite-treated sequence data mapped with Bismark and Bowtie2, and I'd like to call SNPs and indels from it. I have used Bis-SNP to call SNPs, but it doesn't call indels. Can I use GATK to call indels from the mapped data? Do you have any support for bisulfite data?
Another question, please: the data is a mix from 6 different people. Do you have any support for pooled data?
Thanks for your help.

Best Answer


  • Geraldine_VdAuwera (Posts: 9,938, Administrator, Dev admin)

    Hi there,

    The UnifiedGenotyper can handle pooled data, yes. Have a look in the tech doc at the ploidy argument.
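A rough sketch of how the ploidy value for a pool might be worked out, assuming each of the 6 individuals is diploid and that the pool's ploidy is the number of individuals times the per-individual ploidy. The helper names and file names are hypothetical, and the `-ploidy` spelling of the argument is an assumption to verify against the tech doc for your GATK version:

```python
def pooled_ploidy(n_individuals, ploidy_per_individual=2):
    """Total number of chromosome copies in a pooled sample."""
    if n_individuals < 1 or ploidy_per_individual < 1:
        raise ValueError("counts must be positive")
    return n_individuals * ploidy_per_individual


def unified_genotyper_args(reference, bam, out_vcf, n_individuals):
    """Build an argument list for UnifiedGenotyper (hypothetical helper).

    The flag spellings are assumptions; check them in the tech doc.
    """
    return [
        "-T", "UnifiedGenotyper",
        "-R", reference,
        "-I", bam,
        "-o", out_vcf,
        "-ploidy", str(pooled_ploidy(n_individuals)),
    ]


# Six diploid people in one pool give 12 chromosome copies.
args = unified_genotyper_args("ref.fasta", "pool.bam", "pool.vcf", 6)
print(" ".join(args))
```

The argument list still has to be appended to the usual Java invocation of the GATK jar; only the ploidy arithmetic is the point here.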

    However, we have no experience with bisulfite data, so whether you can process it through GATK will depend on how the data is encoded. If the methylated bases are represented by something other than ACTG, then you will run into trouble. But if they are stored in separate tags, it should be okay.

    Good luck and please let us know how it goes!

    Geraldine Van der Auwera, PhD

  • marakeby (Posts: 3, Member)

    Many thanks for your help.
    Bisulfite data is a normal sequence, but with each unmethylated 'C' base converted to 'T', so the real sequence is not observed directly. I think I will have to make some modifications to the code. I was wondering where I can find an explanation of your indel detection model. I have found some presentations on your site, but I could not find a good explanation.
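To illustrate the conversion described above, here is a minimal sketch that simulates it on a forward-strand sequence. The function name and the `methylated_positions` parameter are hypothetical, chosen only for this example:

```python
def bisulfite_convert(sequence, methylated_positions=()):
    """Convert every unmethylated C to T; methylated Cs are left as C.

    Positions are 0-based indices into `sequence`.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(sequence.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# The C at index 1 is methylated and survives; the other Cs read as T.
print(bisulfite_convert("ACGTCCGA", methylated_positions={1}))  # ACGTTTGA

# An unmethylated C and a genuine T look identical after conversion.
print(bisulfite_convert("ACG") == bisulfite_convert("ATG"))  # True
```

The second call shows why the real sequence is not observed: after treatment, a true C>T variant and an unmethylated C produce the same read.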

  • Geraldine_VdAuwera (Posts: 9,938, Administrator, Dev admin)

    Ah, I see. Yes, I think you'll need to make some modifications then.

    What kind of details are you trying to find out about the indel discovery model? Mathematical, or more functional? If you want details of how the program operates and you're comfortable reading code, feel free to look at the source itself. It is freely accessible in our GitHub repository.

    Geraldine Van der Auwera, PhD

  • marakeby (Posts: 3, Member)

    I have already downloaded the code, but I am looking for the statistical model used in your system. It usually helps to look at the model before digging into the code. I have seen that your model is inspired by Dindel, but I'd like to see if there are any differences.
    Thanks for your continued help.

  • Longinotto (Freiburg, Posts: 35, Member)

    I suppose you could write a little script that, wherever the reference base is a C and some or all reads suggest a T, converts the read base back to a C. You would lose the ability to call heterozygous or homozygous C -> T SNPs, but there's no way around that really.
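A minimal sketch of such a script, applied per read rather than per site for simplicity. It assumes the read is already aligned to the reference without gaps (equal-length strings) and comes from the forward strand; reads from the opposite strand would show G to A instead, which this sketch does not handle. The function name is hypothetical:

```python
def revert_c_to_t(read, reference):
    """Set a read base back to C wherever the reference is C and the read is T.

    Assumes a gapless alignment, so position i in `read` matches
    position i in `reference`. All other mismatches are kept as-is.
    """
    if len(read) != len(reference):
        raise ValueError("read and reference must have the same length")
    return "".join(
        "C" if ref_base == "C" and read_base == "T" else read_base
        for read_base, ref_base in zip(read.upper(), reference.upper())
    )


# Ts over reference Cs (indices 1 and 4) are reverted; the T over a
# reference T (index 3) is left alone.
print(revert_c_to_t("ATGTTA", "ACGTCA"))  # ACGTCA

# A C>A mismatch is not a bisulfite artifact, so it is preserved.
print(revert_c_to_t("AAGT", "ACGT"))  # AAGT
```

As noted in the post, this erases genuine C -> T variants along with the conversion signal, so those SNPs can no longer be called from the modified reads.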
