Haplotype caller for RNA-seq

thkitapci (Los Angeles, Member)

I am trying to call SNPs from RNA-seq data. My samples are pooled: each one consists of thousands of shellfish larvae pooled together to obtain enough RNA, and I have six such samples. Can I use GATK to call SNPs in these pooled samples?



  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    It depends on whether you expect the larvae to be clonal. If they are, you can proceed with the RNAseq workflow as documented. If not, it's more complicated, because the pooled population behaves like a polyploid organism, and thousands of individuals is a lot to model properly.

  • thkitapci (Los Angeles, Member)

    Dear Geraldine Van der Auwera,
    The larvae are not clonal; in fact, they are expected to be highly polymorphic, so yes, this pooled population should be modeled like a polyploid organism. Is there a recommended way to use GATK in this case?

    Thanks a lot
    Best Regards
    T. Hamdi Kitapci

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Then you need to estimate the effective ploidy of this pooled population and pass it in with the -ploidy argument. The difficulty is that, due to inefficiencies in the current code, our tools cannot practically handle ploidies above ~20; beyond that, processing becomes extremely slow. We plan to fix this in a future version, but in the meantime your ability to model the ploidy of your population may be limited.
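
For a rough sense of why large ploidy values are expensive in general, here is a minimal Python sketch that counts the distinct genotypes a caller would have to consider at one site. It uses only standard combinatorics (multisets of alleles) and is not a description of GATK's internals or of the specific inefficiency mentioned above. The function name `genotype_count` is hypothetical, and the value 2000 assumes 1000 diploid larvae per pool, which is an illustrative assumption rather than something stated in this thread.

```python
from math import comb


def genotype_count(ploidy: int, n_alleles: int) -> int:
    """Count unordered genotypes at one site.

    A genotype is a multiset of `ploidy` allele copies drawn from
    `n_alleles` distinct alleles, so the count is
    C(ploidy + n_alleles - 1, n_alleles - 1).
    """
    if ploidy < 1 or n_alleles < 1:
        raise ValueError("ploidy and n_alleles must both be >= 1")
    return comb(ploidy + n_alleles - 1, n_alleles - 1)


if __name__ == "__main__":
    # 2000 = 1000 larvae x 2 chromosome copies (assumes diploid larvae).
    for ploidy in (2, 10, 20, 2000):
        biallelic = genotype_count(ploidy, 2)
        triallelic = genotype_count(ploidy, 3)
        print(f"ploidy={ploidy}: {biallelic} genotypes with 2 alleles, "
              f"{triallelic} with 3 alleles")
```

At a biallelic site the count grows only linearly (ploidy + 1), but it grows much faster once more alleles are in play, which is one general reason an effective ploidy in the thousands is hard to model directly.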

