Joint SNP calling for individuals with very different coverage

Good morning,

We have sequenced the genomes of 40 individuals from two species for population genomic analyses. For the first species, one individual was sequenced at around 90X to build a reference genome, ten others were sequenced at around 30X, and nine at 5X. For the second species, one individual was sequenced at 40X and the rest at 5X. Our plan was to perform joint SNP calling (per species) without downsampling the most deeply sequenced individuals. However, some people suggest that this can introduce bias and that all samples should be downsampled to 5X for fair SNP calling. Intuitively, using the high-coverage samples is very advantageous for detecting variants, and from our point of view it will not bias the calls for the rest of the samples.

Could you please give us your advice, and explain why?

Thanks a lot in advance,

B. Martinez

Best Answer


  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    Hi B.,

    What is the end goal of your experiment? Is it to identify the variants that are different between the two species?

    You can certainly do joint calling on all the species' samples together. However, please be aware that you will not be able to use VQSR; you will have to use hard filtering. Performing joint calling on all the samples in each species together will give you increased sensitivity, but also decreased specificity. See http://gatkforums.broadinstitute.org/discussion/4150/should-i-analyze-my-samples-alone-or-together#latest
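
    To make "hard filtering" concrete, here is a minimal Python sketch of the idea: each site is flagged when an annotation crosses a fixed cutoff, rather than being scored by a trained model as in VQSR. The thresholds are an assumption (the generic starting values commonly quoted in the GATK documentation for SNPs), not something stated in this thread, and they would likely need tuning for your species. The function name `failed_filters` is hypothetical.

```python
# Assumed cutoffs: generic GATK starting values for SNP hard filtering.
# They are illustrative only and should be tuned on your own data.
SNP_HARD_FILTERS = {
    "QD": lambda v: v < 2.0,               # quality normalized by depth
    "FS": lambda v: v > 60.0,              # strand bias (Fisher's exact test)
    "SOR": lambda v: v > 3.0,              # strand bias (odds ratio)
    "MQ": lambda v: v < 40.0,              # RMS mapping quality
    "MQRankSum": lambda v: v < -12.5,      # mapping quality, ref vs alt reads
    "ReadPosRankSum": lambda v: v < -8.0,  # read position, ref vs alt reads
}


def failed_filters(annotations):
    """Return the names of the hard filters that a site fails.

    `annotations` maps INFO annotation names to numeric values.
    Annotations missing from the site are skipped, not treated as failures.
    """
    return [
        name
        for name, fails in SNP_HARD_FILTERS.items()
        if name in annotations and fails(annotations[name])
    ]


if __name__ == "__main__":
    site = {"QD": 1.5, "FS": 10.0, "MQ": 58.0}
    print(failed_filters(site))
```

    A site with an empty result passes every filter for which it carries an annotation.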


  • bmartinez, EEUU (Member)

    Hi Sheila,

    Thanks a lot for your answer.
    Our main goal is to contrast patterns of genomic variation between species and populations, so we will be focusing on variants in either of the two species. Our main question, however, is not how best to analyze the two species jointly, but how best to combine samples (say, of a single species, for simplicity) with very different coverage. Will this create some kind of bias in joint calling? Would it be best to downsample first to equalize coverage among the samples being called jointly? What would be the optimal way to call variants and genotypes in such a scenario?
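
    For concreteness, this is roughly what the downsampling option would cost. The sketch below is only arithmetic on the approximate coverages given in the first post. It assumes that mean coverage scales linearly with the fraction of reads kept, and the function name `downsample_fraction` is hypothetical.

```python
def downsample_fraction(current_cov, target_cov):
    """Fraction of reads to keep so mean coverage drops to `target_cov`.

    Assumes coverage scales linearly with the number of reads kept.
    Samples already at or below the target are left untouched (1.0).
    """
    if current_cov <= 0 or target_cov <= 0:
        raise ValueError("coverage values must be positive")
    return min(1.0, target_cov / current_cov)


if __name__ == "__main__":
    target = 5  # lowest coverage tier in the data set
    tiers = {
        "species 1, reference individual": 90,
        "species 1, high coverage": 30,
        "species 1, low coverage": 5,
        "species 2, high coverage": 40,
    }
    for label, cov in tiers.items():
        keep = downsample_fraction(cov, target)
        print(f"{label}: {cov}X -> keep {keep:.3f} of reads")
```

    Under that assumption, equalizing to 5X would discard about 94% of the reads from the 90X individual, about 83% from each 30X individual, and 87.5% from the 40X individual.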



  • bmartinez, EEUU (Member)

    Dear Geraldine,

    This answers my question, thank you very much!


