SNP calling for cell lines: how does ploidy affect HaplotypeCaller (HC)?

Juls Member
edited November 2017 in Ask the GATK team

Hi all,

I am calling SNPs in various immortalised cell lines, which are known to be very unstable, so their ploidy is not known; in general they should be diploid. My question is: what can happen if the ploidy is set incorrectly? Would HC miss SNPs? I see a relatively low overlap of common SNPs between two related cell lines, and I want to make sure this low overlap is real.

Thank you in advance.

Answers

  • SkyWarrior Turkey Member ✭✭✭

    Short answer: yes, it will.

  • Juls Member

    Well, yes, but how, and by how much? Can it lead to many missed SNPs?

  • Sheila Broad Institute Member, Broadie, Moderator admin
    Accepted Answer

    @Juls
    Hi,

    Perhaps the best thing is to try different ploidies (e.g. 2, 3, 4) and compare the outputs. HaplotypeCaller in GVCF mode is designed to be very sensitive, but the ploidy does play an important role. You can read more about the math in the Methods and Algorithms section.
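    To illustrate why the assumed ploidy matters, here is a simplified, hypothetical sketch. It is not HaplotypeCaller's actual implementation, which scores reads against assembled haplotypes with a pair-HMM and also applies genotype priors. The sketch assumes independent reads, a single fixed base error rate (0.01 here), and no priors. Under ploidy p, a genotype with g alternate copies predicts an alt-read fraction of g/p, so a low-fraction variant that no diploid genotype explains well can still be called under a higher ploidy. The function names are illustrative only.

```python
import math


def genotype_log10_likelihoods(ref_reads, alt_reads, ploidy, error_rate=0.01):
    """log10 P(reads | genotype) for g = 0..ploidy alternate copies.

    Simplified model: reads are independent, and each shows the alt base with
    probability (g/ploidy)*(1 - e) + (1 - g/ploidy)*e, where e is a fixed error rate.
    """
    likelihoods = []
    for g in range(ploidy + 1):
        alt_fraction = g / ploidy
        p_alt = alt_fraction * (1 - error_rate) + (1 - alt_fraction) * error_rate
        p_ref = 1 - p_alt
        likelihoods.append(alt_reads * math.log10(p_alt) + ref_reads * math.log10(p_ref))
    return likelihoods


def best_genotype(ref_reads, alt_reads, ploidy, error_rate=0.01):
    """Number of alt copies in the most likely genotype (no priors applied)."""
    likelihoods = genotype_log10_likelihoods(ref_reads, alt_reads, ploidy, error_rate)
    return max(range(ploidy + 1), key=lambda g: likelihoods[g])


# A site at 10% allele fraction: 90 reference reads, 10 alternate reads.
for ploidy in (2, 3, 4):
    print(f"ploidy {ploidy}: best genotype has "
          f"{best_genotype(90, 10, ploidy)} alt copies")
```

    Under these assumptions, the 90/10 site comes out as homozygous reference with ploidy 2 but as carrying one alternate copy with ploidy 4. Real HaplotypeCaller output will differ, but the direction of the effect is why comparing runs at several ploidies is informative.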

    You may also try setting --standard_min_confidence_threshold_for_calling 0 to try to recover low-quality calls that would otherwise be missed.
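    The calling threshold is compared against QUAL, which the VCF format defines on the Phred scale: a QUAL of 30 corresponds to a 1-in-1000 chance that the variant call is wrong. The sketch below shows that conversion, and shows how lowering the threshold to 0 keeps lower-confidence calls that a higher threshold would drop. The QUAL values are made up for illustration, the function names are hypothetical, and treating a call exactly at the threshold as passing is an assumption.

```python
def phred_to_error_prob(qual):
    """Probability that a call is wrong, given its Phred-scaled QUAL."""
    return 10 ** (-qual / 10)


def calls_kept(quals, threshold):
    """Count calls whose QUAL meets the confidence threshold (>= is assumed)."""
    return sum(1 for q in quals if q >= threshold)


# Hypothetical QUAL values, for illustration only.
example_quals = [3.2, 12.0, 28.5, 45.1, 210.7]

for threshold in (30, 0):
    print(f"threshold {threshold}: {calls_kept(example_quals, threshold)} of "
          f"{len(example_quals)} calls kept "
          f"(error probability at threshold: {phred_to_error_prob(threshold):g})")
```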

    -Sheila

    P.S. You may be interested in Mutect2, which is designed for somatic variant calling and does not assume any ploidy. It may be worth calling variants on each of your samples in tumor-only mode and seeing whether you still get as many differences.

  • Juls Member

    Thanks @Sheila, I will give this a try.
    I thought about Mutect2, but in tumor-only mode it needs known SNPs as input (the --dbsnp option), correct? Would it make sense to feed the SNPs called by HC into Mutect2 if one does not have any known SNPs?

  • Juls Member
    edited December 2017

    @Sheila
    Hi,

    Thank you so much for your help! Just to make sure: I have an immortalised cell line from a non-model organism (no known SNPs); the ploidy should be 2, but the line is very heterogeneous and unstable. So you would still recommend running HC rather than Mutect2, and using Mutect2 in tumour-only mode only to check for missed calls, not switching to Mutect2 for the complete analysis? May I ask why? Is it because Mutect2 is designed mainly for tumor samples with matched normals?

    Best & many thanks!!!

  • Sheila Broad Institute Member, Broadie, Moderator admin
    edited December 2017

    @Juls
    Hi,

    Yes, I had said that originally. Honestly, I don't have any experience with what you are doing, so I cannot give firm recommendations :smiley: However, I do think it is worth trying Mutect2 in tumor-only mode to see what is missed. You will have to do a lot of manual review to see whether the results look good or contain many false positives.

    I recommend sticking with HaplotypeCaller because you are not looking to detect somatic variants. Indeed, Mutect2 is optimized to run on tumor-normal matched pairs. In tumor-only mode, it may pick up some low-frequency artifacts that HaplotypeCaller would not. However, you will need to do some extra work to decide whether you believe those extra calls. I was suggesting Mutect2 to get a sense of just how many extra variants are called. I would also suggest trying HaplotypeCaller with other ploidies to see whether that makes a difference as well.

    -Sheila

    EDIT: Are you trying to find germline or somatic mutations? I was assuming germline in my answer.

  • Juls Member
    edited December 2017

    Hi @Sheila,

    Thank you again for your help!
    Well, it's an immortalised cell line from a non-model organism, and I am looking for any difference (any mutation) compared to a reference genome.

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @Juls
    Hi,

    Okay, just keep in mind that HaplotypeCaller and Mutect2 are designed for different purposes. In your case, it may be a good idea to use both callers to detect both germline and somatic variants.

    Good luck and let us know how things go :smile:

    -Sheila

  • Juls Member
    edited June 2018

    Hi @Sheila,

    I was wondering how to go about comparing HC and Mutect2, so I have a few questions:
    First, I have applied hard filtering to the HC results. Are there similar recommendations for Mutect2, so that both sets of results are filtered comparably?
    Second, could hard filtering of the HC results eliminate low-frequency variants? Should I be careful here?
    Third, I have compared the Mutect2 and HC results using vcftools vcf-compare, at the position level only, and I get around 90% overlap. The remaining 10% consists of 2% variants detected by HC but not Mutect2 and 8% detected by Mutect2 but not HC. Note that I used the unfiltered results for now, as I am not sure how to filter them comparably. So it appears that both callers are missing some variants in my case. Is this a surprising result?

    Thank you again for your continuous input and help!
    Best J
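    A position-level comparison like the one described in this post can be sketched in plain Python, assuming plain or gzip-compressed VCF files. Fractions are computed relative to the union of positions, the same basis on which a 90/2/8 split sums to 100%. Setting pass_only=True keeps only records whose FILTER column is PASS or "." (no filters applied), which is one way to repeat the comparison after running FilterMutectCalls and hard filtering. Like any position-only comparison, this ignores alleles, so an indel that the two callers represent at different positions will not line up. All function names here are hypothetical.

```python
import gzip
import sys


def positions_from_lines(lines, pass_only=False):
    """Collect (CHROM, POS) pairs from VCF text lines, skipping headers."""
    positions = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        # FILTER is the 7th column; "." means no filters were applied.
        if pass_only and fields[6] not in ("PASS", "."):
            continue
        positions.add((fields[0], int(fields[1])))
    return positions


def read_vcf_positions(path, pass_only=False):
    """Read (CHROM, POS) pairs from a .vcf or .vcf.gz file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return positions_from_lines(handle, pass_only)


def compare_positions(a, b):
    """Fractions of the union shared by both sets or unique to each."""
    union = a | b
    if not union:
        return {"shared": 0.0, "only_a": 0.0, "only_b": 0.0}
    n = len(union)
    return {
        "shared": len(a & b) / n,
        "only_a": len(a - b) / n,
        "only_b": len(b - a) / n,
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    first = read_vcf_positions(sys.argv[1])
    second = read_vcf_positions(sys.argv[2])
    for label, fraction in compare_positions(first, second).items():
        print(f"{label}: {fraction:.1%}")
```

    For example, saved as compare_positions.py, running it with an HC VCF and a Mutect2 VCF as the two arguments (file names up to you) prints the shared and caller-specific fractions.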

  • Sheila Broad Institute Member, Broadie, Moderator admin
    edited June 2018

    @Juls
    Hi J,

    > First, I have applied hard filtering on the HC results - are there similar suggestions for Mutect2 to make the results somewhat similarly filtered?

    The filtering tool for Mutect2 is FilterMutectCalls.

    > Second, could hard filtering of the HC results eliminate low frequency variants? Should I be careful here?

    If you used the standard filters we recommend, they are designed to be very sensitive, so you should not lose many true positives when eliminating the false positives. You may consider plotting the annotations as well to see if you can do better with the filters. Have a look at this document.
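    To see which annotations are actually removing calls, and whether low-frequency variants are among them, it can help to count how often each hard filter fires. The sketch below assumes the commonly documented GATK SNP hard-filter thresholds (QD < 2.0, FS > 60.0, SOR > 3.0, MQ < 40.0, MQRankSum < -12.5, ReadPosRankSum < -8.0); check these against the current GATK hard-filtering documentation before relying on them. It works on raw INFO strings, the function names are hypothetical, and in this sketch a record that lacks an annotation is never filtered on it.

```python
import operator
from collections import Counter

# Commonly documented GATK SNP hard-filter thresholds (an assumption here;
# verify against the current GATK documentation).
SNP_HARD_FILTERS = [
    ("QD", operator.lt, 2.0),
    ("FS", operator.gt, 60.0),
    ("SOR", operator.gt, 3.0),
    ("MQ", operator.lt, 40.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
]


def parse_info(info):
    """Parse numeric KEY=VALUE entries from a VCF INFO string."""
    values = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if not sep:
            continue  # flag annotations such as DB carry no value
        try:
            values[key] = float(value)
        except ValueError:
            pass  # skip non-numeric or multi-valued annotations
    return values


def failed_filters(info):
    """Names of hard filters this record fails; missing annotations never fail."""
    values = parse_info(info)
    return [
        name
        for name, compare, threshold in SNP_HARD_FILTERS
        if name in values and compare(values[name], threshold)
    ]


def count_failures(info_strings):
    """Tally how often each filter fires across many records."""
    tally = Counter()
    for info in info_strings:
        tally.update(failed_filters(info))
    return tally
```

    Records flagged this way can then be inspected (for example, their allele fractions) to judge whether the filters are removing real low-frequency variants, which complements plotting the annotation distributions.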

    > The remaining 10% are made up by 2% variants detected by HC but not Mutect2 and 8% variants detected by Mutect2 but not HC.

    So Mutect2 is slightly more sensitive than HaplotypeCaller here. This is expected, because you have not filtered the Mutect2 output. In this case, you can first try filtering the Mutect2 output and then comparing again. Can you tell me what your end goal is?

    -Sheila
