
Hard filtering

System Administrator
This discussion was created from comments split from: New to the forum? Ask your questions here!.

Comments

  • Rita_S Member

    Hi there GATK team!

    I'm new in this field. I have a question about the strategy that I should adopt. I have 65 human WES, but it's not a simple case of cases vs control because I have different types of cases (different phenotypes) that I want to analyze in separate groups. I don’t have 30 samples of each phenotype, so I don’t know if I should do Hard Filtering or VQSR.
    What would be the right way to go?

    Thanks so much in advance!
    Rita

  • bhanuGandham Member, Administrator, Broadie, Moderator

    Hi @Rita_S

    We recommend that you have at least 30 exomes for VQSR. If you have fewer than that, we recommend hard filtering, and you can find more info on that here: https://software.broadinstitute.org/gatk/documentation/article.php?id=2806.

    If the sample count for some phenotype is on the borderline of that threshold, it will be more of a judgment call. Please try it and let me know if you face any issues.

    Regards
    Bhanu
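To make the rule of thumb above concrete, here is a minimal Python sketch. It only builds a command line and does not run GATK. The 30-exome threshold comes from the reply above; the filter thresholds are the generic SNP starting points from the GATK hard-filtering documentation and should be treated as assumptions to tune for your data. The file names and the helper names `choose_filtering` and `variant_filtration_cmd` are hypothetical, and the input is assumed to be a SNP-only VCF.

```python
import shlex

# Threshold quoted in the reply above: at least 30 exomes for VQSR.
MIN_EXOMES_FOR_VQSR = 30


def choose_filtering(n_exomes):
    """Return the suggested filtering strategy for a joint-called cohort size."""
    return "VQSR" if n_exomes >= MIN_EXOMES_FOR_VQSR else "hard filtering"


# Assumed starting thresholds for SNPs (filter name -> JEXL expression).
# A record matching an expression gets that name written to its FILTER field.
SNP_FILTERS = {
    "QD2": "QD < 2.0",
    "FS60": "FS > 60.0",
    "MQ40": "MQ < 40.0",
    "SOR3": "SOR > 3.0",
    "MQRankSum-12.5": "MQRankSum < -12.5",
    "ReadPosRankSum-8": "ReadPosRankSum < -8.0",
}


def variant_filtration_cmd(in_vcf, out_vcf, filters=SNP_FILTERS):
    """Build (but do not run) a GATK4 VariantFiltration command as an argv list."""
    cmd = ["gatk", "VariantFiltration", "-V", in_vcf, "-O", out_vcf]
    for name, expr in filters.items():
        cmd += ["--filter-name", name, "--filter-expression", expr]
    return cmd


if __name__ == "__main__":
    print(choose_filtering(65))
    # Hypothetical file names; shlex.join quotes the expressions for a shell.
    print(shlex.join(variant_filtration_cmd("snps.vcf.gz", "snps.filtered.vcf.gz")))
```

Note that the count that matters for the threshold is the number of exomes in the joint-called set, not the size of each phenotype group.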

  • Rita_S Member

    Because the joint genotyping should be done separately for each phenotype, right?

  • bhanuGandham Member, Administrator, Broadie, Moderator

    Hi @Rita_S

    You can do joint genotyping on your entire set of samples. Here is a doc that gives you more info on why you should do joint genotyping.

    Regards
    Bhanu

  • Rita_S Member

    Hi @bhanuGandham

    Sorry, I don't think I'm explaining my question correctly.
    If I do joint genotyping on my entire set of samples, can I then separate them into the different subsets (different phenotypes)? Or should the joint genotyping be done for each subset? I ask because I'll obtain a multi-sample VCF, but the filter values correspond to all samples, not to the subsets.

    Thank you,
    Rita

  • bhanuGandham Member, Administrator, Broadie, Moderator
    edited December 3

    Hi @Rita_S

    Are you working on somatic or germline variants?
    If it is somatic, then we recommend you do not do joint genotyping. If it is germline, you can find info in this thread.

    Regards
    Bhanu

  • Rita_S Member
    Hi
    I'm working with germline variants. I'll give you an example: I have 14 cases of phenotype a, 8 cases of phenotype b, 4 cases of phenotype c and 5 controls. Should I do joint genotyping for all 31 samples together (and if so, how can I analyse them separately afterwards?), or can I do it for each subset (for the 14 cases a + controls, for the 8 cases b + controls and for the 4 cases c + controls)?

    Or should I just do Hard Filtering?

    Thanks,
    Rita
  • bhanuGandham Member, Administrator, Broadie, Moderator

    Hi @Rita_S

    In that case, you should do joint genotyping for all 31 samples together (as explained in this thread) and then use SelectVariants to extract one or more samples from the callset, based on either a complete sample name or a pattern match.

    Regards
    Bhanu
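As an illustration of the SelectVariants step described above, here is a small Python sketch that builds one command per phenotype group (cases plus controls) from the joint-called VCF. The sample IDs, file names, and the helper `select_samples_cmd` are hypothetical; the group sizes follow the 14/8/4/5 example in this thread. Dropping sites that are no longer variant in the subset (`--exclude-non-variants`) is an optional choice, not something recommended in the replies.

```python
import shlex


def select_samples_cmd(in_vcf, out_vcf, samples, drop_non_variant=True):
    """Build (but do not run) a GATK4 SelectVariants command for a sample subset."""
    cmd = ["gatk", "SelectVariants", "-V", in_vcf, "-O", out_vcf]
    for sample in samples:
        # One --sample-name argument per sample to keep.
        cmd += ["--sample-name", sample]
    if drop_non_variant:
        # Optional: remove sites where no selected sample carries a variant.
        cmd.append("--exclude-non-variants")
    return cmd


# Hypothetical sample IDs matching the group sizes in the example above.
cohort = {
    "a": [f"caseA_{i:02d}" for i in range(1, 15)],       # 14 cases
    "b": [f"caseB_{i:02d}" for i in range(1, 9)],        # 8 cases
    "c": [f"caseC_{i:02d}" for i in range(1, 5)],        # 4 cases
    "control": [f"ctrl_{i:02d}" for i in range(1, 6)],   # 5 controls
}


def per_phenotype_cmds(joint_vcf, cohort):
    """Return {phenotype: command} where each subset is that phenotype's cases plus all controls."""
    cmds = {}
    for phenotype, cases in cohort.items():
        if phenotype == "control":
            continue
        samples = cases + cohort["control"]
        cmds[phenotype] = select_samples_cmd(
            joint_vcf, f"phenotype_{phenotype}.vcf.gz", samples
        )
    return cmds


if __name__ == "__main__":
    for phenotype, cmd in per_phenotype_cmds("joint_31_samples.vcf.gz", cohort).items():
        print(phenotype, shlex.join(cmd))
```

If the sample names share a pattern, SelectVariants also accepts `--sample-expressions` with a regular expression in place of listing every name.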

  • Rita_S Member
    Ok, thank you so much. And sorry for taking so much of your time.

    I'll try and do that.

    Regards,
    Rita
  • Rita_S Member
    Hi
    I have another question related to this matter.

    After GenotypeGVCFs, should I run VQSR and then the SelectVariants step, or should it be the other way around (SelectVariants and then VQSR)?

    Thanks,
    Rita
  • bhanuGandham Member, Administrator, Broadie, Moderator

    Hi @Rita_S

    VQSR should run on the full joint-called callset, because it builds its model from the annotations of all variants and then marks each record in the FILTER field. SelectVariants is the tool that subsets the callset, for example by sample. Hence, you should run VQSR before SelectVariants.

    Regards
    Bhanu
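The ordering above can be sketched as a two-step plan in Python: apply the VQSR filter to the full callset first, then subset the filtered output by sample. This is a sketch under assumptions, not a complete pipeline. It assumes the `.recal` and `.tranches` files were already produced by VariantRecalibrator on the full callset; all file names, sample IDs, the helper names, and the 99.7 sensitivity level are placeholders. Commands are only built, not executed.

```python
import shlex


def apply_vqsr_cmd(in_vcf, recal, tranches, out_vcf, mode="SNP", sensitivity=99.7):
    """Build a GATK4 ApplyVQSR command; the sensitivity level is a placeholder value."""
    return [
        "gatk", "ApplyVQSR",
        "-V", in_vcf,
        "--recal-file", recal,
        "--tranches-file", tranches,
        "--truth-sensitivity-filter-level", str(sensitivity),
        "-mode", mode,
        "-O", out_vcf,
    ]


def select_samples_cmd(in_vcf, out_vcf, samples):
    """Build a GATK4 SelectVariants command that keeps only the given samples."""
    cmd = ["gatk", "SelectVariants", "-V", in_vcf, "-O", out_vcf]
    for sample in samples:
        cmd += ["--sample-name", sample]
    return cmd


def plan(joint_vcf, recal, tranches, samples):
    """Return the commands in run order: VQSR on the whole cohort, then the subset."""
    filtered = "joint.vqsr.vcf.gz"  # hypothetical intermediate file
    return [
        apply_vqsr_cmd(joint_vcf, recal, tranches, filtered),
        # The subset step reads the VQSR-filtered file, not the raw callset.
        select_samples_cmd(filtered, "subset.vqsr.vcf.gz", samples),
    ]


if __name__ == "__main__":
    steps = plan("joint.vcf.gz", "joint.recal", "joint.tranches", ["caseA_01", "ctrl_01"])
    for step in steps:
        print(shlex.join(step))
```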
