-L argument

wendy (Taiwan, Member)

Hi all,
I only have NGS data from 20 people, and that data set is too small for variant recalibration. Therefore, I want to merge in whole-exome data from 1000 Genomes.

**But my NGS data was processed with the -L argument to restrict it to the intervals of 100 genes. Should the whole-exome data from 1000 Genomes also be restricted to those 100 genes with -L before I merge it with the NGS data from my 20 people?**

Thanks for your help

Wendy

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    Hi Wendy,

    The problem here is that the limitations of VQSR are not directly dependent on the number of people sampled. What is important is the number of variants fed into the model. We say to add more people because this adds variants. But if you are looking at such a small target area, it is unlikely that you will get enough variants even after adding lots more people.
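To make the "variants, not people" point concrete, here is a minimal Python sketch (not GATK code) that counts distinct variant sites, keyed by CHROM, POS, REF and ALT, in two toy call sets. The records and the function name `variant_sites` are hypothetical. It only shows that extra samples help to the extent that they contribute new sites, which a 100-gene target leaves little room for.

```python
# Minimal sketch (not GATK code): count the distinct variant sites that a
# VQSR-style model would learn from. Sites are keyed by (CHROM, POS, REF, ALT).

def variant_sites(vcf_lines):
    """Return the set of (chrom, pos, ref, alt) keys for the data lines of a VCF."""
    sites = set()
    for line in vcf_lines:
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        sites.add((chrom, pos, ref, alt))
    return sites

# Hypothetical toy call sets restricted to the same small target.
cohort = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t50\t.\t.",
    "chr1\t1500\t.\tC\tT\t40\t.\t.",
]
extra_samples = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t60\t.\t.",  # same site as the cohort: adds nothing
    "chr1\t1800\t.\tG\tA\t30\t.\t.",  # new site: adds one variant
]

merged = variant_sites(cohort) | variant_sites(extra_samples)
print(len(variant_sites(cohort)), len(merged))  # 2 3
```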

    If you have whole genome or exome data for your samples and just restricted your analysis to go faster, then the right thing to do is to reprocess everything, but this time include the rest of the data. It will take more time but the results will be better. You may still need to add samples from 1000 genomes.

    If not, you are better off using hard filters.

  • wendy (Taiwan, Member)

    Hmm... actually, I still don't understand your reply. Could you explain it more simply?
    My question is:
    My NGS data was processed with the -L argument to restrict it to the intervals of 100 genes, and I want to add 1000 Genomes whole-exome data to it. Should the whole-exome data from 1000 Genomes also be restricted to those 100 genes with -L before I merge it with my NGS data?

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)
    edited October 2014

    @wendy

    Hi Wendy,

    For VQSR to work properly, you need to have a lot of variants, not necessarily a lot of samples. Because you only have 100 genes in your data, even if you add more samples, there may still not be enough data for VQSR to work properly.

    Do you have whole genome or whole exome data that you ran with -L to save time? If yes, then you will need to re-run the pipeline without using the -L argument. Then, you can add in the 1000 Genomes data. Even though this will take more time, you will get more accurate results. Once you have the results, you can use SelectVariants with -L to get the intervals you are interested in.

    If you do not have whole genome or whole exome data, you will need to use hard filtering. Please read about hard filtering here: http://gatkforums.broadinstitute.org/discussion/2806/howto-apply-hard-filters-to-a-call-set
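Hard filtering means flagging variants whose annotations cross fixed thresholds, instead of learning a model from many variants. As a rough illustration, here is a minimal Python sketch, not GATK's VariantFiltration, that flags SNP records in plain VCF text. The threshold values are an assumption (commonly cited GATK starting points for SNPs; indels use different ones), and the filter name `snp_hard_filter` and helper names are hypothetical; check the article linked above for the values to actually use.

```python
# Minimal hard-filtering sketch (not GATK's VariantFiltration): mark SNP
# records that fail fixed annotation thresholds.
# ASSUMPTION: these thresholds are commonly cited GATK starting values for
# SNPs; check the hard-filtering article for your GATK version and data.
SNP_FAIL_IF = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}

def parse_info(info):
    """Turn 'QD=1.5;DB' into {'QD': '1.5', 'DB': True}; '.' means empty."""
    out = {}
    if info == ".":
        return out
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True  # flags have no '=value'
    return out

def fails_snp_filters(info):
    """True if any annotation present in INFO crosses its threshold."""
    for key, crosses in SNP_FAIL_IF.items():
        value = info.get(key)
        if value is None or value is True or value == ".":
            continue  # annotation missing: this sketch does not filter on it
        if crosses(float(value)):
            return True
    return False

def hard_filter(vcf_line, filter_name="snp_hard_filter"):
    """Return the record with FILTER (column 7) set to PASS or filter_name."""
    fields = vcf_line.rstrip("\n").split("\t")
    failed = fails_snp_filters(parse_info(fields[7]))
    fields[6] = filter_name if failed else "PASS"
    return "\t".join(fields)

# Hypothetical toy SNP records.
good = "chr1\t1000\t.\tA\tG\t50\t.\tQD=12.3;FS=1.2;MQ=60.0"
bad = "chr1\t1500\t.\tC\tT\t40\t.\tQD=1.1;FS=1.2;MQ=60.0"
print(hard_filter(good).split("\t")[6])  # PASS
print(hard_filter(bad).split("\t")[6])   # snp_hard_filter
```

Records are flagged rather than dropped, so the filtered-out calls stay visible in the output and can be inspected or counted later.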

    I hope this helps.

    -Sheila

  • wendy (Taiwan, Member)
    edited October 2014

    Hi,
    I have whole-genome data. After re-running the pipeline without the -L argument and adding in the 1000 Genomes data:
    1. About "Once you have the results...": what does "the results" mean?
    2. At which step should I use SelectVariants with -L?

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @wendy

    Hi Wendy,

    The result is the VCF containing the variants called across the whole genome.

    You then run SelectVariants with the -L argument on that VCF to get a new VCF that contains only the intervals you are interested in.
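As a rough illustration of this order (call on the whole genome first, restrict afterwards), here is a minimal Python sketch of what restricting a VCF to intervals amounts to. It is not GATK's SelectVariants; it assumes GATK-style `chrom:start-end` intervals (1-based, inclusive) and keeps a record when the reference bases it spans overlap any interval. The interval, the records, and the function names are hypothetical. Assuming GATK 3-style syntax, the real step would look roughly like `java -jar GenomeAnalysisTK.jar -T SelectVariants -R reference.fasta -V whole_genome.vcf -L targets.intervals -o targets.vcf`, where the file names are placeholders.

```python
# Minimal sketch of restricting a VCF to intervals (not GATK's SelectVariants;
# it only illustrates the idea).
# ASSUMPTION: intervals use the GATK-style "chrom:start-end" form, 1-based
# and inclusive, like VCF POS. Whole-contig intervals are not handled.

def parse_interval(text):
    """Parse 'chr1:100-200' into ('chr1', 100, 200)."""
    chrom, _, span = text.rpartition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)

def select_in_intervals(vcf_lines, intervals):
    """Yield header lines, plus data lines whose REF allele overlaps an interval."""
    parsed = [parse_interval(i) for i in intervals]
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep the header so the output is still a VCF
            continue
        fields = line.split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        end = pos + len(ref) - 1  # last reference base covered by the record
        if any(c == chrom and pos <= i_end and end >= i_start
               for c, i_start, i_end in parsed):
            yield line

# Hypothetical whole-genome call set and target interval.
whole_genome = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t150\t.\tA\tG\t50\tPASS\t.",
    "chr1\t5000\t.\tC\tT\t40\tPASS\t.",
    "chr2\t120\t.\tG\tA\t30\tPASS\t.",
]
targets = ["chr1:100-200"]
kept = list(select_in_intervals(whole_genome, targets))
print(len(kept))  # 2 (header + the chr1:150 record)
```

Testing overlap on the REF span rather than on POS alone means a deletion that starts just before an interval but reaches into it is kept by this sketch.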

    -Sheila

  • wendy (Taiwan, Member)

    Hi Geraldine and Sheila,
    Many thanks for your answer!
