Hello, I have a newly sequenced genome and some samples for this species. I would like to follow the Best Practices, but I don't have a dbSNP file or anything similar. Could I use the variants from my own samples instead? For example, could I take the variants that are shared by all my samples and use them as a dbSNP?
Thanks!
Geraldine_VdAuwera
Posts: 2,239 admin
No, you have to give the tool truth/training sets; that is not optional. The thing is, the training/truth sets aren't supposed to come from your project; they are external sets whose accuracy is known. The article says what to use, right under the base command line box. For example, for SNPs called on a WGS dataset, you have:
Whole genome shotgun experiments
SNP specific recommendations
For SNPs we use both HapMap v3.3 and the Omni chip array from the 1000 Genomes Project as training data. These datasets are available in the GATK resource bundle. Arguments for VariantRecalibrator command:
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.sites.vcf \
-resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.b37.sites.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=6.0 dbsnp_135.b37.vcf \
-an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an InbreedingCoeff -an DP \
-mode SNP \
We provide very specific recommendations; you just have to pick the ones that fit, depending on whether your data is WGS or exome and whether you are looking at SNPs or indels.
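As a rough illustration of how the project-specific part of the command is put together, here is a minimal Python sketch that rebuilds the `-resource`, `-an` and `-mode` arguments quoted above as an argument list. The function names (`resource_arg`, `project_specific_args`) are hypothetical helpers, not part of GATK, and the base command line is deliberately left out because it is not shown in this thread.

```python
# Sketch only: assembles the project-specific VariantRecalibrator arguments
# quoted in this post. Helper names are hypothetical, not GATK APIs.

def resource_arg(name, path, known, training, truth, prior):
    """Format one -resource argument and its file, as in the quoted recommendation."""
    flags = "known={},training={},truth={},prior={}".format(
        str(known).lower(), str(training).lower(), str(truth).lower(), prior
    )
    return ["-resource:{},{}".format(name, flags), path]

# Values copied from the WGS SNP recommendation quoted above.
SNP_RESOURCES = [
    ("hapmap", "hapmap_3.3.b37.sites.vcf", False, True, True, 15.0),
    ("omni", "1000G_omni2.5.b37.sites.vcf", False, True, False, 12.0),
    ("dbsnp", "dbsnp_135.b37.vcf", True, False, False, 6.0),
]
SNP_ANNOTATIONS = ["QD", "HaplotypeScore", "MQRankSum", "ReadPosRankSum",
                   "FS", "MQ", "InbreedingCoeff", "DP"]

def project_specific_args(resources, annotations, mode):
    """Return the resource, annotation and mode arguments as one flat list."""
    args = []
    for resource in resources:
        args.extend(resource_arg(*resource))
    for annotation in annotations:
        args.extend(["-an", annotation])
    args.extend(["-mode", mode])
    return args

if __name__ == "__main__":
    print(" ".join(project_specific_args(SNP_RESOURCES, SNP_ANNOTATIONS, "SNP")))
```

The resulting list would be appended to whatever base command line the article gives; swapping in the exome or indel recommendations only means changing the three inputs.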
Geraldine Van der Auwera, PhD
Answers
Hi there, this is addressed in the FAQs section of the Guide.
Geraldine Van der Auwera, PhD
Hi! If I am not mistaken, I am trying to follow this FAQ section: "What VQSR training sets / arguments should I use for my specific project?" This is my command line:
The output of this command is:
So, what should I use as a training set?
Thanks!
You have the right article, but you only passed half of the command, the "common base commandline". In addition to that, you also need to pass the part indicated like this:
What you specify there is indicated in the next part of the document.
Geraldine Van der Auwera, PhD
OK, but as I understand it, I don't have a training dataset, right? So I should write something for [SPECIFY WHICH ANNOTATIONS TO USE IN MODELING] and [SPECIFY WHICH CLASS OF VARIATION TO MODEL]. Is there any information in the FAQ about this point?
Thanks!
OK, so I think I can't do anything, because I don't have a training/truth set from an external project. The problem is that I have a new genome, and it is not human.
Thanks!
Oh, I didn't realize you were working with non-human genomes, sorry. It is possible to generate your own training/truth sets using very high-confidence subsets of your initial calls (similar to what you may already have done to get a set of -knownSites for base recalibration), but it is a fairly complicated process. You may be better off with hard filtering for now, or try asking in "Ask the Community" what people normally do for non-human organisms.
Geraldine Van der Auwera, PhD
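To make the idea of a "very high-confidence subset of your initial calls" a little more concrete, here is a minimal Python sketch that keeps only sites called in every sample with a QUAL at or above a cutoff. This is one possible reading of the suggestion, not a GATK procedure: the function name, the in-memory data layout and the `min_qual=100.0` cutoff are all assumptions chosen for illustration, and a real workflow would read the calls from VCF files.

```python
# Sketch only: selects sites shared by all samples with high QUAL.
# The data layout and the cutoff value are illustrative assumptions.

def high_confidence_sites(calls_by_sample, min_qual=100.0):
    """Return sorted sites present in every sample with QUAL >= min_qual.

    calls_by_sample maps a sample name to a dict of
    (chrom, pos, ref, alt) -> QUAL for that sample's raw calls.
    """
    per_sample = list(calls_by_sample.values())
    if not per_sample:
        return []
    # Start from the first sample's sites and intersect with the rest.
    shared = set(per_sample[0])
    for calls in per_sample[1:]:
        shared &= set(calls)
    # Keep a site only if it clears the cutoff in every sample.
    kept = [site for site in shared
            if all(calls[site] >= min_qual for calls in per_sample)]
    return sorted(kept)

if __name__ == "__main__":
    example = {
        "strain_a": {("chrI", 100, "A", "G"): 500.0, ("chrI", 200, "C", "T"): 40.0},
        "strain_b": {("chrI", 100, "A", "G"): 450.0, ("chrI", 200, "C", "T"): 300.0},
    }
    print(high_confidence_sites(example))
```

Requiring a site in every sample is deliberately strict; it would not suit a project where each sample is treated as a separate organism, and it says nothing about the accuracy of the retained calls.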
Hi Geraldine, I am working with yeast and I am at the VariantRecalibrator step. As I don't have a truth data set, I want to filter my initial round of raw SNPs in order to keep the highest-quality SNPs, as you suggest. I was wondering if you have any suggestions about the filtering parameters...
I am treating each strain as a different organism, so I have good coverage (80X) but only one lane.
I tried with:
and then removing the LowQual and hardtovalidate SNPs afterwards. Does that make sense? Thanks for your help!
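For readers unfamiliar with what this kind of filtering amounts to, here is a minimal Python sketch of threshold-based hard filtering on per-variant annotations. It is not the command used in this post (which is not shown), and the filter names and cutoffs below (`LowQD`, `HighFS`, `LowMQ`, and their values) are placeholders chosen for illustration rather than recommendations from this thread.

```python
import operator

# Sketch only: each entry is (filter name, annotation key, failing comparison, cutoff).
# Names and cutoffs are illustrative placeholders, not values from this thread.
SNP_FILTERS = [
    ("LowQD", "QD", operator.lt, 2.0),
    ("HighFS", "FS", operator.gt, 60.0),
    ("LowMQ", "MQ", operator.lt, 40.0),
]

def failed_filters(info, filters=SNP_FILTERS):
    """Return the names of all filters that a variant's annotations fail."""
    failed = []
    for name, key, fails, cutoff in filters:
        value = info.get(key)
        # A missing annotation is not counted as a failure in this sketch.
        if value is not None and fails(value, cutoff):
            failed.append(name)
    return failed

def passing_records(records, filters=SNP_FILTERS):
    """Keep only the records whose annotations fail none of the filters."""
    return [rec for rec in records if not failed_filters(rec["info"], filters)]

if __name__ == "__main__":
    records = [
        {"id": "snp1", "info": {"QD": 25.0, "FS": 1.2, "MQ": 59.0}},
        {"id": "snp2", "info": {"QD": 1.5, "FS": 75.0, "MQ": 59.0}},
    ]
    print([rec["id"] for rec in passing_records(records)])
```

Tagging first and removing afterwards, as described in the post, keeps a record of why each call was dropped, which helps when tuning the cutoffs.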
See answer here:
http://gatkforums.broadinstitute.org/discussion/comment/1794/#Comment_1794
Geraldine Van der Auwera, PhD