I ran the same sample through a GATK pipeline twice and got different variant calls, and I am trying to understand why. My samples come from a MiSeq capture-kit run. Downsampling could be one reason, since the variant is called in one run and not in the other; when I look at the .bam files, the variant allele is present at 32%.
As I understand it, the UnifiedGenotyper randomly downsamples my data to a coverage of 250, so I experimented with the -dcov parameter.
But setting -dt to NONE could be computationally expensive for a large sample set. Is there an identifiable reason why this is happening?
Curious..!
Answers
Differences in calls can indeed be explained by downsampling. This usually affects marginal, low-confidence calls. If that's your case it probably doesn't matter because those calls would get filtered out in the next step. If that's not your case, can you tell us more about these variant calls? What are their properties?
Geraldine Van der Auwera, PhD
Taking an example of a variant on chr4: it is the second base in the codon. The reference reports it as T (and 67% of the mapped alleles are also T), while the variant call is G at 32%. Mapping quality for both the variant and reference alleles is around 150; base Phred quality for the variant allele ranges from 25 to 29, while it is 37 for the reference allele. The total base count at this position is 10056. Still, calling the variant at -dcov 250 and not at -dcov 1000 looks strange.
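As a rough illustration of why random downsampling can make a site behave differently from run to run, the toy simulation below draws reads uniformly at random from a pileup built from the numbers quoted above (10056 bases, 32% G). This is only a sketch under stated assumptions: it is not GATK's downsampling algorithm, it treats every non-G read as T, it ignores base and mapping qualities, and the function names are hypothetical.

```python
import random

TOTAL_READS = 10056   # total bases at the site, as quoted in the post
ALT_FRACTION = 0.32   # fraction of reads carrying G, as quoted in the post


def alt_count_after_downsampling(target_depth, seed):
    """Keep target_depth reads chosen uniformly at random; count the G reads."""
    n_alt = round(TOTAL_READS * ALT_FRACTION)
    # Simplification: every read that is not G is treated as T.
    pileup = ["G"] * n_alt + ["T"] * (TOTAL_READS - n_alt)
    rng = random.Random(seed)  # a different seed stands in for a different run
    kept = rng.sample(pileup, target_depth)
    return kept.count("G")


def simulate(target_depth, seeds):
    """Alt-allele counts for several independent downsampling draws."""
    return [alt_count_after_downsampling(target_depth, s) for s in seeds]


if __name__ == "__main__":
    for depth in (250, 1000):
        counts = simulate(depth, range(5))
        fractions = [round(c / depth, 3) for c in counts]
        print(depth, counts, fractions)
```

Each seed yields a slightly different subset of reads, so the evidence for the G allele varies between draws even though the input is identical. For a site whose confidence sits near the calling threshold, that variation may be enough to tip the call either way.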
Hmm. Could you please post your command line and the actual lines in the VCF output for the variant?
Geraldine Van der Auwera, PhD
I ran this script multiple times to see whether the site on the chromosome of interest (in bold) was called. I've pasted the results of two such runs: one where it isn't called and one where it is.
Well, nothing really stands out, but I notice you're running version 1.6. I would strongly recommend you upgrade to the latest version to take advantage of the latest improvements we've made to the UG (including downsampling).
Geraldine Van der Auwera, PhD
Just looking at the call speaks volumes. Notice the QUAL score of the records around the one in question; they are all extremely high. But the QUAL for your record is just barely over the calling threshold. Once you run VQSR this record is absolutely, positively going to get filtered out (the QD is an infinitesimally small 0.16). This is what we mean when we say that the differences are marginal and make no practical differences.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
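For readers unfamiliar with QD (QualByDepth): it is the variant confidence (QUAL) normalized by read depth at the site, so a low value means little confidence per read of coverage. The sketch below shows the arithmetic only. The QUAL and depth values are hypothetical, picked to show how a QD of 0.16 could arise; they are not the values from the record discussed in this thread, and the function name is an assumption.

```python
def qual_by_depth(qual, depth):
    """QD: variant confidence (QUAL) divided by read depth at the site."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return qual / depth


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only (not taken from the thread).
    print(qual_by_depth(40.0, 250))
```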
Another thing to note: when I use
I don't get as many variant calls, and this variant
is not reported, at least not in the few runs that I did.
Did you see my previous comment? This is ultimately not a novel discussion and has already been addressed multiple times on this forum...
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
I did see your post; thanks for pointing out the QD score. This post is not meant to raise an alarm or claim any novelty. My aim is to understand the tool better and to set thresholds such that it calls the same variants every time. I did not see any posts about different downsampling values producing different variant calls, so I went ahead and made one. Please feel free to remove it; it has not yielded much feedback anyway.
I would like to point out, though, that VQSR was run both times on exactly the same data, and one of the runs reported this variant. That is why I went back through each step to see whether I could identify the source of the difference.
Are you saying that this site was not filtered out via VQSR? If that is the case, then there is a problem. But you should not be comparing the raw calls between 2 different runs; rather you need to be assessing whether the filtered call sets are the same.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
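One way to carry out the comparison suggested above is to keep only the records whose FILTER column is PASS in each run and then compare the resulting sets. The following is a minimal sketch, not a GATK tool: it parses tab-separated VCF text directly, the function names are hypothetical, and the example records (positions, QUAL values, and the filter name `ExampleFilter`) are invented for illustration.

```python
def passing_sites(vcf_text):
    """Return the set of (CHROM, POS, REF, ALT) for records with FILTER == PASS."""
    sites = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alt, _qual, flt = fields[:7]
        if flt == "PASS":
            sites.add((chrom, int(pos), ref, alt))
    return sites


def compare_filtered(vcf_a, vcf_b):
    """Compare the PASS-only call sets of two runs."""
    a, b = passing_sites(vcf_a), passing_sites(vcf_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Invented records: run A contains one extra, filtered-out record.
RUN_A = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr4\t100\t.\tT\tG\t35.2\tExampleFilter\t.\n"
    "chr4\t200\t.\tA\tC\t5000\tPASS\t.\n"
)
RUN_B = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr4\t200\t.\tA\tC\t5000\tPASS\t.\n"
)

if __name__ == "__main__":
    print(compare_filtered(RUN_A, RUN_B))
```

In this invented example the raw outputs differ by one record, but that record does not have PASS in its FILTER column, so the filtered call sets of the two runs are identical.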
These are the filtered results. Any insight? The first one calls this variant; the second doesn't.
I think perhaps you need to look over the VCF specification. In particular, it is critical that you understand what the FILTER column is used for and what it means when the value there is not PASS. Good luck!
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT