VQSR filtering and dbSNP

jushjush (Philly, Member)
Hi,

For VQSR filtering, is it assumed that all calls made by HaplotypeCaller or MuTect2 are put into VQSR? And should polymorphic sites only be culled AFTER the VCF is annotated with VQSLOD scores and a tranche is selected?

I'm just curious if I'm missing anything major since I'm new to WGS analysis, and I was just given a GVCF file to analyze.

My pipeline is as follows:
- Pick and genotype variants from the GVCF using "GenotypeGVCFs"
- Split SNPs and INDELs with "SelectVariants"
- Run VQSR, using the databases mentioned in the tutorial page
- Keep tranche 99
- Merge the SNP and INDEL callsets
- Annotate with "Funcotator", excluding filtered sites and using the somatic database from the Broad FTP
- Annotate with ANNOVAR, dbSNP and COSMIC
- Looking at exonic mutations now... culled dbSNP sites based on allele frequency
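
To make the last two filtering steps concrete (excluding filtered sites, then culling known polymorphisms by allele frequency), here is a minimal Python sketch that works on plain VCF text lines. It is only an illustration of the logic, not what Funcotator or ANNOVAR do internally. The INFO key `POP_AF`, the 1% threshold, and the function names are all hypothetical; the real key depends on which annotation database was used.

```python
def parse_info(info_field):
    """Turn a VCF INFO string like 'AC=1;AF=0.5;DB' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flag fields (no '=') become True
    return info


def keep_record(line, max_pop_af=0.01, af_key="POP_AF"):
    """Keep PASS records whose population allele frequency is rare or absent.

    af_key is a hypothetical INFO key; substitute whatever your annotator writes.
    """
    cols = line.rstrip("\n").split("\t")
    if cols[6] != "PASS":  # column 7 is FILTER; VQSR-failed sites carry a tranche name
        return False
    pop_af = parse_info(cols[7]).get(af_key)
    if pop_af is None or pop_af is True:
        return True  # no frequency annotated: treat the site as novel and keep it
    # Multi-allelic sites may list several comma-separated values; use the largest.
    values = [float(x) for x in pop_af.split(",") if x != "."]
    return max(values, default=0.0) <= max_pop_af


def filter_vcf_lines(lines, **kwargs):
    """Yield header lines unchanged plus the data lines accepted by keep_record."""
    for line in lines:
        if line.startswith("#") or keep_record(line, **kwargs):
            yield line
```

The point of the ordering is that the FILTER check comes first, so the allele-frequency cull only ever sees sites that already passed the chosen tranche.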

A few questions:
1. As mentioned above, is it appropriate to cull polymorphic sites after VQSR filtering and choosing a tranche?
2. I still have trouble explaining very basically what it means to choose tranche=99. I've read the tutorials, but just wanted to make sure that it's not like a reverse-percentile of highest VQSLOD scores.
3. Funcotator ran into a bunch of errors, especially with indels (which makes sense given how noisy they can be). A few of many are below. I assumed these aren't problematic since indels are difficult to pin down. Is this the case?

```
19:18:36.886 ERROR GencodeFuncotationFactory - Problem creating a GencodeFuncotation on transcript ENST00000409333.1 for variant: chr2:169799278-169799281(CTTT* -> C): Variant overlaps transcript but is not completely contained within it. Funcotator cannot currently handle this case. Transcript: ENST00000409333.1 Variant: [VC Unknown @ chr2:169799278-169799281 Q50.60 of type=INDEL alleles=[CTTT*, C] attr={AC=1, AF=0.500, AN=2, BaseQRankSum=0.00, ClippingRankSum=0.00, DP=11, ExcessHet=3.0103, FS=0.000, MLEAC=1, MLEAF=0.500, MQ=60.00, MQRankSum=-5.500e-01, QD=10.12, RAW_MQ=39600.00, ReadPosRankSum=-5.500e-01, SOR=1.022, VQSLOD=15.32, culprit=MQ} GT=GT:AD:DP:GQ:PL 0/1:3,2:7:58:58,0,155 filters=
19:18:36.887 WARN GencodeFuncotationFactory - Creating default GencodeFuncotation on transcript ENST00000409333.1 for problem variant: chr2:169799278-169799281(CTTT* -> C)
19:18:41.185 INFO ProgressMeter - chr2:171366124 14.9 561000 37768.3
19:18:51.849 INFO ProgressMeter - chr2:177094585 15.0 569000 37853.9
19:18:55.852 ERROR GencodeFuncotationFactory - Problem creating a GencodeFuncotation on transcript ENST00000392505.6 for variant: chr2:178395772-178395774(TAA* -> T): Variant overlaps transcript but is not completely contained within it. Funcotator cannot currently handle this case. Transcript: ENST00000392505.6 Variant: [VC Unknown @ chr2:178395772-178395774 Q37.28 of type=INDEL alleles=[TAA*, T] attr={AC=2, AF=1.00, AN=2, DP=11, ExcessHet=3.0103, FS=0.000, MLEAC=1, MLEAF=0.500, MQ=57.56, QD=18.64, RAW_MQ=36441.00, SOR=2.303, VQSLOD=0.979, culprit=MQ} GT=GT:AD:DP:GQ:PL
///
19:22:23.536 WARN GencodeFuncotationFactory - Creating default GencodeFuncotation on transcript ENST00000415684.5 for problem variant: chr3:12533659-12533660(CA* -> C)
19:22:26.673 WARN FuncotatorUtils - createAminoAcidSequence given a coding sequence of length not divisible by 3. Dropping bases from the end: 1 (size=400, ref allele: T)
19:22:26.673 WARN FuncotatorUtils - createAminoAcidSequence given a coding sequence of length not divisible by 3. Dropping bases from the end: 1 (size=400, alt allele: C)
```
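
On question 2, one way to picture a tranche is as a truth-sensitivity cutoff rather than a percentile of all VQSLOD scores in the callset. The sketch below is a simplified illustration under that assumption: it takes the VQSLOD scores of callset sites that overlap a truth resource, finds the score at which the requested fraction of those truth sites is retained, and then applies that cutoff to every call. The function names are hypothetical, and the real VQSR implementation does more than this.

```python
import math


def tranche_vqslod_cutoff(truth_vqslods, sensitivity=0.99):
    """Return the VQSLOD cutoff that retains `sensitivity` of the truth-set sites."""
    if not truth_vqslods:
        raise ValueError("need at least one truth-set site")
    ranked = sorted(truth_vqslods, reverse=True)  # best scores first
    # Round before ceil so floating-point noise cannot push the count up by one.
    n_keep = max(1, math.ceil(round(sensitivity * len(ranked), 9)))
    return ranked[n_keep - 1]


def apply_cutoff(all_vqslods, cutoff):
    """Mark each call as passing (True) if its VQSLOD is at or above the cutoff."""
    return [score >= cutoff for score in all_vqslods]
```

Under this picture, the fraction of the whole callset that passes can be far from 99%, because the cutoff is set only by the truth-set sites. For example, with ten truth sites scored 1 through 10 and a sensitivity of 0.9, the cutoff is 2, and any call scoring below 2 is filtered regardless of how many such calls there are.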

Thanks for any help!
