VariantRecalibrator, creating a truth data set
Hello community, I am working with yeast and I am at the VariantRecalibrator step. As I don't have a truth data set, I want to "filter" my initial round of raw SNPs so that I keep only the highest-quality SNPs, as the GATK team suggests.
1) Do you have any suggestions for the filtering parameters?
I am treating each strain as a different species (WGS), so I have good coverage (80X) but only one lane.
I tried with:
java -Xmx4g -jar GenomeAnalysisTK.jar -R S288c.fasta -T VariantFiltration --variant $1.raw.vcf --filterExpression "QD < 2.0 || MQ < 45.0 || FS > 60.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "hardtovalidate" -o $1.filt.vcf
Afterwards I would remove the LowQual and hardtovalidate SNPs. Does that make sense? Thanks for your help!
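The removal step could be sketched as follows, assuming that records passing every filter carry PASS in the FILTER column (column 7 of the VCF) while the others carry LowQual and/or hardtovalidate. The function name keep_pass is a hypothetical helper for illustration, not a GATK tool:

```shell
# Keep VCF header lines plus records whose FILTER column (field 7) is exactly PASS.
# keep_pass is a hypothetical helper name, not part of GATK.
keep_pass() {
    awk -F'\t' '/^#/ || $7 == "PASS"' "$1"
}

# Example call inside a script, with file names following the commands in this post:
# keep_pass "$1.filt.vcf" > "$1.truth.vcf"
```

Records tagged LowQual or hardtovalidate are dropped, and the header is preserved so the output is still a valid VCF.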
2) After that I would run VariantRecalibrator, but I will have only one truth set. Can I use -mode both, or should I try to obtain a truth data set of indels and run VQSR for SNPs and indels separately? What do you think?
java -Xmx4g -jar GenomeAnalysisTK.jar -T VariantRecalibrator -R ncbi_S288c.fasta -input $1.raw.vcf -recalFile $1.raw.recal -tranchesFile $1.raw.tranches -resource:filtered,known=false,training=true,truth=true,prior=15.0 $1.truth.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an DP **-mode both**