It looks like you're new here. If you want to get involved, click one of these buttons!
Dear, GATK team, I have done raw snp and indel calling with UnifiedGenotyper following the command line below.
java -Xmx16g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -glm BOTH -R ucsc.hg19.fasta -T UnifiedGenotyper -I ERR031029.marked.realigned.fixed.recal.bam -I ERR031030.marked.realigned.fixed.recal.bam -D dbsnp_135.hg19.vcf -o ERR031030.raw.snps.indels.vcf -metrics snps.metrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000
After that, I did snp filteration using the following command lines.
java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T SelectVariants --variant ERR031030.raw.snps.indels.vcf -o ERR031030.snpsonly.vcf -selectType SNP
java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T SelectVariants --variant ERR031030.raw.snps.indels.vcf -o ERR031030.indelsonly.vcf -selectType INDEL
java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T VariantRecalibrator -R ucsc.hg19.fasta -input ERR031030.snpsonly.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.vcf -resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_phase1.indels.hg19.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 dbsnp_135.hg19.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an MQ -mode SNP -recalFile ERR031030.snp.recal.vcf -tranchesFile ERR031030.snp.tranches.vcf -rscriptFile ERR031030.plots.R
java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T ApplyRecalibration -input ERR031030.snpsonly.vcf -tranchesFile ERR031030.snp.tranches.vcf -recalFile ERR031030.snp.recal.vcf -o ERR031030.snps.filtered.vcf
java -Xmx16g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T VariantFiltration --variant ERR031030.snps.filtered.vcf -o ERR031030.final.filtered.vcf --filterName "Nov28filters && QD < 2.0 && ReadPosRankSum < -8.0 && MQ < 40.0 && FS > 60.0 && MQRandkSum < -12.5" --filterExpression "HaplotypeScore > 13.0"
The filtered snp.vcf file came up, however, it seems it contains some problem.
chrM 311 . T C 429.19 Nov28filters **_&& QD < 2.0 && ReadPosRankSum < -8.0 && MQ < 40.0 && FS > 60.0 && MQRandkSum < -12.5;VQSRTrancheSNP99.90to100.00 AC=1;AF=0.250;AN=4;BaseQRankSum=-13.010;DP=2000;Dels=0.00;FS=50.500;HaplotypeScore=382.2016;MLEAC=1;MLEAF=0.250;MQ=50.86;MQ0=0;MQRankSum=1.458;QD=0.43;ReadPosRankSum=-10.687;VQSLOD=-6.143e+02;culprit=HaplotypeScore GT:AD:DP:GQ:PL 0/0:634,353:949:99:0,232,7697 0/1:463,521:945:99:459,0,4190
chrM 410 . A T 64750.20 PASS AC=4;AF=1.00;AN=4;DP=2000;Dels=0.00;FS=0.000;HaplotypeScore=7.3762;MLEAC=4;MLEAF=1.00;MQ=56.04;MQ0=0;QD=32.38;VQSLOD=2.27;culprit=HaplotypeScore GT:AD:DP:GQ:PL 1/1:0,998:998:99:32010,2926,0 1/1:0,999:999:99:32767,2912,0
chrM 711 . G A 62989.20 PASS AC=4;AF=1.00;AN=4;BaseQRankSum=2.500;DP=2000;Dels=0.00;FS=3.751;HaplotypeScore=8.7084;MLEAC=4;MLEAF=1.00;MQ=56.74;MQ0=1;MQRankSum=-0.107;QD=31.49;ReadPosRankSum=-2.169;VQSLOD=2.46;culprit=HaplotypeScore
GT:AD:DP:GQ:PL 1/1:0,998:972:99:30899,2808,0 1/1:3,997:972:99:32117,2830,0
chrM 1121 . T C 16719.20 Nov28filters && QD < 2.0 && ReadPosRankSum < -8.0 && MQ < 40.0 && FS > 60.0 && MQRandkSum < -12.5;VQSRTrancheSNP99.90to100.00 AC=4;AF=1.00;AN=4;BaseQRankSum=-0.239;DP=2000;Dels=0.00;FS=2.141;HaplotypeScore=22.9003;MLEAC=4;MLEAF=1.00;MQ=21.32;MQ0=703;MQRankSum=-1.627;QD=8.36;ReadPosRankSum=-0.027;VQSLOD=-4.195e+00;culprit=HaplotypeScore GT:AD:DP:GQ:PL 1/1:3,985:986:99:9547,976,0 1/1:4,983:983:99:7199,739,0
chrM 2489 . A C 34.19 LowQual;Nov28filters && QD < 2.0 && ReadPosRankSum < -8.0 && MQ < 40.0 && FS > 60.0 && MQRandkSum < -12.5 AC=1;AF=0.250;AN=4;BaseQRankSum=-17.321;DP=2000;Dels=0.00;FS=180.208;HaplotypeScore=18.7245;MLEAC=1;MLEAF=0.250;MQ=46.52;MQ0=31;MQRankSum=3.365;QD=0.03;ReadPosRankSum=-4.198 GT:AD:DP:GQ:PL 0/1:278,719:950:64:64,0,4623 0/0:309,688:950:99:0,263,6065
For the filter option, most of the filtered snps show Nov28filters rather than PASS or LowQual, what's wrong with that, Are there some problems with my command lines? Thank you so much for your reply.
Answers
You're using the --filterName argument incorrectly.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I changed my command line to
The similar problem still came up.
I don't know why? is there anything wrong with my command line? Thank you so much for your reply.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I'm not sure what you think is wrong here. You applied a filter, this variant was filtered out based on one of the parameters, so in the VCF it says which filter the variant failed and it tells you what parameter was the culprit. That's the whole point of specifying a filter name, so that if you apply several filters you know which one failed the variant.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Ok, in the final filtered vcf file, there are some variants with pass or lowquality, but most of them show "Nov09filters", does that mean they are filtered out? I didn't read this information from the guide. Thank you for your reply.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •It depends what you do next with your file. Having this filter annotation means you can choose to work with specific subsets of your variants. For example, you can decide to take a closer look at variants that pass every filter, or that are marked as low quality, or that failed your named filter. Nothing is truly "filtered out" in the sense that they are all still there in the file, but they have different annotations that you can use to manipulate them differentially.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Ok, I understand now. Thank you so much.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •