Bug Bulletin: we have identified a bug that affects indexing when producing gzipped VCFs. This will be fixed in the upcoming 3.2 release; in the meantime you need to reindex gzipped VCFs using Tabix.

Is the filtered snp.vcf file right?

JIAGEHAOJIAGEHAO Posts: 13Member
edited January 2013 in Ask the team

Dear, GATK team, I have done raw snp and indel calling with UnifiedGenotyper following the command line below.

java -Xmx16g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -glm BOTH -R ucsc.hg19.fasta -T UnifiedGenotyper -I ERR031029.marked.realigned.fixed.recal.bam -I ERR031030.marked.realigned.fixed.recal.bam -D dbsnp_135.hg19.vcf -o ERR031030.raw.snps.indels.vcf -metrics snps.metrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000

After that, I did snp filteration using the following command lines.

java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T SelectVariants --variant ERR031030.raw.snps.indels.vcf -o ERR031030.snpsonly.vcf -selectType SNP

java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T SelectVariants --variant ERR031030.raw.snps.indels.vcf -o ERR031030.indelsonly.vcf -selectType INDEL

java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T VariantRecalibrator -R ucsc.hg19.fasta -input ERR031030.snpsonly.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.vcf -resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_phase1.indels.hg19.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 dbsnp_135.hg19.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an MQ -mode SNP -recalFile ERR031030.snp.recal.vcf -tranchesFile ERR031030.snp.tranches.vcf -rscriptFile ERR031030.plots.R

java -Xmx8g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T ApplyRecalibration -input ERR031030.snpsonly.vcf -tranchesFile ERR031030.snp.tranches.vcf -recalFile ERR031030.snp.recal.vcf -o ERR031030.snps.filtered.vcf

java -Xmx16g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T VariantFiltration --variant ERR031030.snps.filtered.vcf -o ERR031030.final.filtered.vcf --filterName "Nov28filters && QD < 2.0 && ReadPosRankSum < -8.0 && MQ < 40.0 && FS > 60.0 && MQRandkSum < -12.5"  --filterExpression "HaplotypeScore > 13.0"

The filtered snp.vcf file came up, however, it seems it contains some problem.

chrM    311     .       T       C       429.19  Nov28filters **_&& QD < 2.0 && ReadPosRankSum < -8.0 && MQ < 40.0 && FS > 60.0 && MQRandkSum < -12.5;VQSRTrancheSNP99.90to100.00   AC=1;AF=0.250;AN=4;BaseQRankSum=-13.010;DP=2000;Dels=0.00;FS=50.500;HaplotypeScore=382.2016;MLEAC=1;MLEAF=0.250;MQ=50.86;MQ0=0;MQRankSum=1.458;QD=0.43;ReadPosRankSum=-10.687;VQSLOD=-6.143e+02;culprit=HaplotypeScore  GT:AD:DP:GQ:PL  0/0:634,353:949:99:0,232,7697   0/1:463,521:945:99:459,0,4190
chrM    410     .       A       T       64750.20        PASS    AC=4;AF=1.00;AN=4;DP=2000;Dels=0.00;FS=0.000;HaplotypeScore=7.3762;MLEAC=4;MLEAF=1.00;MQ=56.04;MQ0=0;QD=32.38;VQSLOD=2.27;culprit=HaplotypeScore        GT:AD:DP:GQ:PL  1/1:0,998:998:99:32010,2926,0   1/1:0,999:999:99:32767,2912,0
chrM    711     .       G       A       62989.20        PASS    AC=4;AF=1.00;AN=4;BaseQRankSum=2.500;DP=2000;Dels=0.00;FS=3.751;HaplotypeScore=8.7084;MLEAC=4;MLEAF=1.00;MQ=56.74;MQ0=1;MQRankSum=-0.107;QD=31.49;ReadPosRankSum=-2.169;VQSLOD=2.46;culprit=HaplotypeScore
      GT:AD:DP:GQ:PL  1/1:0,998:972:99:30899,2808,0   1/1:3,997:972:99:32117,2830,0
chrM    1121    .       T       C       16719.20        Nov28filters && QD < 2.0 && ReadPosRankSum < -8.0 && MQ < 40.0 && FS > 60.0 && MQRandkSum < -12.5;VQSRTrancheSNP99.90to100.00   AC=4;AF=1.00;AN=4;BaseQRankSum=-0.239;DP=2000;Dels=0.00;FS=2.141;HaplotypeScore=22.9003;MLEAC=4;MLEAF=1.00;MQ=21.32;MQ0=703;MQRankSum=-1.627;QD=8.36;ReadPosRankSum=-0.027;VQSLOD=-4.195e+00;culprit=HaplotypeScore     GT:AD:DP:GQ:PL  1/1:3,985:986:99:9547,976,0     1/1:4,983:983:99:7199,739,0
chrM    2489    .       A       C       34.19   LowQual;Nov28filters && QD < 2.0 && ReadPosRankSum < -8.0 && MQ < 40.0 && FS > 60.0 && MQRandkSum < -12.5       AC=1;AF=0.250;AN=4;BaseQRankSum=-17.321;DP=2000;Dels=0.00;FS=180.208;HaplotypeScore=18.7245;MLEAC=1;MLEAF=0.250;MQ=46.52;MQ0=31;MQRankSum=3.365;QD=0.03;ReadPosRankSum=-4.198   GT:AD:DP:GQ:PL  0/1:278,719:950:64:64,0,4623    0/0:309,688:950:99:0,263,6065

For the filter option, most of the filtered snps show Nov28filters rather than PASS or LowQual, what's wrong with that, Are there some problems with my command lines? Thank you so much for your reply.

Post edited by Geraldine_VdAuwera on

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    You're using the --filterName argument incorrectly.

    Geraldine Van der Auwera, PhD

  • JIAGEHAOJIAGEHAO Posts: 13Member
    edited January 2013

    I changed my command line to

    java -Xmx16g -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T VariantFiltration --variant ERR031030.snps.filtered.vcf -o ERR031030.final.filtered.vcf --filterName "Nov09filters" --filterExpression "HaplotypeScore > 13.0 || QD < 2.0 || ReadPosRankSum < -8.0 || MQ < 40.0 || FS > 60.0 || MQRandkSum < -12.5".
    

    The similar problem still came up.

    chrM    146     rs72619361      T       C       51492.20        Nov09filters;VQSRTrancheSNP99.00to99.90 AC=4;AF=1.00;AN=4;BaseQRankSum=-2.374;DB;DP=1999;Dels=0.00;FS=6.899;HaplotypeScore=34.2344;MLEAC=4;MLEAF=1.00;MQ=55.78;MQ0=0;MQRankSum=-0.977;QD=25.76;ReadPosRankSum=1.420;VQSLOD=-1.045e+00;culprit=HaplotypeScore    GT:AD:DP:GQ:PL  1/1:3,996:995:99:25645,2679,0   1/1:5,993:992:99:25874,2623,0
    

    I don't know why? is there anything wrong with my command line? Thank you so much for your reply.

    Post edited by Geraldine_VdAuwera on
  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    I'm not sure what you think is wrong here. You applied a filter, this variant was filtered out based on one of the parameters, so in the VCF it says which filter the variant failed and it tells you what parameter was the culprit. That's the whole point of specifying a filter name, so that if you apply several filters you know which one failed the variant.

    Geraldine Van der Auwera, PhD

  • JIAGEHAOJIAGEHAO Posts: 13Member

    Ok, in the final filtered vcf file, there are some variants with pass or lowquality, but most of them show "Nov09filters", does that mean they are filtered out? I didn't read this information from the guide. Thank you for your reply.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,235Administrator, GSA Member admin

    It depends what you do next with your file. Having this filter annotation means you can choose to work with specific subsets of your variants. For example, you can decide to take a closer look at variants that pass every filter, or that are marked as low quality, or that failed your named filter. Nothing is truly "filtered out" in the sense that they are all still there in the file, but they have different annotations that you can use to manipulate them differentially.

    Geraldine Van der Auwera, PhD

  • JIAGEHAOJIAGEHAO Posts: 13Member

    Ok, I understand now. Thank you so much.

Sign In or Register to comment.