VQSR tag in FILTER is false positive variant ?

Hi.
I wonder whether after VQSR , the variants which have VQSR tag in FILTER column filter out, including VQSRTrancheINDEL99.00 to 99.90, VQSRTrancheINDEL99.90 to 100.00+, VQSRTrancheINDEL99.90 to 100.00, VQSRTrancheSNP99.90 to 100.00, VQSRTrancheSNP99.90 to 100.00+? These means are false positive variants?
and What do 100.00+ mean?
Can I have Only "PASS" variants tin FILTER tab as true positve variant ?

Tagged:

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Sunhye
    Hi,

    Can you tell me what version of GATK you are using and the exact commands you ran in the VQSR step?

    Also, please post some example records from your output VCF.

    Thanks,
    Sheila

  • SunhyeSunhye KoreaMember

    Hi @Sheila !
    My GATK version is 3.4-46-gbc02625.

    My commands is,

    #VQSR
    java -Xmx64g -jar ${GATK} -R ${REF} -T VariantRecalibrator -input project.raw.joint.chr${chr}.vcf \
    -recalFile project.joint.chr${chr}.recalibrate_SNP.recal \
    -tranchesFile project.joint.chr${chr}.recalibrate_SNP.tranches \
    -rscriptFile project.chr${chr}.recalibrate_SNP_plots.R -nt 32 -L ${chr} \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 ${bundle}/hapmap_3.3.b37.vcf \
    -resource:omni,known=false,training=true,truth=true,prior=12.0 ${bundle}/1000G_omni2.5.b37.vcf \
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 ${bundle}/1000G_phase1.snps.high_confidence.b37.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ${bundle}/dbsnp_138.b37.vcf \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP -an InbreedingCoeff -mode SNP \
    --log_to_file chr${chr}.vqsr.snp.log
    
    java -Xmx64g -jar ${GATK} -R ${REF} -T ApplyRecalibration \
    -input project.raw.joint.chr${chr}.vcf -mode SNP --ts_filter_level 99.5 \
    -recalFile project.joint.chr${chr}.recalibrate_SNP.recal -tranchesFile project.joint.chr${chr}.recalibrate_SNP.tranches \
    -o project.joint.raw.chr${chr}.recalibrated_snps.vcf -nt 32 --log_to_file chr${chr}.apply.recal.snp.log
    
    java -Xmx64g -jar ${GATK} -R ${REF} -T VariantRecalibrator \
    -input project.joint.raw.chr${chr}.recalibrated_snps.vcf -recalFile project.joint.chr${chr}.recalibrate_INDEL.recal \
    -tranchesFile project.joint.chr${chr}.recalibrate_INDEL.tranches -rscriptFile project.chr${chr}.recalibrate_INDEL_plots.R -nt 32 \
    --maxGaussians 4 -L ${chr} \
    -resource:mills,known=true,training=true,truth=true,prior=12.0 ${bundle}/Mills_and_1000G_gold_standard.indels.b37.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ${bundle}/dbsnp_138.b37.vcf \
    -an QD -an DP -an FS -an SOR -an ReadPosRankSum -an MQRankSum -an InbreedingCoeff -mode INDEL \
    --log_to_file vqsr.indel.log
    
    
    java -Xmx64g -jar ${GATK} -R ${REF} -T ApplyRecalibration \
    -input project.joint.raw.chr${chr}.recalibrated_snps.vcf -mode INDEL --ts_filter_level 99.0 \
    -recalFile project.joint.chr${chr}.recalibrate_INDEL.recal -tranchesFile project.joint.chr${chr}.recalibrate_INDEL.tranches -nt 32 \
    -o project.joint.chr${chr}.recalibrated_snps.indel.vcf --log_to_file chr${chr}.apply.recal.indel.log
    

    Finally, vcf recode is,

    ##FILTER=<ID=LowQual,Description="Low quality">
    ##FILTER=<ID=VQSRTrancheINDEL99.00to99.90,Description="Truth sensitivity tranche level for INDEL model at VQS Lod: -13.6297 <= x < -1.1684">
    ##FILTER=<ID=VQSRTrancheINDEL99.90to100.00+,Description="Truth sensitivity tranche level for INDEL model at VQS Lod < -39838.5642">
    ##FILTER=<ID=VQSRTrancheINDEL99.90to100.00,Description="Truth sensitivity tranche level for INDEL model at VQS Lod: -39838.5642 <= x < -13.6297">
    ##FILTER=<ID=VQSRTrancheSNP99.90to100.00+,Description="Truth sensitivity tranche level for SNP model at VQS Lod < -87822.7588">
    ##FILTER=<ID=VQSRTrancheSNP99.90to100.00,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -87822.7588 <= x < -19.919">
    
    
    4       17106   rs140491969     G       GC      2417.67 VQSRTrancheINDEL99.00to99.90    AC=26;AF=1.00;AN=26;DB;DP=135;FS=0.000;InbreedingCoeff=-0.0321;MLEAC=26;MLEAF=1.00;MQ=29.07;NDA=1;QD=32.93;SOR=6.963;VQSLOD=-4.168e+00;culprit=SOR
    4       17154   rs2227185       T       C       180.54  PASS    AC=5;AF=0.040;AN=124;BaseQRankSum=0.322;ClippingRankSum=0.852;DB;DP=1128;FS=17.588;InbreedingCoeff=-0.0580;MLEAC=5;MLEAF=0.040;MQ=28.95;MQRankSum=-1.070e-01;NDA=1;QD=2.15;ReadPosRankSum=0.488;SOR=3.867;VQSLOD=-6.288e+00;culprit=MQ
    4       17309   rs6599373       G       A       145.63  PASS    AC=7;AF=0.057;AN=122;BaseQRankSum=-9.100e-02;ClippingRankSum=0.742;DB;DP=780;FS=0.000;InbreedingCoeff=-0.1062;MLEAC=6;MLEAF=0.049;MQ=32.78;MQRankSum=-1.085e+00;NDA=1;QD=1.21;ReadPosRankSum=0.271;SOR=0.031;VQSLOD=-3.902e+00;culprit=MQ
    4       17342   rs2097490       G       A       511.77  PASS    AC=10;AF=0.833;AN=12;BaseQRankSum=0.736;ClippingRankSum=0.736;DB;DP=20;FS=0.000;MLEAC=10;MLEAF=0.833;MQ=33.05;MQRankSum=0.736;NDA=1;QD=25.59;ReadPosRankSum=1.03;SOR=3.056;VQSLOD=-3.596e+00;culprit=DP
    4       17345   rs140188117     C       CG      133.61  VQSRTrancheINDEL99.00to99.90    AC=2;AF=1.00;AN=2;DB;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=21.00;NDA=1;NEGATIVE_TRAIN_SITE;QD=30.83;SOR=2.303;VQSLOD=-1.420e+00;culprit=FS
    4       17635   rs200030318     G       T       5162.72 VQSRTrancheSNP99.90to100.00     AC=34;AF=0.283;AN=120;BaseQRankSum=-2.400e-01;ClippingRankSum=0.230;DB;DP=1422;FS=53.177;InbreedingCoeff=-0.4573;MLEAC=37;MLEAF=0.308;MQ=31.45;MQRankSum=-1.109e+00;NDA=1;QD=5.31;ReadPosRankSum=-6.650e-01;SOR=8.958;VQSLOD=-3.032e+01;culprit=SOR
    (END)
    
  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Sunhye
    Hi,

    Those VQSLOD score do not look good. Can you post the tranche plots and recalibration plots?

    Thanks,
    Sheila

  • SunhyeSunhye KoreaMember

    Hi @Sheila
    I'm so sorry my late reply.
    For chr4.recal.vcf, I attach files about the tranche plots and recalibration plots.
    Could you check these files?

    And I wonder that about variant filtering, VQS LOD value should use than than the presence or absence of the VQSR tag in filter colum to filter variant?
    If so, what is VQS LOD's range to decide filtering ?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Sunhye
    Hi,

    Sorry for the late response. Can you tell me more about your samples and how you ran VQSR? How many samples do you have and are they whole genome or whole exome?

    Thanks,
    Sheila

  • tommycarstensentommycarstensen United KingdomMember ✭✭✭

    @Sunhye I'm not sure I will be of much help, but here goes. I noticed this part of your command:

    -input project.joint.raw.chr${chr}.recalibrated_snps.vcf
    

    Preferably you should run VR across all chromosomes simultaneously/concurrently.

    @Sunhye said:
    And I wonder that about variant filtering, VQS LOD value should use than than the presence or absence of the VQSR tag in filter colum to filter variant?
    If so, what is VQS LOD's range to decide filtering ?

    You should filter on truth sensitivity tranches; not on VQSLOD values.

    @Sunhye said:
    and What do 100.00+ mean?

    It is described in your VCF header, which VQSLOD values this corresponds to:

    ##FILTER=<ID=VQSRTrancheSNP99.90to100.00+,Description="Truth sensitivity tranche level for SNP model at VQS Lod < -87822.7588">
    

    Can I have Only "PASS" variants tin FILTER tab as true positve variant ?

    It depends what TS threshold you use, when you run ApplyRecalibration.

  • SunhyeSunhye KoreaMember

    Hi @Sheila .
    My samples are 62 individuals and whole genome sequencing.
    And How to run VQSR is followed by GATK best practice.
    For SNP,

    java -Xmx64g -jar ${GATK} -R ${REF} -T VariantRecalibrator -input project.raw.joint.chr${chr}.vcf \
    -recalFile project.joint.chr${chr}.recalibrate_SNP.recal \
    -tranchesFile project.joint.chr${chr}.recalibrate_SNP.tranches \
    -rscriptFile project.chr${chr}.recalibrate_SNP_plots.R -nt 32 -L ${chr} \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 ${bundle}/hapmap_3.3.b37.vcf \
    -resource:omni,known=false,training=true,truth=true,prior=12.0 ${bundle}/1000G_omni2.5.b37.vcf \
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 ${bundle}/1000G_phase1.snps.high_confidence.b37.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ${bundle}/dbsnp_138.b37.vcf \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP -an InbreedingCoeff -mode SNP \
    

    For Indel,

    java -Xmx64g -jar ${GATK} -R ${REF} -T VariantRecalibrator \
    -input project.joint.raw.chr${chr}.recalibrated_snps.vcf -recalFile project.joint.chr${chr}.recalibrate_INDEL.recal \
    -tranchesFile project.joint.chr${chr}.recalibrate_INDEL.tranches -rscriptFile project.chr${chr}.recalibrate_INDEL_plots.R -nt 32 \
    --maxGaussians 4 -L ${chr} \
    -resource:mills,known=true,training=true,truth=true,prior=12.0 ${bundle}/Mills_and_1000G_gold_standard.indels.b37.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ${bundle}/dbsnp_138.b37.vcf \
    -an QD -an DP -an FS -an SOR -an ReadPosRankSum -an MQRankSum -an InbreedingCoeff -mode INDEL \
    --log_to_file vqsr.indel.log
    
  • SunhyeSunhye KoreaMember

    Hi @tommycarstensen !
    Thanks for your detailed comment!
    Your reply are very helpful to me!

Sign In or Register to comment.