We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

How to improve sensitivity

Dear staff,
I'm using GATK 4.1 to call variants on exomes data following the Best Practise:
- apply Markduplicate and base recalibrator on the bam file
- call variants using Haplotype caller
- apply GenomicsDBImport
- apply GenotypeGVCFs
- apply VariantRecalibrator
- apply HardFilter to variants
and then I compare it to the GIAB vcf using rtgtools.
However, I reach a sensibility ~90% and a precision of ~98%, instead of with UnifiedGenotyper 3.8, I obtain
a sensibility of ~94% and a precision of ~97%​, and I call more true positive with UnifiedGenotyper.
How can I improve the sensibility? Am I maybe missing something?
Thank you very much for your help.

Best Answer


  • bshifawbshifaw Member, Broadie, Moderator admin
    edited May 2019

    Hi @deni

    Please share the exact commands used in your workflow, particularly the HaplotypeCaller command.
    The results can depend on the type of data your using (e.g. organism), what type of input data you are using ?

    You may find some useful information in the Tool Documentation for HaplotypeCaller. It has a section on How HaplotypeCaller works which links to further explanation on how parameters of the tool can be adjusted to shift precision and sensitivity.

  • denideni Member
    Hi @bshifaw,
    thank you very much for your answer. The organism is the human, using as reference the HG38.
    The input data are exomes of human individuals done with different li

    The command I used are:

    cat $intervals | parallel --jobs 5 java -Djava.io.tmpdir=${tempGvcf} -Duser.country=EN -Duser.language=us -Xmx3G -jar gatk-package- HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I $bam --dbsnp esources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf -L {} -ERC GVCF --output ${tempGvcf}/${id}/${id}.{}.snps.raw.g.vcf --standard-min-confidence-threshold-for-calling 10.0 ;

    Then I used genotypeGVCFs

    cat wgs.hg38.intervals | parallel --jobs 5 java -Djava.io.tmpdir=${temp} -Xmx2G -jar gatk-package- GenotypeGVCFs -R Homo_sapiens_assembly38.fasta -V gendb://${tempGvcf}/{}/ -G StandardAnnotation -O ${resultpath}/{}.vcf

    and merge the files:

    java -Djava.io.tmpdir=${temp} -Xmx8G -jar gatk-package- MergeVcfs ${myshards[@]} -O ${resultpath}/complete.raw.variants.vcf

    After I apply the Variant recalibator:

    # CREATE Indels MODEL

    java -Djava.io.tmpdir=${temp} -Xmx8G -jar gatk-package- VariantRecalibrator --variant ${resultpath}/complete.raw.variants.vcf --output ${resultpath}/INDEL.recalibration --tranches-file ${resultpath}/INDEL.tranches --trust-all-polymorphic \
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    --use-annotation QD --use-annotation DP --use-annotation FS --use-annotation SOR --use-annotation MQRankSum --use-annotation ReadPosRankSum \
    --mode INDEL --max-gaussians 4 -R Homo_sapiens_assembly38.fasta\
    --resource mills,known=false,training=true,truth=true,prior=12:Mills_and_1000G_gold_standard.indels.hg38.vcf \
    --resource axiomPoly,known=false,training=true,truth=false,prior=10:Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf \
    --resource dbsnp,known=true,training=false,truth=false,prior=2:resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf -L $intervals ;


    java -Djava.io.tmpdir=${temp} -Xmx8G -jar gatk-package- VariantRecalibrator --variant ${resultpath}/complete.raw.variants.vcf --output ${resultpath}/SNPS.recalibration --tranches-file ${resultpath}/SNPS.tranches --trust-all-polymorphic \
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    --use-annotation DP --use-annotation QD --use-annotation FS --use-annotation SOR --use-annotation MQ --use-annotation MQRankSum --use-annotation ReadPosRankSum --mode SNP --max-gaussians 6 \
    --resource hapmap,known=false,training=true,truth=true,prior=15:hapmap_3.3.hg38.vcf \
    --resource omni,known=false,training=true,truth=true,prior=12:1000G_omni2.5.hg38.vcf \
    --resource 1000G,known=false,training=true,truth=false,prior=10:1000G_phase1.snps.high_confidence.hg38.vcf \
    --resource dbsnp,known=true,training=false,truth=false,prior=7:resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf -L wgs.hg38.intervals ;


    java -Djava.io.tmpdir=${temp} -Xmx8G -jar gatk-package- ApplyVQSR --output ${resultpath}/indel.recalibrated.vcf --variant ${resultpath}/complete.raw.variants.vcf --recal-file ${resultpath}/INDEL.recalibration --tranches-file ${resultpath}/INDEL.tranches \
    --truth-sensitivity-filter-level 99 --create-output-variant-index true --mode INDEL ;


    java -Djava.io.tmpdir=${temp} -Xmx8G -jar gatk-package- ApplyVQSR --output ${resultpath}/variants.recalibrated.vcf --variant ${resultpath}/indel.recalibrated.vcf --recal-file ${resultpath}/SNPS.recalibration --tranches-file ${resultpath}/SNPS.tranches \
    --truth-sensitivity-filter-level 99 --create-output-variant-index true --mode SNP ;

    And finally apply the filtration:

    # extract SNPs from unfiltered variants
    java -Xmx3G -jar gatk-package- SelectVariants --select-type-to-include SNP --output raw_snps.vcf -V variants.recalibrated.vcf;

    java -Xmx3G -jar gatk-package- SelectVariants --select-type-to-exclude SNP --output raw_indels.vcf -V variants.recalibrated.vcf ;

    # filter SNPs
    java -Xmx3G -jar gatk-package- VariantFiltration -R Homo_sapiens_assembly38.fasta -V raw_snps.vcf \
    --filter-expression "QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" \
    --filter-name "Broad_SNP_filter" -O raw_filtered_snps.vcf ;

    java -Xmx3G -jar gatk-package- VariantFiltration -R Homo_sapiens_assembly38.fasta -V raw_indels.vcf \
    --filter-expression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" \
    --filter-name "Broad_indel_Filter" -O raw_filtered_indels.vcf ;

    # merge filtered variants
    lava -Xmx3G -jar gatk-package- MergeVcfs -I raw_filtered_snps.vcf -I raw_filtered_indels.vcf -O variants.filtered.vcf ;

    Thank you very much for your help.
Sign In or Register to comment.