Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

How to improve sensitivity

denideni Member
Dear staff,
I'm using GATK 4.1 to call variants on exomes data following the Best Practise:
- apply Markduplicate and base recalibrator on the bam file
- call variants using Haplotype caller
- apply GenomicsDBImport
- apply GenotypeGVCFs
- apply VariantRecalibrator
- apply HardFilter to variants
and then I compare it to the GIAB vcf using rtgtools.
However, I reach a sensibility ~90% and a precision of ~98%, instead of with UnifiedGenotyper 3.8, I obtain
a sensibility of ~94% and a precision of ~97%​, and I call more true positive with UnifiedGenotyper.
How can I improve the sensibility? Am I maybe missing something?
Thank you very much for your help.

Best Answer

Answers

  • bshifawbshifaw moonMember, Broadie, Moderator admin
    edited May 14

    Hi @deni

    Please share the exact commands used in your workflow, particularly the HaplotypeCaller command.
    The results can depend on the type of data your using (e.g. organism), what type of input data you are using ?

    You may find some useful information in the Tool Documentation for HaplotypeCaller. It has a section on How HaplotypeCaller works which links to further explanation on how parameters of the tool can be adjusted to shift precision and sensitivity.

  • denideni Member
    Hi @bshifaw,
    thank you very much for your answer. The organism is the human, using as reference the HG38.
    The input data are exomes of human individuals done with different li

    The command I used are:

    cat $intervals | parallel --jobs 5 java -Djava.io.tmpdir=${tempGvcf} -Duser.country=EN -Duser.language=us -Xmx3G -jar gatk-package-4.1.0.0-local.jar HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I $bam --dbsnp esources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf -L {} -ERC GVCF --output ${tempGvcf}/${id}/${id}.{}.snps.raw.g.vcf --standard-min-confidence-threshold-for-calling 10.0 ;

    Then I used genotypeGVCFs

    cat wgs.hg38.intervals | parallel --jobs 5 java -Djava.io.tmpdir=${temp} -Xmx2G -jar gatk-package-4.1.0.0-local.jar GenotypeGVCFs -R Homo_sapiens_assembly38.fasta -V gendb://${tempGvcf}/{}/ -G StandardAnnotation -O ${resultpath}/{}.vcf

    and merge the files:

    java -Djava.io.tmpdir=${temp} -Xmx8G -jar gatk-package-4.1.0.0-local.jar MergeVcfs ${myshards[@]} -O ${resultpath}/complete.raw.variants.vcf

    After I apply the Variant recalibator:

    #
    # CREATE Indels MODEL
    #


    java -Djava.io.tmpdir=${temp} -Xmx8G -jar gatk-package-4.1.0.0-local.jar VariantRecalibrator --variant ${resultpath}/complete.raw.variants.vcf --output ${resultpath}/INDEL.recalibration --tranches-file ${resultpath}/INDEL.tranches --trust-all-polymorphic \
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    --use-annotation QD --use-annotation DP --use-annotation FS --use-annotation SOR --use-annotation MQRankSum --use-annotation ReadPosRankSum \
    --mode INDEL --max-gaussians 4 -R Homo_sapiens_assembly38.fasta\
    --resource mills,known=false,training=true,truth=true,prior=12:Mills_and_1000G_gold_standard.indels.hg38.vcf \
    --resource axiomPoly,known=false,training=true,truth=false,prior=10:Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf \
    --resource dbsnp,known=true,training=false,truth=false,prior=2:resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf -L $intervals ;


    #
    # CREATE SNP MODEL
    #

    java -Djava.io.tmpdir=${temp} -Xmx8G -jar gatk-package-4.1.0.0-local.jar VariantRecalibrator --variant ${resultpath}/complete.raw.variants.vcf --output ${resultpath}/SNPS.recalibration --tranches-file ${resultpath}/SNPS.tranches --trust-all-polymorphic \
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    --use-annotation DP --use-annotation QD --use-annotation FS --use-annotation SOR --use-annotation MQ --use-annotation MQRankSum --use-annotation ReadPosRankSum --mode SNP --max-gaussians 6 \
    --resource hapmap,known=false,training=true,truth=true,prior=15:hapmap_3.3.hg38.vcf \
    --resource omni,known=false,training=true,truth=true,prior=12:1000G_omni2.5.hg38.vcf \
    --resource 1000G,known=false,training=true,truth=false,prior=10:1000G_phase1.snps.high_confidence.hg38.vcf \
    --resource dbsnp,known=true,training=false,truth=false,prior=7:resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf -L wgs.hg38.intervals ;

    #
    # APPLY INDELS MODEL
    #

    java -Djava.io.tmpdir=${temp} -Xmx8G -jar gatk-package-4.1.0.0-local.jar ApplyVQSR --output ${resultpath}/indel.recalibrated.vcf --variant ${resultpath}/complete.raw.variants.vcf --recal-file ${resultpath}/INDEL.recalibration --tranches-file ${resultpath}/INDEL.tranches \
    --truth-sensitivity-filter-level 99 --create-output-variant-index true --mode INDEL ;

    # APPLY SNPS MODEL
    #

    java -Djava.io.tmpdir=${temp} -Xmx8G -jar gatk-package-4.1.0.0-local.jar ApplyVQSR --output ${resultpath}/variants.recalibrated.vcf --variant ${resultpath}/indel.recalibrated.vcf --recal-file ${resultpath}/SNPS.recalibration --tranches-file ${resultpath}/SNPS.tranches \
    --truth-sensitivity-filter-level 99 --create-output-variant-index true --mode SNP ;


    And finally apply the filtration:

    # extract SNPs from unfiltered variants
    java -Xmx3G -jar gatk-package-4.1.0.0-local.jar SelectVariants --select-type-to-include SNP --output raw_snps.vcf -V variants.recalibrated.vcf;

    java -Xmx3G -jar gatk-package-4.1.0.0-local.jar SelectVariants --select-type-to-exclude SNP --output raw_indels.vcf -V variants.recalibrated.vcf ;


    # filter SNPs
    java -Xmx3G -jar gatk-package-4.1.0.0-local.jar VariantFiltration -R Homo_sapiens_assembly38.fasta -V raw_snps.vcf \
    --filter-expression "QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" \
    --filter-name "Broad_SNP_filter" -O raw_filtered_snps.vcf ;

    java -Xmx3G -jar gatk-package-4.1.0.0-local.jar VariantFiltration -R Homo_sapiens_assembly38.fasta -V raw_indels.vcf \
    --filter-expression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" \
    --filter-name "Broad_indel_Filter" -O raw_filtered_indels.vcf ;


    # merge filtered variants
    lava -Xmx3G -jar gatk-package-4.1.0.0-local.jar MergeVcfs -I raw_filtered_snps.vcf -I raw_filtered_indels.vcf -O variants.filtered.vcf ;

    Thank you very much for your help.
Sign In or Register to comment.