To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at https://software.broadinstitute.org/firecloud/documentation/freecredits

result of VQSR - negative number og novel variants

Hi, I have run VQSR on my SNP data and got negative number of novel variants. I attached the tranches file.

I am working on whole exome data and here are my commands.

java -Xmx4g -jar GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R IsaacIndex/genome.fa \
-input data_raw.vcf \
-recalFile data_snp.recal.vcf \
-tranchesFile vqsr_snp.tranches \
-nt 8 \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.sites.vcf \
-resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.hg19.sites.vcf \
-resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.hg19.sites.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_138.hg19.vcf \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an InbreedingCoeff \
-mode SNP \

java -Xmx3g -jar GenomeAnalysisTK.jar \
-T ApplyRecalibration \
-R IsaacIndex/genome.fa \
-input data_raw.vcf \
-tranchesFile vqsr_snp.tranches \
-recalFile data_snp.recal.vcf \
-o data_snp.vqsr.vcf \
--ts_filter_level 99.5 \
-mode SNP \

java -Xmx4g -jar GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R IsaacIndex/genome.fa \
-input data_raw.vcf \
-recalFile data_indel.recal.vcf \
-tranchesFile vqsr_indel.tranches \
-nt 8 \
--maxGaussians 4 \
-resource:mills,known=false,training=true,truth=true,prior=12.0 Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_138.hg19.vcf \
-an QD -an FS -an SOR -an ReadPosRankSum -an MQRankSum -an InbreedingCoeff \
-mode INDEL \

java -Xmx3g -jar GenomeAnalysisTK.jar \
-T ApplyRecalibration \
-R IsaacIndex/genome.fa \
-input data_raw.vcf \
-tranchesFile vqsr_indel.tranches \
-recalFile data_indel.recal.vcf \
-o data_indel.vqsr.vcf \
--ts_filter_level 99.0 \
-mode INDEL \

Tagged:

Best Answer

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    How was the vcf of variants created? Was it with HaplotypeCaller or something else?

  • Kelly135Kelly135 KoreaMember

    When using "dbsnp_138.hg19.excluding_sites_after_129.vcf" instead, number of variants became positive values, but still it looks strange. I attached the results again.

    And Geraldine, these vcfs are created using HaplotypeCaller and here are my commands. Sorry, it's too long!

    java -jar picard-tools-1.96/SortSam.jar SORT_ORDER=coordinate VALIDATION_STRINGENCY=SILENT INPUT=sample1.sam OUTPUT=sample1.bam TMP_DIR=tmp

    java -jar picard-tools-1.96/MarkDuplicates.jar REMOVE_DUPLICATES=false CREATE_MD5_FILE=true VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true I=sample1.bam O=sample1.dup.bam METRICS_FILE=sample1.dup.metrics

    java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R IsaacIndex/genome.fa -known Mills_Devine_2hit.indels.hg19.vcf -I sample1.dup.bam -o sample1.realign.intervals

    java -jar GenomeAnalysisTK.jar -T IndelRealigner -R IsaacIndex/genome.fa -targetIntervals sample1.realign.intervals -I sample1.dup.bam -o sample1.realndup.bam

    java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R IsaacIndex/genome.fa -knownSites dbsnp_132.hg19.vcf -knownSites Mills_Devine_2hit.indels.hg19.vcf -I sample1.realndup.bam -o sample1.recal.table

    java -jar GenomeAnalysisTK.jar -T PrintReads -R IsaacIndex/genome.fa -I sample1.realndup.bam -BQSR sample1.recal.table -o sample1.recalrealndup.bam

    java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R IsaacIndex/genome.fa -knownSites Mills_Devine_2hit.indels.hg19.vcf -I sample1.recalrealndup.bam -knownSites dbsnp_132.hg19.vcf -BQSR sample1.recal.table -o sample1.recalafter.table

    java -jar GenomeAnalysisTK.jar -T AnalyzeCovariates -R IsaacIndex/genome.fa -before sample1.recal.table -after sample1.recalafter.table -csv sample1.recalafter.csv -plots sample1.recalplots.pdf -l DEBUG

    java -jar picard-tools-1.96/CollectInsertSizeMetrics.jar VALIDATION_STRINGENCY=SILENT REFERENCE_SEQUENCE=IsaacIndex/genome.fa I=sample1.recalrealndup.bam O=sample1.recalrealndup.insertSize.tsv HISTOGRAM_FILE=sample1.recalrealndup.insertSize.histo.pdf METRIC_ACCUMULATION_LEVEL=LIBRARY

    java -jar picard-tools-1.96/CollectAlignmentSummaryMetrics.jar VALIDATION_STRINGENCY=SILENT REFERENCE_SEQUENCE=IsaacIndex/genome.fa I=sample1.recalrealndup.bam O=sample1.recalrealndup.metric.alignment.tsv METRIC_ACCUMULATION_LEVEL=LIBRARY

    java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R IsaacIndex/genome.fa -I sample1.recalrealndup.bam -o sample1.g.vcf -ERC GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -nct 2

    # I did the same process for other samples as above, and merged them into one vcf file.

    java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R IsaacIndex/genome.fa -nt 20 --variant sample1to150.list --dbsnp dbsnp_132.hg19.vcf -o data_raw.vcf

    java -jar GenomeAnalysisTK.jar -T SelectVariants -R IsaacIndex/genome.fa -V data_raw.vcf -o raw_snp.vcf -selectType SNP

    java -jar GenomeAnalysisTK.jar -T SelectVariants -R IsaacIndex/genome.fa -V data_raw.vcf -o raw_indel.vcf -selectType INDEL

    Then VQSR as I mentioned in the first post.

  • Kelly135Kelly135 KoreaMember

    I just found that processing whole exome data requires restricting target regions, which I didn't. So I believe it could be the cause of high false-positives. I will re-analyze the data again. Still, if you have any other opinion that I should refer to, please let me know.
    Thanks for your help!

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    When you have your new callset (with intervals restricted) you should change your VQSR workflow a bit to avoid separating out SNPs and indels. See the workshop VQSR presentation slide decks for a detailed explanation of how you should set it up.

Sign In or Register to comment.