VCF file size is not reduced after running 'ApplyRecalibration'

rcholic | Denver | Posts: 68 | Member
edited October 2013 in Ask the GATK team

I was expecting ApplyRecalibration to reduce the size of the VCF file output by HaplotypeCaller, but the file did not get smaller. Below are my command lines for VariantRecalibrator and ApplyRecalibration. Did I do something wrong, or does the VCF file size not always get smaller? Any suggestions for improving my command lines are welcome.

java -Xmx4g  -Djava.io.tmpdir=/Volumes/tempdata1/tonywang/GATK_temp -jar $CLASSPATH/GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R GATK_ref/hg19.fasta \
--input ../GATK/raw_variants_snps_indels-3.vcf \
-nt 6 \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 GATK_ref/hapmap_3.3.hg19.vcf \
-resource:omni,known=false,training=true,truth=true,prior=12.0 GATK_ref/1000G_omni2.5.hg19.vcf \
-resource:1000G,known=false,training=true,truth=false,prior=10.0 GATK_ref/1000G_phase1.snps.high_confidence.hg19.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 GATK_ref/dbsnp_137.hg19.vcf \
-an QD -an MQRankSum -an ReadPosRankSum -an FS -an DP \
--maxGaussians 4 \
--numBadVariants 2000 \
-mode SNP \
-tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
-log ../GATK/VQSR/log/raw_variants_snps-3_snps_recal.log \
-recalFile ../GATK/VQSR/SNPs/snps-3_snp.recal.vcf \
-tranchesFile ../GATK/VQSR/SNPs/snps-3_snp.tranches \
-rscriptFile ../GATK/VQSR/SNPs/snps-3_snp_recal.plots.R




java -Xmx6g -Djava.awt.headless=true -jar $CLASSPATH/GenomeAnalysisTK.jar \
-T ApplyRecalibration \
-R GATK_ref/hg19.fasta \
-nt 5 \
--input ../GATK/raw_variants_snps_indels.vcf \
-mode SNP \
--ts_filter_level 99.0 \
-recalFile ../GATK/VQSR/SNPs/snps-3_snp.recal.vcf \
-tranchesFile ../GATK/VQSR/SNPs/snps-3_snp.tranches \
-log ../GATK/VQSR/SNPs/filtered/snps-3_snp.recal_filtered.log \
-o ../GATK/VQSR/SNPs/filtered/snps-3_snp.recal_filtered.vcf
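
File size on its own may not be a good indicator here. As far as I understand, ApplyRecalibration annotates each record (the FILTER column and a VQSLOD value in INFO) rather than dropping records, and in SNP mode the indels are passed through unchanged, so the output can be as large as the input or larger. A minimal sketch for checking what the step did is to tabulate the FILTER column of the output VCF. The helper name `count_filters` is made up for illustration, and the filter names you see will depend on the GATK version.

```shell
# Hypothetical helper (not part of GATK): count VCF records per FILTER value.
# Assumes an uncompressed, tab-delimited VCF; FILTER is the 7th column.
count_filters() {
    awk -F'\t' '
        !/^#/ { n[$7]++ }                               # skip header lines, tally FILTER values
        END   { for (f in n) printf "%s\t%d\n", f, n[f] }
    ' "$1" | sort
}
```

For example, `count_filters ../GATK/VQSR/SNPs/filtered/snps-3_snp.recal_filtered.vcf` prints one line per FILTER value with its record count. If some SNPs carry a tranche filter instead of PASS, the recalibration was applied even though the file did not shrink.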