Should I provide the exome target list (-L argu) even while calling gVCF file using Haplotypecaller?

NandaNanda CanadaMember
edited April 2017 in Ask the GATK team

Hi,

Recently we performed exome sequencing using Nextera Illumina platform for three samples (Father, Mother and Son). I downloaded the exome interval list from Illumina's website.

1) Trimmed the raw reads
2) Aligned the trimmed reads against the human reference hg19 as recommended for exome-sequencing
3) Then sorted, deduped, recalibrated the bam file.
4) Then performed variant calling in two steps process for all three samples individually
4.1) Used the GATK Haplotype Caller tool in GVCF mode
Command: java -Xmx16g -jar GenomeAnalysisTK.jar - T Haplotypecaller -R /GATK_bundle/hg19.fa -I sample1.sorted.dedup.recal.bam --emitRefConfidence GVCF --dbsnp /GATK_bundle/dbsnp.138.hg19.vcf -o sample1.raw.g.vcf
4.2) Used GenotypeGVCFs (Joint SNP calling) for all three samples together
Command: java -Xmx16g -jar GenomeAnalysisTK.jar - T GenotypeGVCFs -R /GATK_bundle/hg19.fa --variant sample1.raw.g.vcf --variant sample2.raw.g.vcf --variant sample3.raw.g.vcf --dbsnp /GATK_bundle/dbsnp.138.hg19.vcf -o sample1.2.3.trio.raw.vcf

In the above command, I didn't use the Illumina's exome interval list used for targeting the exomes in sequencing process.

As per this link "https://software.broadinstitute.org/gatk/documentation/article.php?id=4669", under the example section of GATK command lines, for exome sequencing the article suggests us to provide the exome targets using -L argument.

I have following queries,as per the aforementioned article
1) Should I provide the exome target list (-L argument) only while calling regular VCF file using Haplotype caller?
or
2) Should I provide the exome target list (-L argument) even while calling gVCF file using Haplotype caller?

Best Answer

Answers

  • NandaNanda CanadaMember
    edited April 2017

    Thanks Geraldine VdAuwera. Sure, I will subset the vcf records which I got from GenotypeGVCFs tool.

    I have another query regarding this post gatkforums.broadinstitute.org/gatk/discussion/9396/i-have-50-exome-samples-belong-to-25-families-do-i-run-genotypevcfs-on-familywise-or-50-together/p1?new=1, these 50 exome samples were sequenced on different batches. Some of the batches are even 2 years old. For all the 50 exome samples, can I use the latest exome interval list from illumina or should use exome interval list that has been used for sequencing?

    Issue · Github
    by Sheila

    Issue Number
    1966
    State
    closed
    Last Updated
    Assignee
    Array
    Milestone
    Array
    Closed By
    chandrans
  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @Nanda
    Hi,

    You should use the intersection of the interval lists because the union may decrease statistical power. For example, if you only have information for some samples and not others at a site, you cannot be confident how much importance that site really has.

    We recommend using the intersection because you will have data for all samples in your final VCF. When you use the union, if you have information for only some samples that show importance, you will have to go back and re-sequence the samples with missing information. Ultimately, it is up to you to determine which is better for your purposes.

    -Sheila

  • mmokrejsmmokrejs Czech RepublicMember
    edited April 2017

    @Geraldine_VdAuwera said:
    You should always use the target list argument when running HaplotypeCaller on exomes or any other targeted sequence type because it will be more efficient computationally.

    Interesting, I thought I read guides on GVCF properly and hence for my exome-capture data I did not use -L to select ranges to be acted upon by HaplotypeCaller when yielding per-sample GVCF. The guide seemed the statistics is robust enough and that it is the aim of the approach to keep track of even low-coverage regions. My impression was that my low-coverage regions are clearly regions with mis-mapped reads but I did not object. My bad, now I burned my hands (read further below).

    We even encourage you to use a target list when running on whole genomes (and we provide genome target lists) because this speeds up processing by skipping regions we know from experience are pretty much impossible to call.

    Please refer to concrete filenames inside your GATK bundles.

    That being said, if you have already run everything without a target list, don't panic -- you can use the results; just be sure to subset your final VCF to the target list. However, don't subset calls at the GVCF level, because that could cause technical problems. Run GenotypeGVCFs first to produce the VCF, then subset the calls.

    It is too late because I merged 35 patient GVCFs together into the multi-sample GVCF. Here you can see the bad outcome from VQSR. The PDF files contain 21 pages despite my 35 samples. The pages have no Title so I have no clue what they are about and why there are just 21 instead of 35. The Rscripts run as a part of VQSR should be more thorough in including Titles for each drawing (https://software.broadinstitute.org/gatk/documentation/article.php?id=39). It seems the VCF lines with merely zero counts in most samples ruined the statistics:

    chr1    868860  .       T       C       19.87   PASS    AC=2;AF=0.250;AN=8;DP=6;ExcessHet=0.3218;FS=0.000;MLEAC=2;MLEAF=0.250;MQ=44.10;QD=9.93;SOR=0.693;VQSLOD=6.13;culprit=FS GT:AD:DP:GQ:PL  ./.:0,0:0:.:0,0,0       0/0:2,0:2:6:0,6,48      ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       0/0:1,0:1:3:0,3,26      ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       0/0:1,0:1:3:0,3,20      ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0
           1/1:0,2:2:6:49,6,0      ./.:0,0:0:.:0,0,0       ./.:0,0:0:.:0,0,0
    

    Now I will go back and add hard filters to HaplotypeCaller based on coverage and/or use -L to specify ranges used for target enrichment (Agilent human exome v5, which would be a bit narrower than those ranges actually well-covered).

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @mmokrejs
    Hi,

    You can still subset the final VCF using -L. Have you tried that? Do the results of VQSR look better?

    -Sheila

  • mmokrejsmmokrejs Czech RepublicMember
    edited April 2017

    Yes, the processing just finished but I do not have really better results.

    Here I generated one of the GVCF files for a single sample.

    ##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.7-0-gcfedb67,Date="Thu Apr 20 20:07:49 CEST 2017",Epoch=1492711669348,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[mysample-42_42-PB.bwa.postalt.removed_duplicates.realignedtogether.BQSR.namesorted.fixmate.calmd.sorted.bam] showFullBamList=false read_buffer_size=null read_filter=[] disable_read_filter=[] intervals=[S04380110__Agilent_SureSelect_Human_All_Exon_V5/hg38/S04380110_Covered.bed] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=100 reference_sequence=ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=500 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 secondsBetweenProgressUpdates=10 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 reference_window_stop=0 phone_home= gatk_key=null tag=NA logging_level=INFO log_to_file=null help=false version=false likelihoodCalculationEngine=PairHMM heterogeneousKmerSizeResolution=COMBO_MIN dbsnp=(RodBinding name=dbsnp source=ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/00-All.vcf.gz) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[StrandBiasBySample] excludeAnnotation=[ChromosomeCounts, FisherStrand, StrandOddsRatio, QualByDepth] group=[StandardAnnotation, StandardHCAnnotation] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=GVCF bamOutput=null bamWriterType=CALLED_HAPLOTYPES emitDroppedReads=false disableOptimizations=false annotateNDA=false useNewAFCalculator=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 heterozygosity_stdev=0.01 standard_min_confidence_threshold_for_calling=-0.0 standard_min_confidence_threshold_for_emitting=30.0 max_alternate_alleles=6 max_genotype_count=1024 max_num_PL_values=100 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=null exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=true gcpHMM=10 pair_hmm_implementation=VECTOR_LOGLESS_CACHING pair_hmm_sub_implementation=ENABLE_ALL always_load_vector_logless_PairHMM_lib=false phredScaledGlobalReadMismappingRate=45 noFpga=false sample_name=null kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingBranches=false minDanglingBranchLength=4 consensus=false maxNumHaplotypesInPopulation=128 errorCorrectKmers=false minPruning=2 debugGraphTransformations=false allowCyclesInKmerGraphToGeneratePaths=false graphOutput=null kmerLengthForReadErrorCorrection=25 minObservationsForKmerToBeSolid=20 GVCFGQBands=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 70, 80, 90, 99] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 includeUmappedReads=false useAllelesTrigger=false doNotRunPhysicalPhasing=false keepRG=null justDetermineActiveRegions=false dontGenotype=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false errorCorrectReads=false pcr_indel_model=CONSERVATIVE maxReadsInRegionPerSample=10000 minReadsPerAlignmentStart=10 mergeVariantsViaLD=false activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null maxReadsInMemoryPerSample=30000 maxTotalReadsInMemory=10000000 maxProbPropagationDistance=50 activeProbabilityThreshold=0.002 min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
    

    Here I merged the 35 exomes into a single GVCF, notably I even used now --useNewAFCalculator.

    GenomeAnalysisTK-3.7/GenomeAnalysisTK.jar -T GenotypeGVCFs -nt 16 -R ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa --dbsnp ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/00-All.vcf.gz --useNewAFCalculator -o mysamples.raw.vcf --variant $sample1.g.vcf --variant $sample2.g.vcf ...
    
    INFO  11:21:26,420 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  11:21:28,104 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
    WARN  11:21:36,056 IndexDictionaryUtils - Track dbsnp doesn't have a sequence dictionary built in, skipping dictionary validation 
    INFO  11:21:36,064 MicroScheduler - Running the GATK in parallel mode with 16 total threads, 1 CPU thread(s) for each of 16 data thread(s), of 16 processors available on this machine 
    INFO  11:21:37,614 GenomeAnalysisEngine - Preparing for traversal 
    INFO  11:21:37,621 GenomeAnalysisEngine - Done preparing for traversal 
    INFO  11:21:37,622 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  11:21:37,622 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
    INFO  11:21:37,622 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
    WARN  11:21:38,105 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
    WARN  11:21:38,106 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
    INFO  11:21:38,106 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files 
    WARN  11:21:40,969 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs 
    INFO  11:22:07,632 ProgressMeter -   chr1:15494039    189106.0    30.0 s       2.6 m        0.5%   103.8 m     103.3 m 
    INFO  11:22:37,682 ProgressMeter -   chr1:34197349   2357356.0    60.0 s      25.0 s        1.1%    94.1 m      93.1 m 
    INFO  11:23:07,696 ProgressMeter -  chr1:125169834   9959165.0    90.0 s       9.0 s        3.9%    38.6 m      37.1 m 
    INFO  11:23:37,697 ProgressMeter -  chr1:210999841   1.6488893E7   120.0 s       7.0 s        6.6%    30.5 m      28.5 m 
    INFO  11:24:07,714 ProgressMeter -   chr2:73002681   2.539001E7     2.5 m       5.0 s       10.0%    25.0 m      22.5 m 
    INFO  11:24:37,715 ProgressMeter -  chr2:176999313   3.3869221E7     3.0 m       5.0 s       13.2%    22.7 m      19.7 m 
    INFO  11:25:07,716 ProgressMeter -   chr3:46266160   4.2509927E7     3.5 m       4.0 s       16.7%    21.0 m      17.5 m 
    INFO  11:25:37,719 ProgressMeter -  chr3:147298951   5.0672524E7     4.0 m       4.0 s       19.8%    20.2 m      16.2 m 
    INFO  11:26:07,721 ProgressMeter -   chr4:30130257   5.7461949E7     4.5 m       4.0 s       22.4%    20.1 m      15.6 m 
    INFO  11:26:37,722 ProgressMeter -  chr4:149066900   6.7334954E7     5.0 m       4.0 s       26.1%    19.2 m      14.2 m 
    INFO  11:27:07,723 ProgressMeter -  chr5:104999922   7.6957794E7     5.5 m       4.0 s       30.6%    18.0 m      12.5 m 
    INFO  11:27:37,758 ProgressMeter -   chr6:17998488   8.531772E7     6.0 m       4.0 s       33.5%    17.9 m      11.9 m 
    INFO  11:28:07,759 ProgressMeter -  chr6:123999834   9.4004119E7     6.5 m       4.0 s       36.8%    17.6 m      11.1 m 
    INFO  11:28:37,761 ProgressMeter -   chr7:51998993   1.01637556E8     7.0 m       4.0 s       39.9%    17.5 m      10.5 m 
    INFO  11:29:07,762 ProgressMeter -    chr8:7055327   1.0985578E8     7.5 m       4.0 s       43.5%    17.3 m       9.8 m 
    INFO  11:29:37,764 ProgressMeter -   chr8:93131611   1.18074387E8     8.0 m       4.0 s       46.1%    17.3 m       9.3 m 
    INFO  11:30:07,766 ProgressMeter -  chr9:115021154   1.2797869E8     8.5 m       3.0 s       51.3%    16.6 m       8.1 m 
    INFO  11:30:37,768 ProgressMeter -  chr10:19348104   1.32098422E8     9.0 m       4.0 s       52.7%    17.1 m       8.1 m 
    INFO  11:31:07,770 ProgressMeter - chr10:121485710   1.4054994E8     9.5 m       4.0 s       55.8%    17.0 m       7.5 m 
    INFO  11:31:37,771 ProgressMeter -  chr11:69999116   1.48224067E8    10.0 m       4.0 s       58.4%    17.1 m       7.1 m 
    INFO  11:32:07,772 ProgressMeter -  chr12:15899984   1.56204694E8    10.5 m       4.0 s       60.9%    17.2 m       6.7 m 
    INFO  11:32:37,774 ProgressMeter - chr12:119037920   1.64485809E8    11.0 m       4.0 s       64.1%    17.2 m       6.2 m 
    INFO  11:33:07,775 ProgressMeter -  chr13:86999031   1.72326288E8    11.5 m       4.0 s       67.3%    17.1 m       5.6 m 
    INFO  11:33:37,777 ProgressMeter -  chr14:96998790   1.80636815E8    12.0 m       3.0 s       71.1%    16.9 m       4.9 m 
    INFO  11:34:07,778 ProgressMeter -  chr15:90084756   1.87839344E8    12.5 m       3.0 s       74.2%    16.8 m       4.3 m 
    INFO  11:34:37,779 ProgressMeter -  chr16:76999816   1.95111601E8    13.0 m       3.0 s       77.0%    16.9 m       3.9 m 
    INFO  11:35:07,784 ProgressMeter -  chr17:48051036   2.01614335E8    13.5 m       4.0 s       78.9%    17.1 m       3.6 m 
    INFO  11:35:37,787 ProgressMeter -  chr18:54280883   2.09652692E8    14.0 m       4.0 s       81.7%    17.1 m       3.1 m 
    INFO  11:36:07,788 ProgressMeter -  chr19:24896988   2.15700809E8    14.5 m       4.0 s       83.3%    17.4 m       2.9 m 
    INFO  11:36:37,790 ProgressMeter -  chr20:21126038   2.21494393E8    15.0 m       4.0 s       85.0%    17.7 m       2.7 m 
    INFO  11:37:07,792 ProgressMeter -  chr22:26292535   2.29245708E8    15.5 m       4.0 s       88.6%    17.5 m     119.0 s 
    INFO  11:37:37,795 ProgressMeter -   chrX:65486906   2.36493162E8    16.0 m       4.0 s       91.4%    17.5 m      90.0 s 
    INFO  11:38:07,796 ProgressMeter -    chrY:8880673   2.4253789E8    16.5 m       4.0 s       94.5%    17.5 m      57.0 s 
    INFO  11:38:37,797 ProgressMeter - chr19_KI270938v1_alt:275871   2.43173867E8    17.0 m       4.0 s       99.7%    17.0 m       2.0 s 
    INFO  11:39:07,799 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    17.5 m       4.0 s       99.7%    17.5 m       2.0 s 
    INFO  11:39:37,801 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    18.0 m       4.0 s       99.7%    18.0 m       2.0 s 
    INFO  11:40:07,803 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    18.5 m       4.0 s       99.7%    18.6 m       3.0 s 
    INFO  11:40:37,804 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    19.0 m       4.0 s       99.7%    19.1 m       3.0 s 
    INFO  11:41:07,806 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    19.5 m       4.0 s       99.7%    19.6 m       3.0 s 
    INFO  11:41:37,808 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    20.0 m       4.0 s       99.7%    20.1 m       3.0 s 
    INFO  11:42:07,810 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    20.5 m       5.0 s       99.7%    20.6 m       3.0 s 
    INFO  11:42:37,812 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    21.0 m       5.0 s       99.7%    21.1 m       3.0 s 
    INFO  11:43:07,813 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    21.5 m       5.0 s       99.7%    21.6 m       3.0 s 
    INFO  11:43:37,814 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    22.0 m       5.0 s       99.7%    22.1 m       3.0 s 
    INFO  11:44:07,815 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    22.5 m       5.0 s       99.7%    22.6 m       3.0 s 
    INFO  11:44:37,816 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    23.0 m       5.0 s       99.7%    23.1 m       3.0 s 
    INFO  11:45:07,817 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    23.5 m       5.0 s       99.7%    23.6 m       3.0 s 
    INFO  11:45:37,826 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    24.0 m       5.0 s       99.7%    24.1 m       3.0 s 
    INFO  11:46:07,828 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    24.5 m       6.0 s       99.7%    24.6 m       4.0 s 
    INFO  11:46:37,830 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    25.0 m       6.0 s       99.7%    25.1 m       4.0 s 
    INFO  11:47:07,831 ProgressMeter - chr19_KI270938v1_alt:275871   2.43314032E8    25.5 m       6.0 s       99.7%    25.6 m       4.0 s 
    INFO  11:47:09,294 ProgressMeter -            done   2.43460127E8    25.5 m       6.0 s       99.7%    25.6 m       4.0 s 
    INFO  11:47:09,294 ProgressMeter - Total runtime 1531.67 secs, 25.53 min, 0.43 hours 
    ------------------------------------------------------------------------------------------
    Done. There were 4 WARN messages, the first 4 are repeated below.
    WARN  11:21:36,056 IndexDictionaryUtils - Track dbsnp doesn't have a sequence dictionary built in, skipping dictionary validation 
    WARN  11:21:38,105 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
    WARN  11:21:38,106 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail. 
    WARN  11:21:40,969 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs 
    ------------------------------------------------------------------------------------------
    

    Here I created the model for VQSR.

    GenomeAnalysisTK-3.7/GenomeAnalysisTK.jar -T VariantRecalibrator -nt 16 -R ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa --input mysamples.raw.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 ftp.broadinstitute.org/bundle/hg38/hapmap_3.3.hg38.vcf.gz -resource:omni,known=false,training=true,truth=false,prior=12.0 ftp.broadinstitute.org/bundle/hg38/1000G_omni2.5.hg38.vcf.gz -resource:1000G,known=false,training=true,truth=false,prior=10.0 ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/GATK/00-All.vcf.gz -an QD -an FS -an SOR -an MQ -an MQRankSum -an ReadPosRankSum -MQCap 70 -an InbreedingCoeff --target_titv 3.2 -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile recalibrate_SNP.recal -tranchesFile recalibrate_SNP.tranches -rscriptFile recalibrate_SNP_plots.R
    
    INFO  11:47:14,381 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  11:47:15,819 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
    WARN  11:47:17,288 IndexDictionaryUtils - Track hapmap doesn't have a sequence dictionary built in, skipping dictionary validation 
    WARN  11:47:17,288 IndexDictionaryUtils - Track omni doesn't have a sequence dictionary built in, skipping dictionary validation 
    WARN  11:47:17,289 IndexDictionaryUtils - Track 1000G doesn't have a sequence dictionary built in, skipping dictionary validation 
    WARN  11:47:17,289 IndexDictionaryUtils - Track dbsnp doesn't have a sequence dictionary built in, skipping dictionary validation 
    INFO  11:47:17,297 MicroScheduler - Running the GATK in parallel mode with 16 total threads, 1 CPU thread(s) for each of 16 data thread(s), of 16 processors available on this machine 
    INFO  11:47:18,733 GenomeAnalysisEngine - Preparing for traversal 
    INFO  11:47:18,741 GenomeAnalysisEngine - Done preparing for traversal 
    INFO  11:47:18,741 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  11:47:18,741 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
    INFO  11:47:18,742 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
    INFO  11:47:18,749 TrainingSet - Found hapmap track:    Known = false   Training = true         Truth = true    Prior = Q15.0 
    INFO  11:47:18,749 TrainingSet - Found omni track:      Known = false   Training = true         Truth = false   Prior = Q12.0 
    INFO  11:47:18,749 TrainingSet - Found 1000G track:     Known = false   Training = true         Truth = false   Prior = Q10.0 
    INFO  11:47:18,750 TrainingSet - Found dbsnp track:     Known = true    Training = false        Truth = false   Prior = Q2.0 
    INFO  11:47:48,745 ProgressMeter -   chr1:80998348   4753246.0    30.0 s       6.0 s        2.5%    19.9 m      19.4 m 
    INFO  11:48:18,746 ProgressMeter -   chr2:17998995   1.404037E7    60.0 s       4.0 s        8.3%    12.1 m      11.1 m 
    INFO  11:48:48,747 ProgressMeter -  chr2:185998414   2.3584505E7    90.0 s       3.0 s       13.5%    11.1 m       9.6 m 
    INFO  11:49:18,748 ProgressMeter -  chr3:108999457   3.3217497E7   120.0 s       3.0 s       18.7%    10.7 m       8.7 m 
    INFO  11:49:48,750 ProgressMeter -   chr4:88998852   4.3616064E7     2.5 m       3.0 s       24.2%    10.3 m       7.8 m 
    INFO  11:50:18,751 ProgressMeter -   chr5:66999691   5.3086592E7     3.0 m       3.0 s       29.4%    10.2 m       7.2 m 
    INFO  11:50:48,752 ProgressMeter -   chr6:58036395   6.2324218E7     3.5 m       3.0 s       34.8%    10.1 m       6.6 m 
    INFO  11:51:18,753 ProgressMeter -   chr7:55394645   7.2100905E7     4.0 m       3.0 s       40.0%    10.0 m       6.0 m 
    INFO  11:51:48,757 ProgressMeter -   chr8:72999280   8.3432657E7     4.5 m       3.0 s       45.5%     9.9 m       5.4 m 
    INFO  11:52:18,758 ProgressMeter -  chr9:108999126   9.279247E7     5.0 m       3.0 s       51.1%     9.8 m       4.8 m 
    INFO  11:52:48,760 ProgressMeter -   chr11:1999326   1.025952E8     5.5 m       3.0 s       56.3%     9.8 m       4.3 m 
    INFO  11:53:18,761 ProgressMeter -  chr12:43998134   1.13018674E8     6.0 m       3.0 s       61.8%     9.7 m       3.7 m 
    INFO  11:53:48,762 ProgressMeter -  chr13:78998546   1.2172114E8     6.5 m       3.0 s       67.0%     9.7 m       3.2 m 
    INFO  11:54:18,764 ProgressMeter -  chr15:67227481   1.30914593E8     7.0 m       3.0 s       73.5%     9.5 m       2.5 m 
    INFO  11:54:48,765 ProgressMeter -  chr17:36999795   1.41449746E8     7.5 m       3.0 s       78.6%     9.5 m       2.0 m 
    INFO  11:55:18,767 ProgressMeter -  chr19:53999801   1.52365069E8     8.0 m       3.0 s       84.2%     9.5 m      90.0 s 
    INFO  11:55:48,769 ProgressMeter -   chrX:15998059   1.61757085E8     8.5 m       3.0 s       89.9%     9.5 m      57.0 s 
    INFO  11:56:18,770 ProgressMeter -   chrY:26672530   1.67465903E8     9.0 m       3.0 s       95.0%     9.5 m      28.0 s 
    INFO  11:56:48,772 ProgressMeter - chr2_GL383522v1_alt:35618   1.67490576E8     9.5 m       3.0 s       96.5%     9.8 m      20.0 s 
    INFO  11:57:18,773 ProgressMeter - chr17_KI270909v1_alt:224256   1.67493678E8    10.0 m       3.0 s       98.5%    10.1 m       8.0 s 
    INFO  11:57:48,774 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    10.5 m       3.0 s       99.7%    10.5 m       1.0 s 
    INFO  11:58:18,776 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    11.0 m       3.0 s       99.7%    11.0 m       1.0 s 
    INFO  11:58:48,777 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    11.5 m       4.0 s       99.7%    11.5 m       1.0 s 
    INFO  11:59:18,778 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    12.0 m       4.0 s       99.7%    12.0 m       2.0 s 
    INFO  11:59:48,782 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    12.5 m       4.0 s       99.7%    12.5 m       2.0 s 
    INFO  12:00:18,783 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    13.0 m       4.0 s       99.7%    13.0 m       2.0 s 
    INFO  12:00:48,785 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    13.5 m       4.0 s       99.7%    13.5 m       2.0 s 
    INFO  12:01:18,786 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    14.0 m       5.0 s       99.7%    14.0 m       2.0 s 
    INFO  12:01:48,787 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    14.5 m       5.0 s       99.7%    14.5 m       2.0 s 
    INFO  12:02:18,788 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    15.0 m       5.0 s       99.7%    15.0 m       2.0 s 
    INFO  12:02:48,789 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    15.5 m       5.0 s       99.7%    15.5 m       2.0 s 
    INFO  12:03:18,790 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    16.0 m       5.0 s       99.7%    16.0 m       2.0 s 
    INFO  12:03:48,791 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    16.5 m       5.0 s       99.7%    16.5 m       2.0 s 
    INFO  12:04:15,941 VariantDataManager - QD:      mean = 16.98    standard deviation = 7.37 
    INFO  12:04:16,010 VariantDataManager - FS:      mean = 1.63     standard deviation = 3.58 
    INFO  12:04:16,057 VariantDataManager - SOR:     mean = 1.14     standard deviation = 1.18 
    INFO  12:04:16,120 VariantDataManager - MQ:      mean = 1.68     standard deviation = 0.36 
    INFO  12:04:16,170 VariantDataManager - MQRankSum:       mean = -0.04    standard deviation = 0.48 
    INFO  12:04:16,221 VariantDataManager - ReadPosRankSum:          mean = 0.12     standard deviation = 0.68 
    INFO  12:04:16,272 VariantDataManager - InbreedingCoeff:         mean = 0.01     standard deviation = 0.20 
    INFO  12:04:16,614 VariantDataManager - Annotations are now ordered by their information content: [QD, MQ, SOR, FS, InbreedingCoeff, ReadPosRankSum, MQRankSum] 
    INFO  12:04:16,641 VariantDataManager - Training with 174384 variants after standard deviation thresholding. 
    INFO  12:04:16,645 GaussianMixtureModel - Initializing model with 100 k-means iterations... 
    INFO  12:04:18,792 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    17.0 m       6.0 s       99.7%    17.0 m       2.0 s 
    INFO  12:04:27,914 VariantRecalibratorEngine - Finished iteration 0. 
    INFO  12:04:33,306 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.66335 
    INFO  12:04:38,774 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.25132 
    INFO  12:04:44,515 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.10154 
    INFO  12:04:48,793 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    17.5 m       6.0 s       99.7%    17.5 m       2.0 s 
    INFO  12:04:50,358 VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.02245 
    INFO  12:04:56,186 VariantRecalibratorEngine - Finished iteration 25.   Current change in mixture coefficients = 0.04004 
    INFO  12:05:02,001 VariantRecalibratorEngine - Finished iteration 30.   Current change in mixture coefficients = 0.06854 
    INFO  12:05:07,826 VariantRecalibratorEngine - Finished iteration 35.   Current change in mixture coefficients = 0.05058 
    INFO  12:05:13,634 VariantRecalibratorEngine - Finished iteration 40.   Current change in mixture coefficients = 0.01001 
    INFO  12:05:18,794 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    18.0 m       6.0 s       99.7%    18.1 m       3.0 s 
    INFO  12:05:19,437 VariantRecalibratorEngine - Finished iteration 45.   Current change in mixture coefficients = 0.00519 
    INFO  12:05:25,235 VariantRecalibratorEngine - Finished iteration 50.   Current change in mixture coefficients = 0.00176 
    INFO  12:05:25,236 VariantRecalibratorEngine - Convergence after 50 iterations! 
    INFO  12:05:26,006 VariantRecalibratorEngine - Evaluating full set of 516353 variants... 
    INFO  12:05:32,847 VariantDataManager - Training with worst 17878 scoring variants --> variants with LOD <= -5.0000. 
    INFO  12:05:32,848 GaussianMixtureModel - Initializing model with 100 k-means iterations... 
    INFO  12:05:33,031 VariantRecalibratorEngine - Finished iteration 0. 
    INFO  12:05:33,113 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.00753 
    INFO  12:05:33,192 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.01562 
    INFO  12:05:33,270 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.01283 
    INFO  12:05:33,349 VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.01727 
    INFO  12:05:33,428 VariantRecalibratorEngine - Finished iteration 25.   Current change in mixture coefficients = 0.01716 
    INFO  12:05:33,507 VariantRecalibratorEngine - Finished iteration 30.   Current change in mixture coefficients = 0.01228 
    INFO  12:05:33,588 VariantRecalibratorEngine - Finished iteration 35.   Current change in mixture coefficients = 0.00599 
    INFO  12:05:33,669 VariantRecalibratorEngine - Finished iteration 40.   Current change in mixture coefficients = 0.00223 
    INFO  12:05:33,686 VariantRecalibratorEngine - Convergence after 41 iterations! 
    INFO  12:05:33,714 VariantRecalibratorEngine - Evaluating full set of 516353 variants... 
    INFO  12:05:41,145 TrancheManager - Finding 4 tranches for 516353 variants 
    INFO  12:05:41,460 TrancheManager -   Tranche threshold 100.00 => selection metric threshold 0.000 
    INFO  12:05:41,533 TrancheManager -   Found tranche for 100.000: 0.000 threshold starting with variant 0; running score is 0.000  
    INFO  12:05:41,533 TrancheManager -   Tranche is Tranche ts=100.00 minVQSLod=-39768.4476 known=(283569 @ 1.1998) novel=(232784 @ 0.0791) truthSites(85270 accessible, 85270 called), name=anonymous] 
    INFO  12:05:41,534 TrancheManager -   Tranche threshold 99.90 => selection metric threshold 0.001 
    INFO  12:05:41,605 TrancheManager -   Found tranche for 99.900: 0.001 threshold starting with variant 9680; running score is 0.001  
    INFO  12:05:41,605 TrancheManager -   Tranche is Tranche ts=99.90 minVQSLod=-2.1611 known=(276686 @ 1.2012) novel=(229987 @ 0.0783) truthSites(85270 accessible, 85184 called), name=anonymous] 
    INFO  12:05:41,606 TrancheManager -   Tranche threshold 99.00 => selection metric threshold 0.010 
    INFO  12:05:41,654 TrancheManager -   Found tranche for 99.000: 0.010 threshold starting with variant 26291; running score is 0.010  
    INFO  12:05:41,655 TrancheManager -   Tranche is Tranche ts=99.00 minVQSLod=0.5630 known=(266150 @ 1.1971) novel=(223912 @ 0.0777) truthSites(85270 accessible, 84417 called), name=anonymous] 
    INFO  12:05:41,655 TrancheManager -   Tranche threshold 90.00 => selection metric threshold 0.100 
    INFO  12:05:41,705 TrancheManager -   Found tranche for 90.000: 0.100 threshold starting with variant 108351; running score is 0.100  
    INFO  12:05:41,706 TrancheManager -   Tranche is Tranche ts=90.00 minVQSLod=7.1301 known=(226459 @ 1.2340) novel=(181543 @ 0.0775) truthSites(85270 accessible, 76743 called), name=anonymous] 
    INFO  12:05:41,709 VariantRecalibrator - Writing out recalibration table... 
    INFO  12:05:45,516 VariantRecalibrator - Writing out visualization Rscript file... 
    INFO  12:05:45,580 VariantRecalibrator - Building QD x MQ plot... 
    INFO  12:05:45,585 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:05:46,941 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:05:47,363 VariantRecalibrator - Building QD x SOR plot... 
    INFO  12:05:47,366 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:05:48,719 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:05:48,795 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    18.5 m       6.0 s       99.7%    18.6 m       3.0 s 
    INFO  12:05:49,102 VariantRecalibrator - Building QD x FS plot... 
    INFO  12:05:49,103 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:05:50,466 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:05:50,829 VariantRecalibrator - Building QD x InbreedingCoeff plot... 
    INFO  12:05:50,830 VariantRecalibratorEngine - Evaluating full set of 3600 variants... 
    INFO  12:05:52,164 VariantRecalibratorEngine - Evaluating full set of 3600 variants... 
    INFO  12:05:52,535 VariantRecalibrator - Building QD x ReadPosRankSum plot... 
    INFO  12:05:52,536 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:05:53,902 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:05:54,254 VariantRecalibrator - Building QD x MQRankSum plot... 
    INFO  12:05:54,255 VariantRecalibratorEngine - Evaluating full set of 3600 variants... 
    INFO  12:05:55,553 VariantRecalibratorEngine - Evaluating full set of 3600 variants... 
    INFO  12:05:55,906 VariantRecalibrator - Building MQ x SOR plot... 
    INFO  12:05:55,907 VariantRecalibratorEngine - Evaluating full set of 3721 variants... 
    INFO  12:05:57,285 VariantRecalibratorEngine - Evaluating full set of 3721 variants... 
    INFO  12:05:57,637 VariantRecalibrator - Building MQ x FS plot... 
    INFO  12:05:57,638 VariantRecalibratorEngine - Evaluating full set of 3721 variants... 
    INFO  12:05:59,022 VariantRecalibratorEngine - Evaluating full set of 3721 variants... 
    INFO  12:05:59,380 VariantRecalibrator - Building MQ x InbreedingCoeff plot... 
    INFO  12:05:59,381 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:00,739 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:01,103 VariantRecalibrator - Building MQ x ReadPosRankSum plot... 
    INFO  12:06:01,103 VariantRecalibratorEngine - Evaluating full set of 3721 variants... 
    INFO  12:06:02,489 VariantRecalibratorEngine - Evaluating full set of 3721 variants... 
    INFO  12:06:02,846 VariantRecalibrator - Building MQ x MQRankSum plot... 
    INFO  12:06:02,847 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:04,165 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:04,518 VariantRecalibrator - Building SOR x FS plot... 
    INFO  12:06:04,519 VariantRecalibratorEngine - Evaluating full set of 3721 variants... 
    INFO  12:06:05,906 VariantRecalibratorEngine - Evaluating full set of 3721 variants... 
    INFO  12:06:06,259 VariantRecalibrator - Building SOR x InbreedingCoeff plot... 
    INFO  12:06:06,260 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:07,619 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:07,984 VariantRecalibrator - Building SOR x ReadPosRankSum plot... 
    INFO  12:06:07,985 VariantRecalibratorEngine - Evaluating full set of 3721 variants... 
    INFO  12:06:09,367 VariantRecalibratorEngine - Evaluating full set of 3721 variants... 
    INFO  12:06:09,724 VariantRecalibrator - Building SOR x MQRankSum plot... 
    INFO  12:06:09,725 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:11,043 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:11,388 VariantRecalibrator - Building FS x InbreedingCoeff plot... 
    INFO  12:06:11,388 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:12,747 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:13,112 VariantRecalibrator - Building FS x ReadPosRankSum plot... 
    INFO  12:06:13,113 VariantRecalibratorEngine - Evaluating full set of 3721 variants... 
    INFO  12:06:14,498 VariantRecalibratorEngine - Evaluating full set of 3721 variants... 
    INFO  12:06:14,866 VariantRecalibrator - Building FS x MQRankSum plot... 
    INFO  12:06:14,867 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:16,183 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:16,529 VariantRecalibrator - Building InbreedingCoeff x ReadPosRankSum plot... 
    INFO  12:06:16,530 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:17,893 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:18,256 VariantRecalibrator - Building InbreedingCoeff x MQRankSum plot... 
    INFO  12:06:18,256 VariantRecalibratorEngine - Evaluating full set of 3600 variants... 
    INFO  12:06:18,796 ProgressMeter - chr19_KI270938v1_alt:195720   1.67493681E8    19.0 m       6.0 s       99.7%    19.1 m       3.0 s 
    INFO  12:06:19,554 VariantRecalibratorEngine - Evaluating full set of 3600 variants... 
    INFO  12:06:19,912 VariantRecalibrator - Building ReadPosRankSum x MQRankSum plot... 
    INFO  12:06:19,912 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:21,235 VariantRecalibratorEngine - Evaluating full set of 3660 variants... 
    INFO  12:06:21,586 VariantRecalibrator - Executing: Rscript recalibrate_SNP_plots.R 
    INFO  12:06:46,699 VariantRecalibrator - Executing: Rscript (resource)org/broadinstitute/gatk/tools/walkers/variantrecalibration/plot_Tranches.R recalibrate_SNP.tranches 3.2 
    INFO  12:06:47,120 ProgressMeter -            done   1.67493763E8    19.5 m       6.0 s       99.7%    19.5 m       3.0 s 
    INFO  12:06:47,121 ProgressMeter - Total runtime 1168.38 secs, 19.47 min, 0.32 hours 
    ------------------------------------------------------------------------------------------
    Done. There were 4 WARN messages, the first 4 are repeated below.
    WARN  11:47:17,288 IndexDictionaryUtils - Track hapmap doesn't have a sequence dictionary built in, skipping dictionary validation 
    WARN  11:47:17,288 IndexDictionaryUtils - Track omni doesn't have a sequence dictionary built in, skipping dictionary validation 
    WARN  11:47:17,289 IndexDictionaryUtils - Track 1000G doesn't have a sequence dictionary built in, skipping dictionary validation 
    WARN  11:47:17,289 IndexDictionaryUtils - Track dbsnp doesn't have a sequence dictionary built in, skipping dictionary validation 
    ------------------------------------------------------------------------------------------
    

    Here I applied the model.

    GenomeAnalysisTK.jar -T ApplyRecalibration -R ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa --input mysamples.raw.vcf -mode SNP --ts_filter_level 99.0 -recalFile recalibrate_SNP.recal -tranchesFile recalibrate_SNP.tranches -o mysamples.recalibrated_snps_raw_indels.vcf
    INFO  12:06:50,949 HelpFormatter - -------------------------------------------------------------------------------------------- 
    INFO  12:06:50,951 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18 
    INFO  12:06:50,951 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute 
    INFO  12:06:50,951 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk 
    INFO  12:06:50,951 HelpFormatter - [Wed Apr 26 12:06:50 CEST 2017] Executing on Linux 2.6.32-642.15.1.el6.Bull.110.x86_64 amd64 
    INFO  12:06:50,952 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13 
    INFO  12:06:50,955 HelpFormatter - Program Args: -T ApplyRecalibration -R ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/hs38DH.fa --input mysamples.raw.vcf -mode SNP --ts_filter_level 99.0 -recalFile recalibrate_SNP.recal -tranchesFile recalibrate_SNP.tranches -o recalibrated_snps_raw_indels.vcf 
    INFO  12:06:50,959 HelpFormatter - Executing as mmokrejs@cn81 on Linux 2.6.32-642.15.1.el6.Bull.110.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13. 
    INFO  12:06:50,959 HelpFormatter - Date/Time: 2017/04/26 12:06:50 
    INFO  12:06:50,959 HelpFormatter - -------------------------------------------------------------------------------------------- 
    INFO  12:06:50,959 HelpFormatter - -------------------------------------------------------------------------------------------- 
    INFO  12:06:50,978 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  12:06:52,408 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
    INFO  12:06:54,472 GenomeAnalysisEngine - Preparing for traversal 
    INFO  12:06:54,479 GenomeAnalysisEngine - Done preparing for traversal 
    INFO  12:06:54,479 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  12:06:54,480 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
    INFO  12:06:54,480 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
    INFO  12:06:54,484 ApplyRecalibration - Read tranche Tranche ts=90.00 minVQSLod=7.1301 known=(226459 @ 1.2340) novel=(181543 @ 0.0775) truthSites(85270 accessible, 76743 called), name=VQSRTrancheSNP0.00to90.00] 
    INFO  12:06:54,485 ApplyRecalibration - Read tranche Tranche ts=99.00 minVQSLod=0.5630 known=(266150 @ 1.1971) novel=(223912 @ 0.0777) truthSites(85270 accessible, 84417 called), name=VQSRTrancheSNP90.00to99.00] 
    INFO  12:06:54,485 ApplyRecalibration - Read tranche Tranche ts=99.90 minVQSLod=-2.1611 known=(276686 @ 1.2012) novel=(229987 @ 0.0783) truthSites(85270 accessible, 85184 called), name=VQSRTrancheSNP99.00to99.90] 
    INFO  12:06:54,486 ApplyRecalibration - Read tranche Tranche ts=100.00 minVQSLod=-39768.4476 known=(283569 @ 1.1998) novel=(232784 @ 0.0791) truthSites(85270 accessible, 85270 called), name=VQSRTrancheSNP99.90to100.00] 
    INFO  12:06:54,519 ApplyRecalibration - Keeping all variants in tranche Tranche ts=99.00 minVQSLod=0.5630 known=(266150 @ 1.1971) novel=(223912 @ 0.0777) truthSites(85270 accessible, 84417 called), name=VQSRTrancheSNP90.00to99.00] 
    INFO  12:07:24,483 ProgressMeter -    chrX:2781927    596978.0    30.0 s      50.0 s       89.4%    33.0 s       3.0 s 
    INFO  12:07:26,138 ProgressMeter -            done    615613.0    31.0 s      51.0 s       99.7%    31.0 s       0.0 s 
    INFO  12:07:26,139 ProgressMeter - Total runtime 31.66 secs, 0.53 min, 0.01 hours 
    ------------------------------------------------------------------------------------------
    Done. There were no warn messages.
    ------------------------------------------------------------------------------------------
    

    I am not showing the INDEL VQSR step but I assume it is similar to the above.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @mmokrejs
    Hi,

    I have one more thing to try. Can you set -resource:omni,known=false,training=true,truth=true,prior=12.0 instead of -resource:omni,known=false,training=true,truth=false,prior=12.0?

    Thanks,
    Sheila

  • mmokrejsmmokrejs Czech RepublicMember

    @Sheila
    That did not help either.

    Can I use https://www.illumina.com/platinumgenomes.html , namely ussd-ftp.illumina.com/2016-1.0/hg38/small_variants/NA12877/NA12877.vcf and ussd-ftp.illumina.com/2016-1.0/hg38/small_variants/NA12878/NA12878.vcf ? From reading VQSR docs I am not certain what annotations are necessary in the VCF file for VariantRecalibrator. Just the sites?

    Thank you.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @mmokrejs
    Hi,

    I have one other thing for you to try. I will also ask my team for other suggestions. In this thread the user has the opposite problem, but her issue was solved by setting the TiTv ratio in her ApplyRecalibration command. Perhaps you can try setting --target_titv 3.

    -Sheila

  • mmokrejsmmokrejs Czech RepublicMember

    Hi @Sheila

    her issue was solved by setting the TiTv ratio in her ApplyRecalibration command. Perhaps you can try setting --target_titv 3

    No, I do not see any large differenc ein the PDF plots after the change from 3.2 to 3.0.

    Issue · Github
    by Sheila

    Issue Number
    2040
    State
    closed
    Last Updated
    Assignee
    Array
    Milestone
    Array
    Closed By
    chandrans
  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @mmokrejs
    Hi,

    I am checking with the team. We will get back to you soon.

    -Sheila

  • mmokrejsmmokrejs Czech RepublicMember

    BTW from a quick reading the supplementary material of http://science.sciencemag.org/content/early/2016/03/02/science.aac8624 speaks on page 5 about Ti/Tv ratio about 2.5 in same Agilent Human Exon v5 data.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator
    edited May 2017

    @mmokrejs
    Hi,

    Does setting the TiTv ratio to 2.5 help with the plots?

    -Sheila

  • @Nanda said:
    Hi,

    Recently we performed exome sequencing using Nextera Illumina platform for three samples (Father, Mother and Son). I downloaded the exome interval list from Illumina's website.

    1) Trimmed the raw reads
    2) Aligned the trimmed reads against the human reference hg19 as recommended for exome-sequencing

    Hello, could someone please direct me to a reference that compares or states this? I have tried searching for comparisons between regarding which reference genome to use for mapping and have not found any definitive answers like this Nanda stated. Also maybe things changed from 2012?.... I am also mapping exome reads (SureSelectV5 and V6). Thank you!

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @Frank_cumc
    Hi,

    I am not sure about any paper @Nanda may have been referencing, but have a look at this article that may help.

    -Sheila

  • Hello, I've also used hg19 to align my reads and am now doing haplotype calling for 9 samples plus 1000genomes treated bams to reach the 30 samples threshold.
    What interval-list file should I use in Haplotypecaller with the parameter -L ?
    I didn't understand if this is a requirement as well.

  • shleeshlee CambridgeMember, Broadie, Moderator
    edited May 24

    Hi @Rosmaninho,

    The -L parameter is optional with HaplotypeCaller, e.g. if you wanted to quickly check the calls for a small region of the genome or if you are scattering over the genome, e.g. chr1. If you are calling on targeted exomes, then you can always subset variants to your on-target intervals of interest after the calling.

    P.S. Actually, for hg19, I believe there are regions that amass reads in a manner that is problematic for most callers. These types of problem regions have been ironed out in GRCh38. If you are stuck using hg19, then you should try to find 'calling regions' intervals that omit these problematic regions. In GRCh38, you should be able to call on the entirety of the reference (sans N regions of course) without issue.

  • From what I've understood the -L parameter is recommended to use with WES files.
    Since the extracted DNA from my samples was hybridized to the Agilent SeqCap EZ Library, I fetched the SeqCapEZ_Exome_v3.0_Design_Annotation_files (http://sequencing.roche.com/en/products-solutions/by-category/target-enrichment/hybridization/seqcap-ez-exome-v3-kit.html) and used the SeqCap_EZ_Exome_v3_hg19_primary_targets.bed in the -L parameter.

    This file contains the design primary target (unpadded) in hg19 coordinates and gene annotation in the 4th column.

    Since I only have 9 WES samples, I also ran the Haplotypecaller for 21 bam files with a similar geographic background to my samples, from the 1000Genomes, realigned to hg19. I also ran against the same bed file that I ran for my 9 samples.

    Do you think this is the correct approach?
    Should I run the 1000Genomes samples against the bed file that was referenced in the published paper?

    Furthermore, since we are considering adding extra samples to my analysis, we are in the decision process of performing WGS or WES in those extra samples. If we pick WGS is it possible to compare the results from our WES samples to the WGS samples?

  • shleeshlee CambridgeMember, Broadie, Moderator

    @Rosmaninho,

    Let me summarize your scenario. You want to call short variants on a cohort consisting of:

    • 9 targeted exome samples from capture kit A and
    • 21 exome samples from the 1000 Genomes Project that used capture kit B

    And use an intervals list with HaplotypeCaller consisting of:

    • unpadded targets from capture kit A

    Depending on your research aims, you may want to consider comparing the capture regions of kit A and B, as I do for the Hammer CNV tutorial, e.g.:

    and think about the implications of the extent of overlap for your analysis. For example, is filling up the squared-off matrix feature of HaplotypeCaller, that is possible in ERC mode, more important than getting every call even if some samples may not have coverage for calls, etc. Alternatively, to make the most of HaplotypeCaller compute, perhaps you might consider calling on the entirety of callable regions without applying the targets list for each sample in ERC GVCF mode, then in GenotypeGVCFs mode apply your intervals list of choice. Here you could test multiple intervals lists on the GVCFs with GenotypeGVCFs and would have saved on the upfront compute.

    I'm not all that familiar with the recommended practices for germline calling in hg19 for targeted exomes as I've only written about GRCh38 WGS calling and used exomes for CNV analysis. Perhaps @Sheila can add to this discussion.

Sign In or Register to comment.