How to know if a combined VCF file has been recalibrated with GATK or not

Dear GATK community,
Pardon my ignorance, but I am new to NGS pipelines. I have received a combined VCF file of 200 samples. The only information I could work out from the VCF, after opening it on the command line and on the usegalaxy online server, is that the variants were called with GATK HaplotypeCaller. How can I tell whether the variants were only called with HaplotypeCaller, or whether they were also validated or recalibrated afterwards?

Answers

  • SkyWarrior (Turkey; Member)

    Check the header of the VCF file. The final commands used to produce it are usually recorded there, unless they have been removed.
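A minimal Python sketch of that check, assuming a plain or gzip-compressed VCF: it reads the `##` meta-information lines and lists the tool IDs from any `GATKCommandLine` lines. The GATK 3.x form (`##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,...>`) matches the header posted in this thread; the GATK 4.x form (`##GATKCommandLine=<ID=...>`) is handled on the assumption that it follows the same `ID=` pattern. The function names are hypothetical. From a shell, `bcftools view -h file.vcf.gz` or `grep '^##' file.vcf` prints the same header lines.

```python
import gzip
import re
import sys
from typing import Iterable, List

# GATK 3.x: "##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,..."
# GATK 4.x (assumed): "##GATKCommandLine=<ID=HaplotypeCaller,..."
GATK_CMD = re.compile(r"^##GATKCommandLine(?:\.[^=]+)?=<ID=([^,>]+)")


def read_header(path: str) -> List[str]:
    """Return the '##' meta-information lines of a .vcf or .vcf.gz file."""
    opener = gzip.open if path.endswith(".gz") else open
    header = []
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # the '#CHROM' column line ends the meta-information block
            header.append(line.rstrip("\n"))
    return header


def gatk_tools(header_lines: Iterable[str]) -> List[str]:
    """Return the GATK tool IDs recorded in the header, in file order."""
    tools = []
    for line in header_lines:
        match = GATK_CMD.match(line)
        if match:
            tools.append(match.group(1))
    return tools


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gatk_tools.py combined.vcf.gz
    for tool in gatk_tools(read_header(sys.argv[1])):
        print(tool)
```

If the only tool printed is `HaplotypeCaller`, no later GATK step recorded itself in the header, although a header can also have been stripped or rewritten.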

  • UmerMaqsood10 (Pakistan; Member)

    @Geraldine_VdAuwera said:
    ^^ What @SkyWarrior recommends is correct; if you are lucky the VCF header should contain that information. If not then it's more difficult. Sometimes the presence or absence of some annotations can be a clue; for example variant calls that have been through VQSR will have the VQSLOD annotations added.
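Following the annotation clue above, a hedged sketch that scans header lines (for example as printed by `bcftools view -h`) for traces VQSR usually leaves: a `VQSLOD` or `culprit` INFO definition, `VQSRTranche...` FILTER definitions, or a recalibration tool's command line. This list reflects our understanding of GATK 3.x/4.x output and is not exhaustive; the function names are hypothetical, and finding nothing is a hint, not proof.

```python
import re
from typing import Dict, Iterable, List

# Traces that VQSR usually leaves in a VCF header (assumed, not exhaustive).
VQSR_INFO_IDS = {"VQSLOD", "culprit"}
VQSR_TOOLS = {"VariantRecalibrator", "ApplyRecalibration", "ApplyVQSR"}
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")
FILTER_ID = re.compile(r"^##FILTER=<ID=([^,>]+)")
GATK_CMD = re.compile(r"^##GATKCommandLine(?:\.[^=]+)?=<ID=([^,>]+)")


def vqsr_evidence(header_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect header IDs that suggest VQSR was applied."""
    evidence: Dict[str, List[str]] = {"info": [], "filters": [], "tools": []}
    for line in header_lines:
        info = INFO_ID.match(line)
        if info and info.group(1) in VQSR_INFO_IDS:
            evidence["info"].append(info.group(1))
        filt = FILTER_ID.match(line)
        if filt and filt.group(1).startswith("VQSRTranche"):
            evidence["filters"].append(filt.group(1))
        tool = GATK_CMD.match(line)
        if tool and tool.group(1) in VQSR_TOOLS:
            evidence["tools"].append(tool.group(1))
    return evidence


def looks_vqsr_processed(header_lines: Iterable[str]) -> bool:
    """True if any VQSR trace was found; False is a hint, not proof."""
    return any(vqsr_evidence(header_lines).values())
```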

    Dear Geraldine,
    Thanks for the answer. Could you kindly look at the header of my file below and tell me whether it gives you any idea?

    ##fileformat=VCFv4.2
    ##FILTER=<ID=LowQual,Description="Low quality">
    ##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
    ##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.5-0-g36282e4,Date="Tue Aug 09 22:49:57 CDT 2016",Epoch=1470800997921,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[CHKSEPI00000373_1_01k_1.bam, CHKSEPI00000373_1_2k_1.bam, CHKSEPI00000374_1_151k_1.bam,CHKSEPI00000374_1_152k_1.bam, CHKSEPI00000374_1_153k_1.bam, CHKSEPI00000374_1_154k_1.bam, CHKSEPI00000374_1_155k_1.bam, CHKSEPI00000374_1_156k_1.bam, CHKSEPI00000374_1_157k_1.bam, CHKSEPI00000374_1_158k_1.bam, CHKSEPI00000374_1_159k_1.bam, CHKSEPI00000374_1_160k_1.bam, CHKSEPI00000374_1_161k_1.bam, CHKSEPI00000374_1_162k_1.bam, CHKSEPI00000374_1_163k_1.bam, CHKSEPI00000374_1_164k_1.bam, CHKSEPI00000374_1_165k_1.bam, CHKSEPI00000374_1_166k_1.bam, CHKSEPI00000374_1_167k_1.bam, CHKSEPI00000374_1_168k_1.bam, CHKSEPI00000374_1_169k_1.bam, CHKSEPI00000374_1_170k_1.bam, CHKSEPI00000374_1_171k_1.bam, CHKSEPI00000374_1_172k_1.bam, CHKSEPI00000374_1_173k_1.bam, CHKSEPI00000374_1_174k_1.bam, CHKSEPI00000374_1_175k_1.bam, CHKSEPI00000374_1_176k_1.bam, CHKSEPI00000374_1_177k_1.bam, CHKSEPI00000374_1_178k_1.bam, CHKSEPI00000374_1_179k_1.bam, CHKSEPI00000374_1_180k_1.bam, CHKSEPI00000374_1_181k_1.bam, CHKSEPI00000374_1_182k_1.bam, CHKSEPI00000374_1_183k_1.bam, CHKSEPI00000374_1_184k_1.bam, CHKSEPI00000374_1_185k_1.bam, CHKSEPI00000374_1_186k_1.bam, CHKSEPI00000374_1_187k_1.bam, CHKSEPI00000374_1_188k_1.bam, CHKSEPI00000374_1_189k_1.bam, CHKSEPI00000374_1_190k_1.bam, CHKSEPI00000374_1_191k_1.bam, CHKSEPI00000374_1_192k_1.bam, CHKSEPI00000374_1_193k_1.bam, CHKSEPI00000374_1_194k_1.bam, CHKSEPI00000374_1_195k_1.bam, CHKSEPI00000374_1_197k_1.bam, CHKSEPI00000374_1_198k_1.bam, CHKSEPI00000374_1_199k_1.bam, CHKSEPI00000374_1_200k_1.bam, CHKSEPI00000373_1_201k_1.bam, CHKSEPI00000373_1_202k_1.bam, CHKSEPI00000373_1_203k_1.bam,CHKSEPI00000373_1_204k_1.bam, CHKSEPI00000373_1_205k_1.bam, CHKSEPI00000373_1_206k_1.bam, CHKSEPI00000373_1_207k_1.bam, CHKSEPI00000373_1_208k_1.bam, CHKSEPI00000373_1_209k_1.bam, CHKSEPI00000373_1_210k_1.bam, CHKSEPI00000373_1_211k_1.bam, CHKSEPI00000373_1_212k_1.bam, CHKSEPI00000373_1_213k_1.bam, CHKSEPI00000373_1_214k_1.bam, CHKSEPI00000373_1_215k_1.bam, CHKSEPI00000373_1_216k_1.bam, CHKSEPI00000373_1_217k_1.bam, CHKSEPI00000373_1_218k_1.bam, CHKSEPI00000373_1_219k_1.bam, CHKSEPI00000373_1_220k_1.bam, CHKSEPI00000374_1_221k_1.bam, CHKSEPI00000374_1_222k_1.bam, CHKSEPI00000374_1_223k_1.bam, CHKSEPI00000374_1_224k_1.bam, CHKSEPI00000374_1_225k_1.bam, CHKSEPI00000374_1_226k_1.bam, CHKSEPI00000374_1_227k_1.bam, CHKSEPI00000374_1_228k_1.bam, CHKSEPI00000374_1_229k_1.bam, CHKSEPI00000374_1_230k_1.bam, CHKSEPI00000374_1_231k_1.bam, CHKSEPI00000374_1_232k_1.bam, CHKSEPI00000374_1_233k_1.bam, CHKSEPI00000374_1_234k_1.bam, CHKSEPI00000374_1_235k_1.bam, CHKSEPI00000374_1_236k_1.bam, CHKSEPI00000374_1_253k_1.bam, CHKSEPI00000374_1_254k_1.bam, CHKSEPI00000374_1_255k_1.bam, CHKSEPI00000374_1_256k_1.bam, CHKSEPI00000374_1_257k_1.bam, CHKSEPI00000374_1_258k_1.bam, CHKSEPI00000374_1_259k_1.bam, CHKSEPI00000374_1_260k_1.bam, CHKSEPI00000374_1_261k_1.bam, CHKSEPI00000374_1_262k_1.bam, CHKSEPI00000374_1_263k_1.bam, CHKSEPI00000374_1_264k_1.bam, CHKSEPI00000373_1_265k_1.bam, CHKSEPI00000373_1_266k_1.bam, CHKSEPI00000373_1_267k_1.bam, CHKSEPI00000373_1_268k_1.bam, CHKSEPI00000373_1_269k_1.bam, CHKSEPI00000373_1_270k_1.bam, CHKSEPI00000373_1_271k_1.bam, CHKSEPI00000373_1_272k_1.bam, CHKSEPI00000373_1_273k_1.bam, CHKSEPI00000373_1_274k_1.bam, CHKSEPI00000373_1_275k_1.bam, CHKSEPI00000373_1_276k_1.bam] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] disable_read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=39947_ref_IRGSP-1.0.fa nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=500 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=true allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 reference_window_stop=0 logging_level=INFO log_to_file=null help=false version=false out=/data/shsze/assembly/rice/Rice_GBS_clean_unfoldered_data_112514/39947_ref_IRGSP-1.0.fa.vcf likelihoodCalculationEngine=PairHMM heterogeneousKmerSizeResolution=COMBO_MIN dbsnp=(RodBinding name= source=UNBOUND) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[] excludeAnnotation=[] group=[Standard, StandardHCAnnotation] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=NONE bamOutput=null bamWriterType=CALLED_HAPLOTYPES disableOptimizations=false annotateNDA=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 standard_min_confidence_threshold_for_calling=20.0 standard_min_confidence_threshold_for_emitting=20.0 max_alternate_alleles=6 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=null exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=false gcpHMM=10 pair_hmm_implementation=VECTOR_LOGLESS_CACHING pair_hmm_sub_implementation=ENABLE_ALL always_load_vector_logless_PairHMM_lib=false phredScaledGlobalReadMismappingRate=45 noFpga=false sample_name=null kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingBranches=false minDanglingBranchLength=4 consensus=false maxNumHaplotypesInPopulation=128 errorCorrectKmers=false minPruning=2 debugGraphTransformations=false allowCyclesInKmerGraphToGeneratePaths=false graphOutput=null kmerLengthForReadErrorCorrection=25 minObservationsForKmerToBeSolid=20 GVCFGQBands=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 70, 80, 90, 99] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 includeUmappedReads=false useAllelesTrigger=false doNotRunPhysicalPhasing=true keepRG=null justDetermineActiveRegions=false dontGenotype=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false errorCorrectReads=false pcr_indel_model=CONSERVATIVE maxReadsInRegionPerSample=10000 minReadsPerAlignmentStart=10 mergeVariantsViaLD=false activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null maxProbPropagationDistance=50 activeProbabilityThreshold=0.002 min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
    ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
    ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
    ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
    ##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
    ##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
    ##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
    ##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
    ##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
    ##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
    ##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
    ##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
    ##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
    ##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
    ##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
    ##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
    ##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
    ##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
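The `CommandLineOptions` string in the `GATKCommandLine.HaplotypeCaller` header line above holds every option as space-separated `key=value` pairs, but some values (`input_file=[...]`, `kmerSize=[10, 25]`, `dbsnp=(RodBinding name= source=UNBOUND)`) contain spaces themselves. A small sketch that splits it while keeping bracketed values intact, so individual settings such as `output_mode` or `standard_min_confidence_threshold_for_calling` can be looked up. The function name is hypothetical and it assumes the GATK 3.x layout shown here.

```python
import re
from typing import Dict, List


def parse_command_line_options(header_line: str) -> Dict[str, str]:
    """Split CommandLineOptions="..." from a GATK 3.x header line into a dict."""
    match = re.search(r'CommandLineOptions="([^"]*)"', header_line)
    if match is None:
        return {}
    tokens: List[str] = []
    current: List[str] = []
    depth = 0  # nesting of [...] and (...); spaces inside them belong to the value
    for char in match.group(1):
        if char in "[(":
            depth += 1
        elif char in "])":
            depth -= 1
        if char == " " and depth == 0:
            if current:  # a top-level space ends one key=value token
                tokens.append("".join(current))
                current = []
        else:
            current.append(char)
    if current:
        tokens.append("".join(current))
    options: Dict[str, str] = {}
    for token in tokens:
        key, sep, value = token.partition("=")  # split on the first '=' only
        if sep:
            options[key] = value
    return options
```

Applied to the header above, `options["output_mode"]` would be `EMIT_VARIANTS_ONLY`, and the keys are exactly the option names GATK 3.5 echoed for this run.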

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)
    Accepted Answer

    Based on this it looks like VQSR was not applied to this data.

  • UmerMaqsood10 (Pakistan; Member)

    @Geraldine_VdAuwera said:
    Based on this it looks like VQSR was not applied to this data.

    Dear Geraldine, is there any way I can run variant quality score recalibration on this combined file, or do I need to split it first? The only tool I am comfortable with is the GATK tool available on the Galaxy server, using its recalibration option.

  • UmerMaqsood10 (Pakistan; Member)

    @shlee said:
    Hi @UmerMaqsood10,

    Yes, you may run VQSR on a combined VCF. There is no need to split samples into different files.

    Thank you very much, shlee, for this solution. I will try it using the GATK tool available on the main Galaxy platform; hopefully it will work. Thanks again for your help.
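Since VQSR can run directly on the combined multi-sample VCF, a minimal sketch of what that might look like with the GATK 3.x command line (outside Galaxy), assembled in Python as argument lists. The reference name comes from the posted header; the jar path, `known_sites.vcf`, the resource name and its `prior=10.0` are hypothetical placeholders. A real run needs a trusted training/truth resource for the organism, and the annotations are limited to ones present in the posted INFO lines. All 200 samples are columns of the same file, so the combined VCF is passed as a single `-input`.

```python
from typing import List, Sequence

# Annotations present in the INFO lines of the header posted in this thread.
ANNOTATIONS = ("QD", "FS", "SOR", "MQ", "MQRankSum", "ReadPosRankSum")
GATK_JAR = "GenomeAnalysisTK.jar"  # GATK 3.x jar; path is an assumption


def variant_recalibrator_cmd(reference: str, vcf: str, resource_vcf: str,
                             prefix: str, mode: str = "SNP",
                             annotations: Sequence[str] = ANNOTATIONS) -> List[str]:
    """Build a GATK 3.x VariantRecalibrator command for a multi-sample VCF."""
    cmd = ["java", "-jar", GATK_JAR, "-T", "VariantRecalibrator",
           "-R", reference, "-input", vcf,
           # Hypothetical resource name and prior; use a trusted set for your organism.
           "-resource:known_sites,known=false,training=true,truth=true,prior=10.0",
           resource_vcf,
           "-mode", mode,
           "-recalFile", prefix + ".recal",
           "-tranchesFile", prefix + ".tranches"]
    for annotation in annotations:
        cmd += ["-an", annotation]
    return cmd


def apply_recalibration_cmd(reference: str, vcf: str, prefix: str, out_vcf: str,
                            mode: str = "SNP", ts_filter_level: float = 99.0) -> List[str]:
    """Build the matching GATK 3.x ApplyRecalibration command."""
    return ["java", "-jar", GATK_JAR, "-T", "ApplyRecalibration",
            "-R", reference, "-input", vcf, "-mode", mode,
            "--ts_filter_level", str(ts_filter_level),
            "-recalFile", prefix + ".recal",
            "-tranchesFile", prefix + ".tranches",
            "-o", out_vcf]


if __name__ == "__main__":
    ref = "39947_ref_IRGSP-1.0.fa"            # reference named in the posted header
    combined = "39947_ref_IRGSP-1.0.fa.vcf"    # output name in the header; yours may differ
    print(" ".join(variant_recalibrator_cmd(ref, combined, "known_sites.vcf", "snps")))
    print(" ".join(apply_recalibration_cmd(ref, combined, "snps", "snps.recalibrated.vcf")))
```

The two steps mirror VQSR's structure: the first builds the recalibration model and tranches, the second writes VQSLOD and tranche filters into a new VCF.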
