HaplotypeCaller output missing fields AC/AN

I'm a bit confused about the new GATK version and the new HaplotypeCaller (HC) features. I'm trying to call haplotypes in a family of plants. I call haplotypes with HaplotypeCaller, then I want to run ReadBackedPhasing on the raw VCFs and then CalculateGenotypePosteriors to add pedigree information. The CalculateGenotypePosteriors walker seems to need the INFO fields AC and AN, but HaplotypeCaller does not produce them. They used to be present in the output of earlier HC versions, though (?). How can I fix this?

And is this a proper workflow at all? Is ReadBackedPhasing still needed, or has it become redundant now that the new HC version can do physical phasing? Would it be enough to run HC for phasing and CalculateGenotypePosteriors to add pedigree information? Either way, the problem of the missing AC and AN fields remains. I would be grateful for some help with this.
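For reference, the VCF specification defines AC and AN as INFO fields: AN is the total number of alleles in called genotypes, and AC gives, for each ALT allele in the order listed, how many times that allele occurs in those genotypes. The Python sketch below shows how both follow from the GT values; `allele_counts` is a hypothetical helper for illustration, not a GATK function, and it simply skips missing (`.`) alleles.

```python
def allele_counts(genotypes, n_alt):
    """Compute VCF-style AC (one count per ALT allele) and AN from GT strings.

    genotypes: GT values such as "0/1", "1|1" or "./."
    n_alt: number of ALT alleles listed at the site
    Missing alleles ('.') are not counted, since AN only counts called alleles.
    """
    ac = [0] * n_alt
    an = 0
    for gt in genotypes:
        # GT uses '/' for unphased and '|' for phased genotypes
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            an += 1
            idx = int(allele)
            if idx > 0:
                # allele index 1 is the first ALT allele, stored at ac[0]
                ac[idx - 1] += 1
    return ac, an


# Single sample, homozygous ALT call at a site with two ALT alleles
print(allele_counts(["1/1"], n_alt=2))  # ([2, 0], 2)
```

For the single-sample call at GSVIVT01012145001:165 in the excerpt below (GT 1/1 with two ALT alleles), this gives AC=2,0 and AN=2, the same numbers as that record's MLEAC=2,0.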

This is what a raw VCF produced by HC looks like:

##fileformat=VCFv4.1
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine=<ID=HaplotypeCaller,Version=3.3-0-g37228af,Date="Fri Jan 30 12:04:00 CET 2015",Epoch=1422615840668,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[/prj/gf-grape/project_FTC_in_crops/members/Nadia/test/GfGa4742_CGATGT_vs_candidategenes.sorted.readgroups.deduplicated.realigned.recalibrated.bam] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/prj/gf-grape/project_FTC_in_crops/members/Nadia/amplicons_run3/GATK_new/RefSequences_all_candidate_genes.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=250 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=true bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=LINEAR variant_index_parameter=128000 logging_level=INFO log_to_file=null help=false version=false likelihoodCalculationEngine=PairHMM 
heterogeneousKmerSizeResolution=COMBO_MIN graphOutput=null bamWriterType=CALLED_HAPLOTYPES disableOptimizations=false dbsnp=(RodBinding name= source=UNBOUND) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[ClippingRankSumTest, DepthPerSampleHC, StrandBiasBySample] excludeAnnotation=[SpanningDeletions, TandemRepeatAnnotator, ChromosomeCounts, FisherStrand, StrandOddsRatio, QualByDepth] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=GVCF annotateNDA=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 standard_min_confidence_threshold_for_calling=-0.0 standard_min_confidence_threshold_for_emitting=-0.0 max_alternate_alleles=6 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=null exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=true sample_name=null kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingBranches=false minDanglingBranchLength=4 consensus=false GVCFGQBands=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 70, 80, 90, 99] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 minPruning=2 gcpHMM=10 includeUmappedReads=false useAllelesTrigger=false phredScaledGlobalReadMismappingRate=45 maxNumHaplotypesInPopulation=2 mergeVariantsViaLD=false doNotRunPhysicalPhasing=false pair_hmm_implementation=VECTOR_LOGLESS_CACHING keepRG=null justDetermineActiveRegions=false dontGenotype=false errorCorrectKmers=false debugGraphTransformations=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false 
allowCyclesInKmerGraphToGeneratePaths=false noFpga=false errorCorrectReads=false kmerLengthForReadErrorCorrection=25 minObservationsForKmerToBeSolid=20 pcr_indel_model=CONSERVATIVE maxReadsInRegionPerSample=1000 minReadsPerAlignmentStart=5 activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null maxProbPropagationDistance=50 activeProbabilityThreshold=0.002 min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
##GVCFBlock=minGQ=0(inclusive),maxGQ=1(exclusive)
##GVCFBlock=minGQ=1(inclusive),maxGQ=2(exclusive)
##GVCFBlock=minGQ=10(inclusive),maxGQ=11(exclusive)
##GVCFBlock=minGQ=11(inclusive),maxGQ=12(exclusive)
##GVCFBlock=minGQ=12(inclusive),maxGQ=13(exclusive)
##GVCFBlock=minGQ=13(inclusive),maxGQ=14(exclusive)
##GVCFBlock=minGQ=14(inclusive),maxGQ=15(exclusive)
##GVCFBlock=minGQ=15(inclusive),maxGQ=16(exclusive)
##GVCFBlock=minGQ=16(inclusive),maxGQ=17(exclusive)
##GVCFBlock=minGQ=17(inclusive),maxGQ=18(exclusive)
##GVCFBlock=minGQ=18(inclusive),maxGQ=19(exclusive)
##GVCFBlock=minGQ=19(inclusive),maxGQ=20(exclusive)
##GVCFBlock=minGQ=2(inclusive),maxGQ=3(exclusive)
##GVCFBlock=minGQ=20(inclusive),maxGQ=21(exclusive)
##GVCFBlock=minGQ=21(inclusive),maxGQ=22(exclusive)
##GVCFBlock=minGQ=22(inclusive),maxGQ=23(exclusive)
##GVCFBlock=minGQ=23(inclusive),maxGQ=24(exclusive)
##GVCFBlock=minGQ=24(inclusive),maxGQ=25(exclusive)
##GVCFBlock=minGQ=25(inclusive),maxGQ=26(exclusive)
##GVCFBlock=minGQ=26(inclusive),maxGQ=27(exclusive)
##GVCFBlock=minGQ=27(inclusive),maxGQ=28(exclusive)
##GVCFBlock=minGQ=28(inclusive),maxGQ=29(exclusive)
##GVCFBlock=minGQ=29(inclusive),maxGQ=30(exclusive)
##GVCFBlock=minGQ=3(inclusive),maxGQ=4(exclusive)
##GVCFBlock=minGQ=30(inclusive),maxGQ=31(exclusive)
##GVCFBlock=minGQ=31(inclusive),maxGQ=32(exclusive)
##GVCFBlock=minGQ=32(inclusive),maxGQ=33(exclusive)
##GVCFBlock=minGQ=33(inclusive),maxGQ=34(exclusive)
##GVCFBlock=minGQ=34(inclusive),maxGQ=35(exclusive)
##GVCFBlock=minGQ=35(inclusive),maxGQ=36(exclusive)
##GVCFBlock=minGQ=36(inclusive),maxGQ=37(exclusive)
##GVCFBlock=minGQ=37(inclusive),maxGQ=38(exclusive)
##GVCFBlock=minGQ=38(inclusive),maxGQ=39(exclusive)
##GVCFBlock=minGQ=39(inclusive),maxGQ=40(exclusive)
##GVCFBlock=minGQ=4(inclusive),maxGQ=5(exclusive)
##GVCFBlock=minGQ=40(inclusive),maxGQ=41(exclusive)
##GVCFBlock=minGQ=41(inclusive),maxGQ=42(exclusive)
##GVCFBlock=minGQ=42(inclusive),maxGQ=43(exclusive)
##GVCFBlock=minGQ=43(inclusive),maxGQ=44(exclusive)
##GVCFBlock=minGQ=44(inclusive),maxGQ=45(exclusive)
##GVCFBlock=minGQ=45(inclusive),maxGQ=46(exclusive)
##GVCFBlock=minGQ=46(inclusive),maxGQ=47(exclusive)
##GVCFBlock=minGQ=47(inclusive),maxGQ=48(exclusive)
##GVCFBlock=minGQ=48(inclusive),maxGQ=49(exclusive)
##GVCFBlock=minGQ=49(inclusive),maxGQ=50(exclusive)
##GVCFBlock=minGQ=5(inclusive),maxGQ=6(exclusive)
##GVCFBlock=minGQ=50(inclusive),maxGQ=51(exclusive)
##GVCFBlock=minGQ=51(inclusive),maxGQ=52(exclusive)
##GVCFBlock=minGQ=52(inclusive),maxGQ=53(exclusive)
##GVCFBlock=minGQ=53(inclusive),maxGQ=54(exclusive)
##GVCFBlock=minGQ=54(inclusive),maxGQ=55(exclusive)
##GVCFBlock=minGQ=55(inclusive),maxGQ=56(exclusive)
##GVCFBlock=minGQ=56(inclusive),maxGQ=57(exclusive)
##GVCFBlock=minGQ=57(inclusive),maxGQ=58(exclusive)
##GVCFBlock=minGQ=58(inclusive),maxGQ=59(exclusive)
##GVCFBlock=minGQ=59(inclusive),maxGQ=60(exclusive)
##GVCFBlock=minGQ=6(inclusive),maxGQ=7(exclusive)
##GVCFBlock=minGQ=60(inclusive),maxGQ=70(exclusive)
##GVCFBlock=minGQ=7(inclusive),maxGQ=8(exclusive)
##GVCFBlock=minGQ=70(inclusive),maxGQ=80(exclusive)
##GVCFBlock=minGQ=8(inclusive),maxGQ=9(exclusive)
##GVCFBlock=minGQ=80(inclusive),maxGQ=90(exclusive)
##GVCFBlock=minGQ=9(inclusive),maxGQ=10(exclusive)
##GVCFBlock=minGQ=90(inclusive),maxGQ=99(exclusive)
##GVCFBlock=minGQ=99(inclusive),maxGQ=2147483647(exclusive)
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##contig=<ID=GSVIVT01012145001,length=8683>
##contig=<ID=GSVIVT01012049001,length=18657>
##contig=<ID=GSVIVT01012249001,length=14432>
##contig=<ID=GSVIVT01011652001,length=6117>
##contig=<ID=GSVIVT01011710001plu,length=4623>
##contig=<ID=GSVIVT01012250001plu,length=27163>
##contig=<ID=GSVIVT01011947001,length=3289>
##contig=<ID=GSVIVT01011821001,length=7310>
##contig=<ID=GSVIVT01011897001,length=5751>
##contig=<ID=GSVIVT01022014001,length=6337>
##contig=<ID=GSVIVT01011387001,length=11582>
##contig=<ID=GSVIVT01036237001,length=18407>
##contig=<ID=GSVIVT01036499001_CO,length=4568>
##contig=<ID=GSVIVT01020232001,length=21274>
##contig=<ID=GSVIVT01030735001,length=3570>
##contig=<ID=GSVIVT01011433001,length=5349>
##contig=<ID=GSVIVT01011939001,length=73679>
##contig=<ID=GSVIVT01021854001,length=5609>
##contig=<ID=GSVIVT01036549001plu,length=22905>
##contig=<ID=GSVIVT01031112001,length=5884>
##contig=<ID=GSVIVT01036551001plu,length=18328>
##contig=<ID=GSVIVT01031354001,length=8603>
##contig=<ID=GSVIVT01008655001_pl,length=4022>
##contig=<ID=GSVIVT01031338001,length=6893>
##contig=<ID=GSVIVT01019969001,length=5388>
##contig=<ID=GSVIVT01032607001,length=8294>
##contig=<ID=GSVIVT01010521001,length=19492>
##contig=<ID=GSVIVT01036447001,length=6911>
##contig=<ID=GSVIVT01010513001,length=23656>
##contig=<ID=GSVIVT01033067001,length=28278>
##reference=file:///prj/gf-grape/project_FTC_in_crops/members/Nadia/amplicons_run3/GATK_new/RefSequences_all_candidate_genes.fasta
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GfGa4742
GSVIVT01012145001 1 . G <NON_REF> . . END=113 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
GSVIVT01012145001 114 . C <NON_REF> . . END=164 GT:DP:GQ:MIN_DP:PL 0/0:172:99:164:0,120,1800
GSVIVT01012145001 165 . T C,<NON_REF> 7732.77 . DP=175;MLEAC=2,0;MLEAF=1.00,0.00;MQ=60.00;MQ0=0 GT:AD:DP:GQ:PGT:PID:PL:SB 1/1:0,173,0:173:99:0|1:165_T_C:7761,521,0,7761,521,7761:0,0,165,8
GSVIVT01012145001 166 . G <NON_REF> . . END=166 GT:DP:GQ:MIN_DP:PL 0/0:174:72:174:0,72,1080
GSVIVT01012145001 167 . T <NON_REF> . . END=175 GT:DP:GQ:MIN_DP:PL 0/0:174:66:174:0,60,900
GSVIVT01012145001 176 . T <NON_REF> . . END=191 GT:DP:GQ:MIN_DP:PL 0/0:174:57:173:0,57,855
GSVIVT01012145001 192 . A <NON_REF> . . END=194 GT:DP:GQ:MIN_DP:PL 0/0:173:54:173:0,54,810
GSVIVT01012145001 195 . T <NON_REF> . . END=199 GT:DP:GQ:MIN_DP:PL 0/0:174:51:173:0,51,765
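The INFO lines in this header declare MLEAC and MLEAF (whose own descriptions say they are "not necessarily the same as the AC/AF") but no AC or AN. One quick way to confirm this on any VCF is a header scan like the Python sketch below. It assumes an uncompressed VCF with standard `##INFO=<ID=...>` meta-lines; the function names are hypothetical and not part of GATK.

```python
import re

# Matches header meta-lines such as ##INFO=<ID=DP,Number=1,...>
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def declared_info_ids(header_lines):
    """Return the set of INFO IDs declared in the VCF header lines."""
    ids = set()
    for line in header_lines:
        m = INFO_ID.match(line)
        if m:
            ids.add(m.group(1))
    return ids


def missing_info_ids(header_lines, required=("AC", "AN")):
    """Return the required INFO keys that the header does not declare."""
    declared = declared_info_ids(header_lines)
    return [key for key in required if key not in declared]


def read_header(path):
    """Read only the ## meta-lines of an uncompressed VCF file."""
    lines = []
    with open(path) as fh:
        for line in fh:
            if not line.startswith("##"):
                break  # stop at the #CHROM line
            lines.append(line.rstrip("\n"))
    return lines
```

For example, `missing_info_ids(read_header("GfGa4742.g.vcf"))` (hypothetical file name) should return `['AC', 'AN']` for a header like the one above, since MLEAC and MLEAF are separate keys.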

And this is the error message I get:

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.3-0-g37228af):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Key AC found in VariantContext field INFO at GSVIVT01012145001:1 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default.
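The message describes a header-consistency rule: by default, every key used in a record's INFO column must be declared by an `##INFO` header line, and here a record at GSVIVT01012145001:1 carries AC while the header has no AC declaration. The Python sketch below expresses that rule for a single record as the error message states it; it is an illustration, not GATK's implementation, and `undeclared_info_keys` is a hypothetical name.

```python
def undeclared_info_keys(info_field, declared_ids):
    """List INFO keys used in a record but not declared in the header.

    info_field: the 8th VCF column, e.g. "DP=175;MQ=60.00" or "." if empty.
    Flag entries such as "DS" have no "=value" part and are handled too.
    """
    if info_field == ".":
        return []
    keys = [entry.split("=", 1)[0] for entry in info_field.split(";") if entry]
    return [key for key in keys if key not in declared_ids]


# INFO IDs declared in a header like the one above (subset)
declared = {"DP", "DS", "END", "MLEAC", "MLEAF", "MQ", "MQ0"}
print(undeclared_info_keys("AC=2;AN=2;DP=175;MQ=60.00", declared))  # ['AC', 'AN']
```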
