output the 0|0 for PGT

frankfeng Member
edited January 2016 in Ask the GATK team

Hi,

In the output VCF of GenotypeGVCFs, PGT takes the values 0|1, 1|0, 1|1 and ., but never 0|0. When I see a . for PGT, I cannot tell whether it is actually a 0|0 or missing data (e.g. due to zero useful coverage at the position). It would be great if GenotypeGVCFs could also output 0|0.

Tested GATK version: V3.5

Thanks!

Frank

Answers

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @frankfeng
    Hi Frank,

    Are you using -allSites in your GenotypeGVCFs command?

    Can you post a few example records of the . in the PGT field?

    Thanks,
    Sheila

  • frankfeng Member

    @Sheila
    Hi Sheila,

    No, I am not using -allSites in my GenotypeGVCFs command, and my understanding is that the -allSites option has nothing to do with PGT (please correct me if I am wrong).

    Below is an example VCF output of GenotypeGVCFs (only the FORMAT and Genotype sections are shown; see the . for PGT and PID of SAMPLE-2):

    FORMAT     SAMPLE-1     SAMPLE-2     SAMPLE-3
    GT:AD:DP:GQ:PGT:PID:PL     1/1:0,96:96:99:1|1:1019175_C_G:3803,287,0     1/1:0,111:111:99:.:.:3099,326,0     0/1:60,49:109:99:0|1:1019175_C_G:1795,0,2287
    GT:AD:DP:GQ:PGT:PID:PL     0/1:38,68:106:99:0|1:1019175_C_G:1540,0,776     0/0:121,0:121:99:.:.:0,120,1800     0/1:63,56:119:99:0|1:1019175_C_G:1923,0,2304

    What I expected is below (see what replaces the . for PGT and PID of SAMPLE-2):

    FORMAT     SAMPLE-1     SAMPLE-2     SAMPLE-3
    GT:AD:DP:GQ:PGT:PID:PL     1/1:0,96:96:99:1|1:1019175_C_G:3803,287,0     1/1:0,111:111:99:1|1:1019175_C_G:3099,326,0     0/1:60,49:109:99:0|1:1019175_C_G:1795,0,2287
    GT:AD:DP:GQ:PGT:PID:PL     0/1:38,68:106:99:0|1:1019175_C_G:1540,0,776     0/0:121,0:121:99:0|0:1019175_C_G:0,120,1800     0/1:63,56:119:99:0|1:1019175_C_G:1923,0,2304

    As I mentioned in my previous post, there is no 0|0 PGT output at all. If I want 0|0 output, do I need to turn on a specific option for HaplotypeCaller or GenotypeGVCFs? Or is that not yet implemented in HaplotypeCaller and GenotypeGVCFs?

    Thanks!

    Frank
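For readers who want to inspect genotype columns like the ones in this post programmatically, here is a minimal Python sketch. It is not part of GATK, and `parse_sample` is a hypothetical helper name chosen for illustration. It pairs the FORMAT keys with one sample's colon-separated values, assuming that trailing fields may be omitted (as in `./.:0,0:0`) and treating omitted fields as `.`.

```python
def parse_sample(format_field, sample_field):
    """Map FORMAT keys (e.g. GT, DP, PGT) to one sample's values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # VCF allows trailing fields to be dropped; pad them with the missing value "."
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


# SAMPLE-2 at the second site (1:1019180), copied from the records above
fmt = "GT:AD:DP:GQ:PGT:PID:PL"
sample2 = parse_sample(fmt, "0/0:121,0:121:99:.:.:0,120,1800")
print(sample2["GT"], sample2["DP"], sample2["PGT"])
```

Reading GT and DP next to PGT in this way shows that SAMPLE-2 has a confident 0/0 call with depth 121 at this site, even though its PGT is `.`.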

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @frankfeng
    Hi Frank,

    I think this is a representation issue. Let me check with the team and get back to you. Basically, sample 1 and sample 3 are phased at those two sites, but since sample 2 is homozygous reference at the second site, it is not phased. Can you please post the original bam file and bamout file of the region for sample 2?

    Thanks,
    Sheila

    Issue · Github (by Sheila): Issue Number 546 · State: closed · Closed By: vdauwera
  • @Sheila

    Hi Sheila,

    The original bam and "bamout file" of Sample-2 are attached: The HOM-ALT site is the first site, and the vertical center lines are at the second site.

    Thanks!

    Frank

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Can you post the full records (not just genotype fields) including the record that comes before the two that you are showing?

  • @Geraldine_VdAuwera said:
    Can you post the full records (not just genotype fields) including the record that comes before the two that you are showing?

    @Geraldine_VdAuwera

    Hi Geraldine, please see the attached vcf.gz file.

    Thanks.

    Frank

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Please post the records as a text comment. It takes less time for us to look at it that way. Also I can't access the file easily from a smartphone (which is what I use a lot of the time for answering support questions).

  • frankfeng Member
    edited February 2016

    @Geraldine_VdAuwera said:
    Please post the records as a text comment. It takes less time for us to look at it that way. Also I can't access the file easily from a smartphone (which is what I use a lot of the time for answering support questions).

    Hi @Geraldine_VdAuwera please see below:

    ##fileformat=VCFv4.2
    ##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
    ##FILTER=<ID=LowQual,Description="Low quality">
    ##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
    ##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
    ##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
    ##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
    ##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
    ##GATKCommandLine.GenotypeGVCFs=<ID=GenotypeGVCFs,Version=3.5-0-g36282e4,Date="Sun Feb 07 18:05:59 CST 2016",Epoch=1454830559511,CommandLineOptions="analysis_type=GenotypeGVCFs input_file=[] showFullBamList=false read_buffer_size=null phone_home=NO_ET gatk_key=/data/sacgf/reference/map_n_call_pipeline/gatk/paul.wang_adelaide.edu.au.key tag=NA read_filter=[] disable_read_filter=[] intervals=[1:1009175-1029175] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/scratch/gatk_bundle/2.8/b37/human_g1k_v37_decoy.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=2 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 reference_window_stop=0 logging_level=INFO log_to_file=null help=false version=false variant=[(RodBindingCollection [(RodBinding 
name=variant source=SAMPLE-1.g.vcf.gz)]), (RodBindingCollection [(RodBinding name=variant2 source=SAMPLE-2.g.vcf.gz)]), (RodBindingCollection [(RodBinding name=variant3 source=SAMPLE-3.g.vcf.gz)])] out=/home/users/jfeng/test/test_PGT_2.vcf.gz includeNonVariantSites=false uniquifySamples=false annotateNDA=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 standard_min_confidence_threshold_for_calling=30.0 standard_min_confidence_threshold_for_emitting=20.0 max_alternate_alleles=6 input_prior=[] sample_ploidy=2 annotation=[] group=[Standard] dbsnp=(RodBinding name= source=UNBOUND) filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
    ##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.5-0-g36282e4,Date="Sun Jan 10 19:26:27 CST 2016",Epoch=1452416187963,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[bqsr/SAMPLE-3.bam] showFullBamList=false read_buffer_size=null phone_home=NO_ET gatk_key=/data/sacgf/reference/map_n_call_pipeline/gatk/paul.wang_adelaide.edu.au.key tag=NA read_filter=[] disable_read_filter=[] intervals=[/data/sacgf/reference/hg19/ExomeCaptureRegions/Nimblegen/Nimblegen_SeqCap_EQ_Exome_v3/SeqCap_EZ_Exome_v3_capture.1000bp_padded_each_side.merged.Ensembl_style.cut_columns.bed] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/scratch/gatk_bundle/2.8/b37/human_g1k_v37_decoy.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=500 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=2 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false 
variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 reference_window_stop=0 logging_level=INFO log_to_file=null help=false version=false likelihoodCalculationEngine=PairHMM heterogeneousKmerSizeResolution=COMBO_MIN dbsnp=(RodBinding name= source=UNBOUND) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[StrandBiasBySample] excludeAnnotation=[ChromosomeCounts, FisherStrand, StrandOddsRatio, QualByDepth] group=[Standard, StandardHCAnnotation] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=GVCF bamOutput=null bamWriterType=CALLED_HAPLOTYPES disableOptimizations=false annotateNDA=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 standard_min_confidence_threshold_for_calling=-0.0 standard_min_confidence_threshold_for_emitting=-0.0 max_alternate_alleles=6 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=null exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=true gcpHMM=10 pair_hmm_implementation=VECTOR_LOGLESS_CACHING pair_hmm_sub_implementation=ENABLE_ALL always_load_vector_logless_PairHMM_lib=false phredScaledGlobalReadMismappingRate=45 noFpga=false sample_name=null kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingBranches=false minDanglingBranchLength=4 consensus=false maxNumHaplotypesInPopulation=128 errorCorrectKmers=false minPruning=2 debugGraphTransformations=false allowCyclesInKmerGraphToGeneratePaths=false graphOutput=null kmerLengthForReadErrorCorrection=25 minObservationsForKmerToBeSolid=20 GVCFGQBands=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 70, 80, 90, 99] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 includeUmappedReads=false useAllelesTrigger=false doNotRunPhysicalPhasing=false keepRG=null justDetermineActiveRegions=false dontGenotype=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false errorCorrectReads=false pcr_indel_model=CONSERVATIVE maxReadsInRegionPerSample=10000 minReadsPerAlignmentStart=10 mergeVariantsViaLD=false activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null maxProbPropagationDistance=50 activeProbabilityThreshold=0.002 min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
    ##GVCFBlock0-1=minGQ=0(inclusive),maxGQ=1(exclusive)
    ##GVCFBlock1-2=minGQ=1(inclusive),maxGQ=2(exclusive)
    ##GVCFBlock10-11=minGQ=10(inclusive),maxGQ=11(exclusive)
    ##GVCFBlock11-12=minGQ=11(inclusive),maxGQ=12(exclusive)
    ##GVCFBlock12-13=minGQ=12(inclusive),maxGQ=13(exclusive)
    ##GVCFBlock13-14=minGQ=13(inclusive),maxGQ=14(exclusive)
    ##GVCFBlock14-15=minGQ=14(inclusive),maxGQ=15(exclusive)
    ##GVCFBlock15-16=minGQ=15(inclusive),maxGQ=16(exclusive)
    ##GVCFBlock16-17=minGQ=16(inclusive),maxGQ=17(exclusive)
    ##GVCFBlock17-18=minGQ=17(inclusive),maxGQ=18(exclusive)
    ##GVCFBlock18-19=minGQ=18(inclusive),maxGQ=19(exclusive)
    ##GVCFBlock19-20=minGQ=19(inclusive),maxGQ=20(exclusive)
    ##GVCFBlock2-3=minGQ=2(inclusive),maxGQ=3(exclusive)
    ##GVCFBlock20-21=minGQ=20(inclusive),maxGQ=21(exclusive)
    ##GVCFBlock21-22=minGQ=21(inclusive),maxGQ=22(exclusive)
    ##GVCFBlock22-23=minGQ=22(inclusive),maxGQ=23(exclusive)
    ##GVCFBlock23-24=minGQ=23(inclusive),maxGQ=24(exclusive)
    ##GVCFBlock24-25=minGQ=24(inclusive),maxGQ=25(exclusive)
    ##GVCFBlock25-26=minGQ=25(inclusive),maxGQ=26(exclusive)
    ##GVCFBlock26-27=minGQ=26(inclusive),maxGQ=27(exclusive)
    ##GVCFBlock27-28=minGQ=27(inclusive),maxGQ=28(exclusive)
    ##GVCFBlock28-29=minGQ=28(inclusive),maxGQ=29(exclusive)
    ##GVCFBlock29-30=minGQ=29(inclusive),maxGQ=30(exclusive)
    ##GVCFBlock3-4=minGQ=3(inclusive),maxGQ=4(exclusive)
    ##GVCFBlock30-31=minGQ=30(inclusive),maxGQ=31(exclusive)
    ##GVCFBlock31-32=minGQ=31(inclusive),maxGQ=32(exclusive)
    ##GVCFBlock32-33=minGQ=32(inclusive),maxGQ=33(exclusive)
    ##GVCFBlock33-34=minGQ=33(inclusive),maxGQ=34(exclusive)
    ##GVCFBlock34-35=minGQ=34(inclusive),maxGQ=35(exclusive)
    ##GVCFBlock35-36=minGQ=35(inclusive),maxGQ=36(exclusive)
    ##GVCFBlock36-37=minGQ=36(inclusive),maxGQ=37(exclusive)
    ##GVCFBlock37-38=minGQ=37(inclusive),maxGQ=38(exclusive)
    ##GVCFBlock38-39=minGQ=38(inclusive),maxGQ=39(exclusive)
    ##GVCFBlock39-40=minGQ=39(inclusive),maxGQ=40(exclusive)
    ##GVCFBlock4-5=minGQ=4(inclusive),maxGQ=5(exclusive)
    ##GVCFBlock40-41=minGQ=40(inclusive),maxGQ=41(exclusive)
    ##GVCFBlock41-42=minGQ=41(inclusive),maxGQ=42(exclusive)
    ##GVCFBlock42-43=minGQ=42(inclusive),maxGQ=43(exclusive)
    ##GVCFBlock43-44=minGQ=43(inclusive),maxGQ=44(exclusive)
    ##GVCFBlock44-45=minGQ=44(inclusive),maxGQ=45(exclusive)
    ##GVCFBlock45-46=minGQ=45(inclusive),maxGQ=46(exclusive)
    ##GVCFBlock46-47=minGQ=46(inclusive),maxGQ=47(exclusive)
    ##GVCFBlock47-48=minGQ=47(inclusive),maxGQ=48(exclusive)
    ##GVCFBlock48-49=minGQ=48(inclusive),maxGQ=49(exclusive)
    ##GVCFBlock49-50=minGQ=49(inclusive),maxGQ=50(exclusive)
    ##GVCFBlock5-6=minGQ=5(inclusive),maxGQ=6(exclusive)
    ##GVCFBlock50-51=minGQ=50(inclusive),maxGQ=51(exclusive)
    ##GVCFBlock51-52=minGQ=51(inclusive),maxGQ=52(exclusive)
    ##GVCFBlock52-53=minGQ=52(inclusive),maxGQ=53(exclusive)
    ##GVCFBlock53-54=minGQ=53(inclusive),maxGQ=54(exclusive)
    ##GVCFBlock54-55=minGQ=54(inclusive),maxGQ=55(exclusive)
    ##GVCFBlock55-56=minGQ=55(inclusive),maxGQ=56(exclusive)
    ##GVCFBlock56-57=minGQ=56(inclusive),maxGQ=57(exclusive)
    ##GVCFBlock57-58=minGQ=57(inclusive),maxGQ=58(exclusive)
    ##GVCFBlock58-59=minGQ=58(inclusive),maxGQ=59(exclusive)
    ##GVCFBlock59-60=minGQ=59(inclusive),maxGQ=60(exclusive)
    ##GVCFBlock6-7=minGQ=6(inclusive),maxGQ=7(exclusive)
    ##GVCFBlock60-70=minGQ=60(inclusive),maxGQ=70(exclusive)
    ##GVCFBlock7-8=minGQ=7(inclusive),maxGQ=8(exclusive)
    ##GVCFBlock70-80=minGQ=70(inclusive),maxGQ=80(exclusive)
    ##GVCFBlock8-9=minGQ=8(inclusive),maxGQ=9(exclusive)
    ##GVCFBlock80-90=minGQ=80(inclusive),maxGQ=90(exclusive)
    ##GVCFBlock9-10=minGQ=9(inclusive),maxGQ=10(exclusive)
    ##GVCFBlock90-99=minGQ=90(inclusive),maxGQ=99(exclusive)
    ##GVCFBlock99-2147483647=minGQ=99(inclusive),maxGQ=2147483647(exclusive)
    ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
    ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
    ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
    ##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
    ##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
    ##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
    ##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
    ##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
    ##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
    ##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
    ##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
    ##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
    ##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
    ##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
    ##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
    ##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
    ##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="Raw data for RMS Mapping Quality">
    ##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
    ##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
    ##contig=<ID=1,length=249250621,assembly=b37>
    ##contig=<ID=2,length=243199373,assembly=b37>
    ##contig=<ID=3,length=198022430,assembly=b37>
    ##contig=<ID=4,length=191154276,assembly=b37>
    ##contig=<ID=5,length=180915260,assembly=b37>
    ##contig=<ID=6,length=171115067,assembly=b37>
    ##contig=<ID=7,length=159138663,assembly=b37>
    ##contig=<ID=8,length=146364022,assembly=b37>
    ##contig=<ID=9,length=141213431,assembly=b37>
    ##contig=<ID=10,length=135534747,assembly=b37>
    ##contig=<ID=11,length=135006516,assembly=b37>
    ##contig=<ID=12,length=133851895,assembly=b37>
    ##contig=<ID=13,length=115169878,assembly=b37>
    ##contig=<ID=14,length=107349540,assembly=b37>
    ##contig=<ID=15,length=102531392,assembly=b37>
    ##contig=<ID=16,length=90354753,assembly=b37>
    ##contig=<ID=17,length=81195210,assembly=b37>
    ##contig=<ID=18,length=78077248,assembly=b37>
    ##contig=<ID=19,length=59128983,assembly=b37>
    ##contig=<ID=20,length=63025520,assembly=b37>
    ##contig=<ID=21,length=48129895,assembly=b37>
    ##contig=<ID=22,length=51304566,assembly=b37>
    ##contig=<ID=X,length=155270560,assembly=b37>
    ##contig=<ID=Y,length=59373566,assembly=b37>
    ##contig=<ID=MT,length=16569,assembly=b37>
    ##contig=<ID=GL000207.1,length=4262,assembly=b37>
    ##contig=<ID=GL000226.1,length=15008,assembly=b37>
    ##contig=<ID=GL000229.1,length=19913,assembly=b37>
    ##contig=<ID=GL000231.1,length=27386,assembly=b37>
    ##contig=<ID=GL000210.1,length=27682,assembly=b37>
    ##contig=<ID=GL000239.1,length=33824,assembly=b37>
    ##contig=<ID=GL000235.1,length=34474,assembly=b37>
    ##contig=<ID=GL000201.1,length=36148,assembly=b37>
    ##contig=<ID=GL000247.1,length=36422,assembly=b37>
    ##contig=<ID=GL000245.1,length=36651,assembly=b37>
    ##contig=<ID=GL000197.1,length=37175,assembly=b37>
    ##contig=<ID=GL000203.1,length=37498,assembly=b37>
    ##contig=<ID=GL000246.1,length=38154,assembly=b37>
    ##contig=<ID=GL000249.1,length=38502,assembly=b37>
    ##contig=<ID=GL000196.1,length=38914,assembly=b37>
    ##contig=<ID=GL000248.1,length=39786,assembly=b37>
    ##contig=<ID=GL000244.1,length=39929,assembly=b37>
    ##contig=<ID=GL000238.1,length=39939,assembly=b37>
    ##contig=<ID=GL000202.1,length=40103,assembly=b37>
    ##contig=<ID=GL000234.1,length=40531,assembly=b37>
    ##contig=<ID=GL000232.1,length=40652,assembly=b37>
    ##contig=<ID=GL000206.1,length=41001,assembly=b37>
    ##contig=<ID=GL000240.1,length=41933,assembly=b37>
    ##contig=<ID=GL000236.1,length=41934,assembly=b37>
    ##contig=<ID=GL000241.1,length=42152,assembly=b37>
    ##contig=<ID=GL000243.1,length=43341,assembly=b37>
    ##contig=<ID=GL000242.1,length=43523,assembly=b37>
    ##contig=<ID=GL000230.1,length=43691,assembly=b37>
    ##contig=<ID=GL000237.1,length=45867,assembly=b37>
    ##contig=<ID=GL000233.1,length=45941,assembly=b37>
    ##contig=<ID=GL000204.1,length=81310,assembly=b37>
    ##contig=<ID=GL000198.1,length=90085,assembly=b37>
    ##contig=<ID=GL000208.1,length=92689,assembly=b37>
    ##contig=<ID=GL000191.1,length=106433,assembly=b37>
    ##contig=<ID=GL000227.1,length=128374,assembly=b37>
    ##contig=<ID=GL000228.1,length=129120,assembly=b37>
    ##contig=<ID=GL000214.1,length=137718,assembly=b37>
    ##contig=<ID=GL000221.1,length=155397,assembly=b37>
    ##contig=<ID=GL000209.1,length=159169,assembly=b37>
    ##contig=<ID=GL000218.1,length=161147,assembly=b37>
    ##contig=<ID=GL000220.1,length=161802,assembly=b37>
    ##contig=<ID=GL000213.1,length=164239,assembly=b37>
    ##contig=<ID=GL000211.1,length=166566,assembly=b37>
    ##contig=<ID=GL000199.1,length=169874,assembly=b37>
    ##contig=<ID=GL000217.1,length=172149,assembly=b37>
    ##contig=<ID=GL000216.1,length=172294,assembly=b37>
    ##contig=<ID=GL000215.1,length=172545,assembly=b37>
    ##contig=<ID=GL000205.1,length=174588,assembly=b37>
    ##contig=<ID=GL000219.1,length=179198,assembly=b37>
    ##contig=<ID=GL000224.1,length=179693,assembly=b37>
    ##contig=<ID=GL000223.1,length=180455,assembly=b37>
    ##contig=<ID=GL000195.1,length=182896,assembly=b37>
    ##contig=<ID=GL000212.1,length=186858,assembly=b37>
    ##contig=<ID=GL000222.1,length=186861,assembly=b37>
    ##contig=<ID=GL000200.1,length=187035,assembly=b37>
    ##contig=<ID=GL000193.1,length=189789,assembly=b37>
    ##contig=<ID=GL000194.1,length=191469,assembly=b37>
    ##contig=<ID=GL000225.1,length=211173,assembly=b37>
    ##contig=<ID=GL000192.1,length=547496,assembly=b37>
    ##contig=<ID=NC_007605,length=171823,assembly=b37>
    ##contig=<ID=hs37d5,length=35477943,assembly=b37>
    ##reference=file:///scratch/gatk_bundle/2.8/b37/human_g1k_v37_decoy.fasta
    #CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE-1    SAMPLE-2    SAMPLE-3
    1   1017273 .   C   A   23.89   LowQual AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=11.95;SOR=0.693 GT:AD:DP:GQ:PL  1/1:0,2:2:6:49,6,0  ./.:0,0:0   ./.:0,0:0
    1   1017341 .   G   T   142.60  .   AC=3;AF=0.750;AN=4;BaseQRankSum=0.736;ClippingRankSum=-7.360e-01;DP=7;ExcessHet=3.0103;FS=0.000;MLEAC=3;MLEAF=0.750;MQ=55.55;MQRankSum=-7.360e-01;QD=23.77;ReadPosRankSum=-7.360e-01;SOR=2.303  GT:AD:DP:GQ:PL  ./.:1,0:1   1/1:0,3:3:9:110,9,0 0/1:1,2:3:24:61,0,24
    1   1017587 .   C   T   6109.90 .   AC=3;AF=0.500;AN=6;BaseQRankSum=0.629;ClippingRankSum=0.585;DP=332;ExcessHet=1.5490;FS=3.980;MLEAC=3;MLEAF=0.500;MQ=60.00;MQRankSum=0.053;QD=20.71;ReadPosRankSum=-5.900e-01;SOR=0.970  GT:AD:DP:GQ:PL  0/1:69,48:117:99:1013,0,1482    1/1:0,178:178:99:5133,527,0 0/0:36,0:36:99:0,102,1018
    1   1018144 .   T   C   2759.16 .   AC=2;AF=0.333;AN=6;BaseQRankSum=-3.470e-01;ClippingRankSum=0.371;DP=301;ExcessHet=3.9794;FS=0.000;MLEAC=2;MLEAF=0.333;MQ=60.00;MQRankSum=-2.930e-01;QD=10.99;ReadPosRankSum=1.15;SOR=0.706  GT:AD:DP:GQ:PL  0/1:45,45:90:99:960,0,996   0/0:45,0:45:99:0,102,1530   0/1:83,78:161:99:1831,0,1860
    1   1018562 .   C   T   414.16  .   AC=2;AF=0.333;AN=6;BaseQRankSum=0.163;ClippingRankSum=1.29;DP=70;ExcessHet=3.9794;FS=0.000;MLEAC=2;MLEAF=0.333;MQ=60.00;MQRankSum=0.922;QD=8.81;ReadPosRankSum=1.96;SOR=0.489   GT:AD:DP:GQ:PL  0/1:6,10:16:99:181,0,114    0/0:23,0:23:57:0,57,855 0/1:15,16:31:99:265,0,368
    1   1018704 .   A   G   20.92   LowQual AC=2;AF=0.500;AN=4;DP=4;ExcessHet=0.7918;FS=0.000;MLEAC=3;MLEAF=0.750;MQ=48.99;QD=10.46;SOR=0.693   GT:AD:DP:GQ:PL  1/1:0,2:2:6:49,6,0  0/0:1,0:1:3:0,3,30  ./.:1,0:1
    1   1019175 .   C   G   8666.13 .   AC=5;AF=0.833;AN=6;BaseQRankSum=1.31;ClippingRankSum=-1.855e+00;DP=319;ExcessHet=3.0103;FS=5.413;MLEAC=5;MLEAF=0.833;MQ=60.00;MQRankSum=-1.447e+00;QD=27.42;ReadPosRankSum=-1.014e+00;SOR=1.108 GT:AD:DP:GQ:PGT:PID:PL  1/1:0,96:96:99:1|1:1019175_C_G:3803,287,0   1/1:0,111:111:99:.:.:3099,326,0 0/1:60,49:109:99:0|1:1019175_C_G:1795,0,2287
    1   1019180 .   T   C   3431.16 .   AC=2;AF=0.333;AN=6;BaseQRankSum=1.13;ClippingRankSum=1.21;DP=348;ExcessHet=3.9794;FS=4.997;MLEAC=2;MLEAF=0.333;MQ=60.00;MQRankSum=0.326;QD=15.25;ReadPosRankSum=0.399;SOR=1.036 GT:AD:DP:GQ:PGT:PID:PL  0/1:38,68:106:99:0|1:1019175_C_G:1540,0,776 0/0:121,0:121:99:.:.:0,120,1800 0/1:63,56:119:99:0|1:1019175_C_G:1923,0,2304
    1   1020406 .   T   C   53.42   .   AC=2;AF=1.00;AN=2;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=51.96;QD=17.81;SOR=1.179 GT:AD:DP:GQ:PL  1/1:0,3:3:9:79,9,0  ./.:1,0:1   ./.:0,0:0
    1   1021346 .   A   G   6060.90 .   AC=3;AF=0.500;AN=6;BaseQRankSum=0.638;ClippingRankSum=-4.400e-01;DP=380;ExcessHet=1.5490;FS=6.494;MLEAC=3;MLEAF=0.500;MQ=60.00;MQRankSum=1.33;QD=18.48;ReadPosRankSum=3.00;SOR=0.402    GT:AD:DP:GQ:PL  0/1:79,81:160:99:1764,0,1624    1/1:0,168:168:99:4333,494,0 0/0:50,0:50:99:0,120,1800
    1   1021415 .   A   G   5465.13 .   AC=5;AF=0.833;AN=6;BaseQRankSum=-2.020e-01;ClippingRankSum=1.62;DP=259;ExcessHet=3.0103;FS=11.453;MLEAC=5;MLEAF=0.833;MQ=60.00;MQRankSum=-1.665e+00;QD=21.43;ReadPosRankSum=0.499;SOR=1.451 GT:AD:DP:GQ:PL  1/1:0,79:79:99:2236,234,0   1/1:0,74:74:99:2227,219,0   0/1:57,45:102:99:1033,0,1340
    1   1021583 .   A   C   356.25  .   AC=3;AF=0.500;AN=6;BaseQRankSum=2.36;ClippingRankSum=0.387;DP=33;ExcessHet=1.5490;FS=0.000;MLEAC=3;MLEAF=0.500;MQ=60.00;MQRankSum=0.466;QD=22.27;ReadPosRankSum=0.717;SOR=1.609 GT:AD:DP:GQ:PL  0/1:5,6:11:99:201,0,112 1/1:0,5:5:15:191,15,0   0/0:16,0:16:42:0,42,630
    1   1023145 .   G   A   130.19  .   AC=3;AF=0.500;AN=6;BaseQRankSum=1.23;ClippingRankSum=-1.231e+00;DP=10;ExcessHet=1.5490;FS=0.000;MLEAC=3;MLEAF=0.500;MQ=60.00;MQRankSum=0.358;QD=16.27;ReadPosRankSum=0.358;SOR=0.269    GT:AD:DP:GQ:PL  0/1:2,3:5:41:82,0,41    1/1:0,3:3:9:81,9,0  0/0:2,0:2:6:0,6,67
    1   1023444 .   C   G   51.42   .   AC=2;AF=1.00;AN=2;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=17.14;SOR=1.179 GT:AD:DP:GQ:PL  ./.:0,0:0   ./.:0,0:0   1/1:0,3:3:9:77,9,0
    1   1025301 .   T   C   23.89   LowQual AC=2;AF=1.00;AN=2;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=48.99;QD=11.95;SOR=0.693 GT:AD:DP:GQ:PL  ./.:0,0:0   ./.:1,0:1   1/1:0,2:2:6:49,6,0
    1   1026707 .   C   A   1965.92 .   AC=3;AF=0.500;AN=6;BaseQRankSum=-3.083e+00;ClippingRankSum=0.990;DP=241;ExcessHet=6.9897;FS=2.898;MLEAC=3;MLEAF=0.500;MQ=60.00;MQRankSum=-8.640e-01;QD=8.16;ReadPosRankSum=0.325;SOR=0.485  GT:AD:DP:GQ:PL  0/1:37,37:74:99:669,0,854   0/1:41,27:68:99:394,0,1021  0/1:49,50:99:99:933,0,1156
    1   1026801 .   T   A   8572.16 .   AC=4;AF=0.667;AN=6;BaseQRankSum=-3.266e+00;ClippingRankSum=1.40;DP=694;ExcessHet=3.9794;FS=4.378;MLEAC=4;MLEAF=0.667;MQ=60.00;MQRankSum=0.665;QD=12.57;ReadPosRankSum=1.92;SOR=0.447    GT:AD:DP:GQ:PL  1/1:1,210:211:99:4675,594,0 0/1:111,91:202:99:1478,0,2359   0/1:118,151:269:99:2451,0,2446
    1   1027008 .   G   A   916.14  .   AC=2;AF=0.500;AN=4;BaseQRankSum=1.70;ClippingRankSum=-2.910e-01;DP=97;ExcessHet=1.5490;FS=5.971;MLEAC=3;MLEAF=0.750;MQ=41.08;MQRankSum=0.873;QD=30.54;ReadPosRankSum=1.04;SOR=3.767 GT:AD:DP:GQ:PL  ./.:34,0:34 0/0:33,0:33:0:0,0,581   1/1:2,28:30:35:944,35,0
    
  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Sorry for the late response. These records look ok; the second sample does not have phasing information because it is hom-ref at the second site. We don't output phasing information at hom-ref sites. To have phasing info for the second sample you would have to have another site where it has a variant allele.
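The rule described above suggests a way to interpret a `.` in PGT by reading it together with GT. The following is a minimal Python sketch under that assumption. It is not a GATK feature, and `classify_pgt` and its category labels are hypothetical names used only for illustration.

```python
import re


def classify_pgt(gt, pgt):
    """Interpret a sample's PGT value in the light of its GT call."""
    if pgt not in (".", ""):
        return "phased"            # PGT carries physical phasing, e.g. 0|1
    alleles = re.split(r"[/|]", gt)
    if all(a == "." for a in alleles):
        return "no-call"           # GT is ./. so the genotype itself is missing
    if all(a == "0" for a in alleles):
        return "hom-ref"           # called 0/0; no phasing is emitted at hom-ref sites
    return "unphased-variant"      # variant call without phasing information


# SAMPLE-2 at the two sites discussed in this thread
print(classify_pgt("1/1", "."))    # site 1:1019175
print(classify_pgt("0/0", "."))    # site 1:1019180
```

With this reading, a `.` in PGT next to a `0/0` genotype is a hom-ref call, while a `.` next to `./.` is genuinely missing data.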

  • frankfeng Member
    edited February 2016

    @Geraldine_VdAuwera said:
    Sorry for the late response. These records look ok; the second sample does not have phasing information because it is hom-ref at the second site. We don't output phasing information at hom-ref sites. To have phasing info for the second sample you would have to have another site where it has a variant allele.

    Hi @Geraldine_VdAuwera

    Thanks for your response. It is just a feature request and I think it will make the output better. In the output of GenotypeGVCFs, there is . but no 0|0 for PGT. And we know . can stand for missing data (e.g. due to zero useful coverage at the position).

    For example, after I used GATK's VariantsToTable to extract PGT and save it in TSV format (see below), I saw many NA values for PGT, and I cannot tell whether they are 0|0 or missing data. So it would be better if 0|0 could be output for PGT too. Thanks!

    CHROM   POS     TYPE    SAMPLE-1.PID    SAMPLE-1.PGT    SAMPLE-2.PID    SAMPLE-2.PGT    SAMPLE-3.PID    SAMPLE-3.PGT
    1       1017273 SNP     NA      NA      NA      NA      NA      NA
    1       1017341 SNP     NA      NA      NA      NA      NA      NA
    1       1017587 SNP     NA      NA      NA      NA      NA      NA
    1       1018144 SNP     NA      NA      NA      NA      NA      NA
    1       1018562 SNP     NA      NA      NA      NA      NA      NA
    1       1018704 SNP     NA      NA      NA      NA      NA      NA
    1       1019175 SNP     1019175_C_G     1|1     NA      NA      1019175_C_G     0|1
    1       1019180 SNP     1019175_C_G     0|1     NA      NA      1019175_C_G     0|1
    1       1020406 SNP     NA      NA      NA      NA      NA      NA
    1       1021346 SNP     NA      NA      NA      NA      NA      NA
    1       1021415 SNP     NA      NA      NA      NA      NA      NA
    1       1021583 SNP     NA      NA      NA      NA      NA      NA
    1       1023145 SNP     NA      NA      NA      NA      NA      NA
    1       1023444 SNP     NA      NA      NA      NA      NA      NA
    1       1025301 SNP     NA      NA      NA      NA      NA      NA
    1       1026707 SNP     NA      NA      NA      NA      NA      NA
    1       1026801 SNP     NA      NA      NA      NA      NA      NA
    1       1027008 SNP     NA      NA      NA      NA      NA      NA
    