What does 'EMIT_ALL_CONFIDENT_SITES' mean in GATK UnifiedGenotyper?

Hyunmin (Seoul, Korea), Member
edited January 2014 in Ask the GATK team

Hi, everyone.

From the GATK documentation:

-out_mode,--output_mode specifies which sites to emit; possible values are EMIT_VARIANTS_ONLY (the default), EMIT_ALL_CONFIDENT_SITES (include confident reference sites), or EMIT_ALL_SITES (any callable site regardless of confidence).

I would like to know what a "confident reference site" means.

When I call variants on each sample's BAM file with the UnifiedGenotyper EMIT_ALL_CONFIDENT_SITES option, can I distinguish the genotype of each sample (no-call, hom-ref, hom-alt, het)?

In other words, for my purposes I need to know whether a given site is a no-call or hom-ref.

Best Answer


  • Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie admin

    EMIT_ALL_CONFIDENT_SITES will tell you which sites are hom-ref, hom-alt, or het with high confidence (i.e., where the program is reasonably certain of the genotype).

    EMIT_ALL_SITES will give you a genotype for every site, or ./. for no-calls, so for every site you will know whether it is a no-call or something else; however, for some sites the assigned genotype may be of very low quality. This mode also produces very large output files.
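    To make that distinction concrete, here is a minimal Python sketch that labels one sample's call from its GT field as a no-call, hom-ref, het, or hom-alt, and flags calls whose GQ falls below a cutoff. The function names and the `min_gq=20` default are illustrative assumptions, not GATK settings, and partial calls such as `./1` are simply treated as no-calls.

```python
def classify_gt(gt: str) -> str:
    """Label a VCF GT value as no-call, hom-ref, het, or hom-alt."""
    # Alleles are separated by "/" (unphased) or "|" (phased).
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return "no-call"  # simplification: partial calls like "./1" count as no-calls
    if all(a == "0" for a in alleles):
        return "hom-ref"
    if len(set(alleles)) == 1:
        return "hom-alt"
    return "het"


def classify_call(format_keys: str, sample: str, min_gq: int = 20) -> str:
    """Label one sample column; min_gq is a hypothetical cutoff, not a GATK default."""
    if sample == ".":
        return "no-call"  # the whole sample entry is missing
    # Pair each FORMAT key (e.g. GT, GQ, DP) with the sample's value.
    fields = dict(zip(format_keys.split(":"), sample.split(":")))
    label = classify_gt(fields.get("GT", "."))
    gq = fields.get("GQ", ".")
    if label != "no-call" and gq not in (".", "") and int(gq) < min_gq:
        return label + " (low GQ)"
    return label
```

    For example, `classify_call("GT:GQ:DP:PL:AD", "0/1:99:38:272,0,892:26,10")` gives `"het"`, while a hom-ref call with GQ 5 would come back as `"hom-ref (low GQ)"`.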

  • Hyunmin (Seoul, Korea), Member

    Thanks for your comment.

    But I have a new issue.

    I made a merged.vcf by joining each sample's VCF file into one VCF file (in other words, this file is the union of the sites in all the VCF files).
    Then I called variants again on each sample's BAM file with -L merged.vcf.
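    The union step could be sketched in Python as follows, assuming the per-sample VCFs are plain (uncompressed) text and that a site is identified by its CHROM and POS columns only, so an indel is represented just by its start position. The helper names are hypothetical, and this is not necessarily how merged.vcf was actually produced.

```python
from typing import Iterable, List, Set, Tuple


def vcf_sites(lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Collect (CHROM, POS) pairs from the data lines of one VCF."""
    sites: Set[Tuple[str, int]] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information (##) and header (#CHROM) lines
        chrom, pos = line.split("\t", 2)[:2]
        sites.add((chrom, int(pos)))
    return sites


def union_sites(vcfs: Iterable[Iterable[str]]) -> List[Tuple[str, int]]:
    """Union of sites across VCFs.

    Sorting is by chromosome *name* then position, which is not the
    reference-dictionary order (e.g. "chr10" sorts before "chr2").
    """
    merged: Set[Tuple[str, int]] = set()
    for vcf in vcfs:
        merged |= vcf_sites(vcf)
    return sorted(merged)
```

    A site present in several samples appears only once in the result, which is the "union set" behaviour described above.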

    java -jar -Xmx4g /home/hmkim87/analysis/xxxx/ExomeUnionVariantCall/Tool/GenomeAnalysisTKLite-2.3-9.jar -T UnifiedGenotyper -R /BiOfs/BioResources/References/Human/hg19/hg19.fa -glm BOTH -I /WES/ExomeUnionVariant/Input_BAM/T1304D2111.final.bam -o /WES/ExomeUnionVariant/UnifiedGenotyper_vcf_EMIT_ALL_SITES/T1304D2111.final.vcf --output_mode EMIT_ALL_SITES -dcov 200 -nct 4 -nt 1 -L /WES/ExomeUnionVariant/merge_vcf/merged.vcf

    java -jar -Xmx4g /home/hmkim87/analysis/xxxx/ExomeUnionVariantCall/Tool/GenomeAnalysisTKLite-2.3-9.jar -T UnifiedGenotyper -R /BiOfs/BioResources/References/Human/hg19/hg19.fa -glm BOTH -I /WES/ExomeUnionVariant/Input_BAM/T1304D2112.final.bam -o /WES/ExomeUnionVariant/UnifiedGenotyper_vcf_EMIT_ALL_SITES/T1304D2112.final.vcf --output_mode EMIT_ALL_SITES -dcov 200 -nct 4 -nt 1 -L /WES/ExomeUnionVariant/merge_vcf/merged.vcf

    (the same command for each sample)
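    Rather than editing the command by hand for every sample, the per-sample calls could be generated in a loop. This minimal Python sketch only builds and prints the argument lists; it reuses the flags from the commands above, while the shortened jar, reference, BAM, and VCF paths are placeholders. It also places the JVM option `-Xmx4g` before `-jar`, which is the conventional order for `java` options.

```python
import shlex
from typing import List


def ug_command(jar: str, ref: str, bam: str, out_vcf: str, intervals: str) -> List[str]:
    """Argument list mirroring the per-sample UnifiedGenotyper call above."""
    return [
        "java", "-Xmx4g", "-jar", jar,
        "-T", "UnifiedGenotyper",
        "-R", ref,
        "-glm", "BOTH",
        "-I", bam,
        "-o", out_vcf,
        "--output_mode", "EMIT_ALL_SITES",
        "-dcov", "200",
        "-nct", "4", "-nt", "1",
        "-L", intervals,
    ]


if __name__ == "__main__":
    # Placeholder sample list; the remaining sample IDs would be added here.
    samples = ["T1304D2111", "T1304D2112"]
    for s in samples:
        cmd = ug_command("GenomeAnalysisTKLite-2.3-9.jar", "hg19.fa",
                         f"{s}.final.bam", f"{s}.final.vcf", "merged.vcf")
        print(shlex.join(cmd))  # shell-quoted, ready to copy or log
```

    Passing each list to `subprocess.run(cmd, check=True)` would execute the calls one after another instead of printing them.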

    I got the following result:
    GT:GQ:DP:PL:AD . . . . . . . . . . . . . 0/1:99:38:272,0,892:26,10 . . 0/1:99:13:192,0,230:7,6 . 0/1:99:21:398,0,283:9,12 . 0/1:99:16:230,0,184:6,7

    Why does the FORMAT column contain . (unknown variant information) in the output VCF file?
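    In the VCF format, a lone `.` in a sample column means the value is missing, and the colon-separated values of a non-missing sample line up with the keys listed in the FORMAT column. A minimal Python sketch of that pairing, applied to a shortened version of the output row above, might look like this (the function name is hypothetical):

```python
from typing import Dict, List, Optional


def parse_samples(format_field: str, sample_fields: List[str]) -> List[Optional[Dict[str, str]]]:
    """Pair FORMAT keys with each sample's values; None marks a missing entry."""
    keys = format_field.split(":")
    parsed: List[Optional[Dict[str, str]]] = []
    for value in sample_fields:
        if value == ".":
            parsed.append(None)  # the whole sample entry is missing
        else:
            # zip stops at the shorter list, so dropped trailing fields are tolerated
            parsed.append(dict(zip(keys, value.split(":"))))
    return parsed


row = "GT:GQ:DP:PL:AD . . 0/1:99:38:272,0,892:26,10".split()
samples = parse_samples(row[0], row[1:])
missing = sum(s is None for s in samples)
```

    Here `missing` counts the `.` entries, and the third entry maps `GT` to `0/1` and `AD` to `26,10`.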
    To get variant information for all samples, must I call variants on all the sample BAM files at once, like this?

    GATK UnifiedGenotyper -I sample1.bam -I sample2.bam ...
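    For comparison, the multi-sample invocation asked about here would pass one `-I` per BAM file in a single UnifiedGenotyper run. This Python sketch only builds that argument list, using placeholder file names (`gatk.jar`, `ref.fa`, `sample1.bam`, `joint.vcf`) and only the `-T`, `-R`, `-glm`, `-I`, and `-o` arguments from the commands above; it does not by itself show whether joint calling removes the `.` entries.

```python
from typing import List


def joint_ug_command(jar: str, ref: str, bams: List[str], out_vcf: str) -> List[str]:
    """Build one UnifiedGenotyper call over several BAMs (placeholder paths)."""
    cmd = ["java", "-Xmx4g", "-jar", jar,
           "-T", "UnifiedGenotyper", "-R", ref, "-glm", "BOTH"]
    for bam in bams:
        cmd += ["-I", bam]  # one -I flag per input BAM
    cmd += ["-o", out_vcf]
    return cmd
```

    Compared with the per-sample loop, every sample ends up as a column of the same output VCF.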
