
What does 'EMIT_ALL_CONFIDENT_SITES' mean in GATK UnifiedGenotyper?

Hyunmin (Seoul, Korea), Member
edited January 2014 in Ask the GATK team

Hi, everyone.

From the GATK documentation:

-out_mode,--output_mode specifies which sites to emit; possible values are EMIT_VARIANTS_ONLY (the default), EMIT_ALL_CONFIDENT_SITES (include confident reference sites), or EMIT_ALL_SITES (any callable site regardless of confidence).

I would really like to know what "confident reference site" means.

When I call variants with the GATK UnifiedGenotyper EMIT_ALL_CONFIDENT_SITES option on each sample's BAM file, can I distinguish the genotype of each sample (no-call, homozygous reference, homozygous alternate, or heterozygous)?

In other words, I want to know whether a given site is a no-call or homozygous reference.

Answers

  • Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie admin

    EMIT_ALL_CONFIDENT_SITES will tell you which sites are ref-hom, alt-hom, or alt-het with high confidence (=where the program is reasonably certain).

    EMIT_ALL_SITES will give you a genotype for all sites or ./. for no-calls, so you will know for every site if it is no-call or something else, but for some sites the assigned genotype may be very low quality. Also, this will produce very large output files.

  • Hyunmin (Seoul, Korea), Member

    Thanks for your comment.

    But I have a new issue.

    I made a merged.vcf by joining each sample's VCF file into one VCF file (in other words, this file is the union of all the VCF files),
    and then I called variants again from each sample's BAM file with -L merged.vcf.

    java -jar -Xmx4g /home/hmkim87/analysis/xxxx/ExomeUnionVariantCall/Tool/GenomeAnalysisTKLite-2.3-9.jar -T UnifiedGenotyper -R /BiOfs/BioResources/References/Human/hg19/hg19.fa -glm BOTH -I /WES/ExomeUnionVariant/Input_BAM/T1304D2111.final.bam -o /WES/ExomeUnionVariant/UnifiedGenotyper_vcf_EMIT_ALL_SITES/T1304D2111.final.vcf --output_mode EMIT_ALL_SITES -dcov 200 -nct 4 -nt 1 -L /WES/ExomeUnionVariant/merge_vcf/merged.vcf

    java -jar -Xmx4g /home/hmkim87/analysis/xxxx/ExomeUnionVariantCall/Tool/GenomeAnalysisTKLite-2.3-9.jar -T UnifiedGenotyper -R /BiOfs/BioResources/References/Human/hg19/hg19.fa -glm BOTH -I /WES/ExomeUnionVariant/Input_BAM/T1304D2112.final.bam -o /WES/ExomeUnionVariant/UnifiedGenotyper_vcf_EMIT_ALL_SITES/T1304D2112.final.vcf --output_mode EMIT_ALL_SITES -dcov 200 -nct 4 -nt 1 -L /WES/ExomeUnionVariant/merge_vcf/merged.vcf

    (...the same command for each remaining sample...)

    I got the result below.
    GT:GQ:DP:PL:AD . . . . . . . . . . . . . 0/1:99:38:272,0,892:26,10 . . 0/1:99:13:192,0,230:7,6 . 0/1:99:21:398,0,283:9,12 . 0/1:99:16:230,0,184:6,7

    Why do the columns after FORMAT contain . (unknown variant information) in the output VCF file?
    To get variant information for all samples, do I have to call variants on all the sample BAM files at once?

    Like this:
    GATK UnifiedGenotyper -I sample1.bam -I sample2.bam ...
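
For readers who want to tell the four cases apart (no-call, hom-ref, het, hom-alt) in a row like the one quoted above, here is a minimal Python sketch. It only reads the text of the row; it does not explain why GATK produced it. The function names `classify_genotype` and `parse_sample` are made up for this illustration, and the sketch assumes the usual VCF conventions: FORMAT keys and sample values are colon-separated, GT alleles are separated by `/` or `|`, and `.` marks a missing value. Haploid calls are simply labelled hom-ref or hom-alt here, which is a simplification.

```python
import re
from typing import Dict, Optional


def parse_sample(format_keys: str, sample: str) -> Optional[Dict[str, str]]:
    """Pair FORMAT keys with one sample's values; a bare '.' means no data."""
    if sample == ".":
        return None
    return dict(zip(format_keys.split(":"), sample.split(":")))


def classify_genotype(gt: Optional[str]) -> str:
    """Label a GT string as 'no-call', 'hom-ref', 'het', or 'hom-alt'."""
    if gt is None or gt == ".":
        return "no-call"
    alleles = re.split(r"[/|]", gt)  # unphased '/' or phased '|'
    if "." in alleles:
        return "no-call"  # e.g. './.'
    if all(a == "0" for a in alleles):
        return "hom-ref"  # every allele is the reference
    if len(set(alleles)) == 1:
        return "hom-alt"  # same non-reference allele on every copy
    return "het"  # mixed alleles, e.g. '0/1' or '1/2'


# FORMAT column followed by the per-sample columns, copied from the post above.
row = (
    "GT:GQ:DP:PL:AD . . . . . . . . . . . . . "
    "0/1:99:38:272,0,892:26,10 . . 0/1:99:13:192,0,230:7,6 . "
    "0/1:99:21:398,0,283:9,12 . 0/1:99:16:230,0,184:6,7"
)
format_keys, *samples = row.split()

labels = []
for sample in samples:
    fields = parse_sample(format_keys, sample)
    labels.append(classify_genotype(fields.get("GT") if fields else None))

for index, label in enumerate(labels, start=1):
    print(f"sample column {index}: {label}")
```

In this row every sample column is either a bare `.` (labelled no-call by the sketch) or a `0/1` call (het), so the text of the row alone cannot separate "homozygous reference" from "no data" for the `.` columns.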
