What is the difference between using "--genotyping_mode DISCOVERY" and leaving it blank?

AndZim (Germany, Member)
edited May 2015 in Ask the GATK team

What is the difference between using:

java -jar GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R reference.fa \
-I preprocessed_reads.bam \
-L 20 \
--genotyping_mode DISCOVERY \
-stand_emit_conf 10 \
-stand_call_conf 30 \
-o raw_variants.vcf

or

java -jar GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R reference.fa \
-I preprocessed_reads.bam \
-L 20 \
-stand_emit_conf 10 \
-stand_call_conf 30 \
-o raw_variants.vcf

Answers

  • mglclinical (USA, Member)

    Hi @Sheila,

    If the default value for --genotyping_mode is "DISCOVERY", why does the documentation here say "NA" instead?

    I ran HaplotypeCaller (HC) without the --genotyping_mode parameter, and the HaplotypeCaller log file is posted below. Why do I not see the --genotyping_mode parameter in this HC log output?

    Thanks,
    mglclinical

    INFO 00:54:49,169 HelpFormatter - --------------------------------------------------------------------------------
    INFO 00:54:49,173 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56
    INFO 00:54:49,173 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 00:54:49,173 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 00:54:49,177 HelpFormatter - Program Args: -nct 1 -T HaplotypeCaller -R /home/sgajja/refData/gatkBundle28/hg19/ucsc.hg19.fasta -I /data/NEXTseq500/RunsOutput/660232_NS500510_0099_WESVAL3/Data/DMC_GSAP_Analysis_1/C248A-1/020_SampleLevel/bwa/SampleReads_Final.bam -stand_emit_conf 10 -stand_call_conf 30 -L /data/NEXTseq500/RunsOutput/660232_NS500510_0099_WESVAL3/Data/TargetCaptureExomePerChr/chr1.bed -ip 100 -o /data/NEXTseq500/RunsOutput/660232_NS500510_0099_WESVAL3/Data/DMC_GSAP_Analysis_1/C248A-1/030_VariantCalls/bwa_hc/perChr_Padded_VCFs/chr1.vcf
    INFO 00:54:49,184 HelpFormatter - Executing as [email protected] on Linux 3.10.0-229.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_85-mockbuild_2015_07_11_12_24-b00.
    INFO 00:54:49,185 HelpFormatter - Date/Time: 2016/02/11 00:54:49
    INFO 00:54:49,185 HelpFormatter - --------------------------------------------------------------------------------
    INFO 00:54:49,185 HelpFormatter - --------------------------------------------------------------------------------
    INFO 00:54:49,387 GenomeAnalysisEngine - Strictness is SILENT
    INFO 00:54:49,710 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
    INFO 00:54:49,729 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 00:54:49,951 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.22
    INFO 00:54:50,008 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
    INFO 00:54:50,793 IntervalUtils - Processing 8610642 bp from intervals
    INFO 00:54:51,107 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
    INFO 00:54:52,381 GenomeAnalysisEngine - Done preparing for traversal
    INFO 00:54:52,382 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 00:54:52,383 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 00:54:52,383 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
    INFO 00:54:52,397 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output
    INFO 00:54:52,471 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
    WARN 00:54:52,471 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
    INFO 00:54:52,472 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
    INFO 00:54:52,643 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
    Using AVX accelerated implementation of PairHMM
    INFO 00:54:55,261 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file
    INFO 00:54:55,262 VectorLoglessPairHMM - Using vectorized implementation of PairHMM
    WARN 00:54:55,499 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper
    INFO 00:55:22,466 ProgressMeter - chr1:1581320 1.6768435E7 30.0 s 1.0 s 1.8% 27.3 m 26.8 m
    INFO 00:55:52,468 ProgressMeter - chr1:6947623 1.68183189E8 60.0 s 0.0 s 5.0% 20.1 m 19.1 m
    INFO 00:56:22,469 ProgressMeter - chr1:12907509 5.53633264E8 90.0 s 0.0 s 8.7% 17.2 m 15.7 m
    INFO 00:56:52,471 ProgressMeter - chr1:16862567 8.37525345E8 120.0 s 0.0 s 11.0% 18.2 m 16.2 m
    INFO 00:57:22,473 ProgressMeter - chr1:16902661 8.47986743E8 2.5 m 0.0 s 11.1% 22.6 m 20.1 m
    INFO 00:57:52,474 ProgressMeter - chr1:17085331 8.68094468E8 3.0 m 0.0 s 11.2% 26.8 m 23.8 m
    INFO 00:58:22,476 ProgressMeter - chr1:22413452 1.626225933E9 3.5 m 0.0 s 15.0% 23.3 m 19.8 m
    INFO 00:58:52,477 ProgressMeter - chr1:33475968 3.489287992E9 4.0 m 0.0 s 22.0% 18.2 m 14.2 m
    INFO 00:59:22,478 ProgressMeter - chr1:45292474 6.054156442E9 4.5 m 0.0 s 29.6% 15.2 m 10.7 m
    INFO 00:59:52,479 ProgressMeter - chr1:67515314 1.0560958005E10 5.0 m 0.0 s 37.6% 13.3 m 8.3 m
    INFO 01:00:22,480 ProgressMeter - chr1:94476080 1.4795485716E10 5.5 m 0.0 s 44.0% 12.5 m 7.0 m
    WARN 01:00:59,973 ExactAFCalculator - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at chr1:120612040 has 8 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument. This warning message is output just once per run and further warnings will be suppressed unless the DEBUG logging level is used.
    INFO 01:01:02,481 ProgressMeter - chr1:143767834 2.133763964E10 6.2 m 0.0 s 52.9% 11.7 m 5.5 m
    INFO 01:01:42,482 ProgressMeter - chr1:145281844 2.2163726403E10 6.8 m 0.0 s 53.8% 12.7 m 5.9 m
    INFO 01:02:12,482 ProgressMeter - chr1:148021393 2.4207954031E10 7.3 m 0.0 s 56.1% 13.1 m 5.7 m
    INFO 01:02:42,483 ProgressMeter - chr1:152188414 2.6994423749E10 7.8 m 0.0 s 59.8% 13.1 m 5.3 m
    INFO 01:03:12,484 ProgressMeter - chr1:154574330 2.8583123259E10 8.3 m 0.0 s 62.1% 13.4 m 5.1 m
    INFO 01:03:42,485 ProgressMeter - chr1:162313881 3.4648654903E10 8.8 m 0.0 s 69.8% 12.6 m 3.8 m
    INFO 01:04:22,486 ProgressMeter - chr1:193038532 4.6263640824E10 9.5 m 0.0 s 79.1% 12.0 m 2.5 m
    INFO 01:04:52,487 ProgressMeter - chr1:206648106 5.3850508713E10 10.0 m 0.0 s 85.3% 11.7 m 103.0 s
    INFO 01:05:22,487 ProgressMeter - chr1:227843021 6.3922741982E10 10.5 m 0.0 s 92.4% 11.4 m 51.0 s
    INFO 01:05:52,488 ProgressMeter - chr1:243329412 7.1959666831E10 11.0 m 0.0 s 97.9% 11.2 m 14.0 s
    INFO 01:06:09,171 VectorLoglessPairHMM - Time spent in setup for JNI call : 0.39877650600000003
    INFO 01:06:09,172 PairHMM - Total compute time in PairHMM computeLikelihoods() : 212.286952828
    INFO 01:06:09,172 HaplotypeCaller - Ran local assembly on 7277 active regions
    INFO 01:06:09,246 ProgressMeter - done 7.4578853556E10 11.3 m 0.0 s 100.0% 11.3 m 0.0 s
    INFO 01:06:09,246 ProgressMeter - Total runtime 676.86 secs, 11.28 min, 0.19 hours
    INFO 01:06:09,247 MicroScheduler - 834562 reads were filtered out during the traversal out of approximately 5146274 total reads (16.22%)
    INFO 01:06:09,247 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
    INFO 01:06:09,247 MicroScheduler - -> 333266 reads (6.48% of total) failing DuplicateReadFilter
    INFO 01:06:09,247 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
    INFO 01:06:09,248 MicroScheduler - -> 500707 reads (9.73% of total) failing HCMappingQualityFilter
    INFO 01:06:09,248 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
    INFO 01:06:09,248 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
    INFO 01:06:09,248 MicroScheduler - -> 589 reads (0.01% of total) failing NotPrimaryAlignmentFilter
    INFO 01:06:09,248 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter

    Issue · Github, by Sheila: Issue Number 787, State: open
  • Sheila (Broad Institute; Member, Broadie ✭✭✭✭✭)

    @mglclinical
    Hi,

    I will make a note to change the default value there.

    Default values do not show up in the log output, but they do appear in the VCF header. If you check the ##GATKCommandLine.HaplotypeCaller line of the header, you will see all of HaplotypeCaller's arguments and their values. You will notice genotyping_mode=DISCOVERY there even though you did not specify it in your command.

    I hope this helps.

    -Sheila
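
As a rough illustration of the header check described above, the following sketch pulls the genotyping_mode setting out of an uncompressed VCF. The function name show_genotyping_mode is made up for this example, and the pattern assumes the value is recorded as genotyping_mode=VALUE on the ##GATKCommandLine.HaplotypeCaller header line, as stated in the post.

```shell
# Hypothetical helper (not part of GATK): print the genotyping_mode value
# recorded in the header of an uncompressed VCF written by HaplotypeCaller.
# Assumes the header line starts with "##GATKCommandLine.HaplotypeCaller"
# and contains a "genotyping_mode=VALUE" entry.
show_genotyping_mode() {
    # First grep keeps only the HaplotypeCaller command-line header line;
    # second grep prints just the genotyping_mode=VALUE fragment.
    grep '^##GATKCommandLine\.HaplotypeCaller' "$1" \
        | grep -o 'genotyping_mode=[A-Za-z_]*'
}
```

Running show_genotyping_mode raw_variants.vcf on the output of either command from the question should then print genotyping_mode=DISCOVERY, if the header is laid out as assumed.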

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)
    edited April 2016

    Hi folks, the "NA" is due to a bug in the automatic documentation system we use. I'll see if I can fix it.

  • mglclinical (USA, Member)

    @Sheila, thank you for the clarification and also for pointing me to the ##GATKCommandLine header in the VCF file.
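
Since the thread indicates that DISCOVERY is used when --genotyping_mode is omitted, the two commands in the original question should produce the same calls. One way to check this empirically is to compare the variant records of the two outputs while ignoring the header lines, which record the command line and so are expected to differ. The helper below is a minimal sketch for uncompressed VCFs. The name same_records and any file names passed to it are hypothetical.

```shell
# Hypothetical helper (not part of GATK): return 0 if two uncompressed VCFs
# contain identical variant records, ignoring all header lines (those
# starting with '#').
same_records() {
    a=$(mktemp) && b=$(mktemp) || return 2
    grep -v '^#' "$1" > "$a"   # records from the first file
    grep -v '^#' "$2" > "$b"   # records from the second file
    cmp -s "$a" "$b"           # silent byte-for-byte comparison
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

For example, if the two runs were written to with_mode.vcf and without_mode.vcf, then same_records with_mode.vcf without_mode.vcf && echo "identical calls" would report whether the records match.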
