GATK4.0.2.1: steps, interval list, precision of the range (interval)

manolismanolis Member ✭✭✭
edited March 2018 in Ask the GATK team


I'm still a little confused about intervals. For example, I use the -L option in the following steps:

BaseRecalibrator (-L): chr1... chrX, chrY (= 24 contigs)
ApplyBQSR #
HaplotypeCaller (-L): 404 intervals*
GenomicsDBImport (-L): 404 intervals*
GenotypeGVCFs (-L): 404 intervals*

*the whole genome minus the gap regions (as reported in the UCSC browser)

I also work with WES cohorts that have different "exon targeted designs".

Is my pipeline reliable/precise with the intervals above, or is it better to use the regions/intervals of my "exon targeted designs"? If so, in which step?

Many thanks

Best Answers


  • manolismanolis Member ✭✭✭
    edited March 2018

    Thank you SkyW, I edited my question and omitted the -R.

    Just a question... in the "GenomicsDBImport" step I created a database composed of 404 intervals. When I run the "GenotypeGVCFs" step I have to recall each interval from the db, which means that I can't skip the -L option and use the entire genome... or can I?


    ${ph6} --java-options ${java_opt2x} GenotypeGVCFs -R ${gnm} -O ${pivcf} -D ${sorg01} -G StandardAnnotation --only-output-calls-starting-in-intervals -new-qual -V gendb://${f2} -L "${f1}"

    f1 = interval ; f2 = interval ID in the db

  • manolismanolis Member ✭✭✭
    edited March 2018

    That means that the -L option is really necessary only for HaplotypeCaller... for the other steps it is only for scatter parallelization. In GenomicsDBImport, if I'm correct, I must have at least 1 interval = 1 sample... 40 samples need at least 40 intervals, processed one interval at a time as you told me...

    In HaplotypeCaller, is there any difference (in terms of false-positive variant calls) between using one large contig (where some regions are covered and others are not) and using many small, fully covered regions/intervals? Is it an important difference?


  • manolismanolis Member ✭✭✭

    Thank you guys.
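
The per-interval GenotypeGVCFs call discussed in this thread (one database per interval, recalled with its own -L) could be sketched as a dry-run loop like the one below. This is only an illustration, not the poster's actual script: the file names `ref.fasta`, `db_<n>` and `out_<n>.vcf.gz`, the `GATK` variable, the helper `build_genotype_cmd` and the two example intervals are all assumptions, and only flags that already appear in the command above are used. The loop prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one GenotypeGVCFs command per interval.
# Assumptions (hypothetical, not from the thread): reference path, one
# GenomicsDB workspace per interval named db_<n>, output names out_<n>.vcf.gz.

GATK="${GATK:-gatk}"   # path to the gatk launcher (assumed)
REF="ref.fasta"        # reference genome (assumed name)

# Build the command line for a single interval and its matching workspace.
build_genotype_cmd() {
  local interval="$1" workspace="$2" out_vcf="$3"
  printf '%s GenotypeGVCFs -R %s -V gendb://%s -L %s --only-output-calls-starting-in-intervals -O %s\n' \
    "$GATK" "$REF" "$workspace" "$interval" "$out_vcf"
}

# Two made-up intervals standing in for the 404 real ones.
intervals=("chr1:10001-177417" "chr1:227418-267719")

n=0
for interval in "${intervals[@]}"; do
  n=$((n + 1))
  # Each workspace was imported with the same interval, so -L is repeated here.
  build_genotype_cmd "$interval" "db_${n}" "out_${n}.vcf.gz"
done
```

Piping the printed lines to a job scheduler, or removing the `printf` wrapper to execute them directly, would turn the dry run into the scatter step; the per-interval VCFs would then still need to be merged afterwards.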
