-L option with PrintReads

nahmed, TU Delft, Member
edited March 2018 in Ask the GATK team

I am using GATK-3.6 to analyze exome sequencing data. To speed up the analysis, I split the BED file provided in the GATK resource bundle (Broad.human.exome.b37.bed) into several files, one per chromosome. I created the BQSR tables separately (in parallel) for each chromosome using the -L option. Under these circumstances, can I also use the -L option with PrintReads, running it once per BQSR table? This is not recommended in http://gatkforums.broadinstitute.org/gatk/discussion/4133/when-should-i-use-l-to-pass-in-a-list-of-intervals. The input to PrintReads is a single deduplicated BAM file. Indel realignment is not performed.

Answers

  • nahmed, TU Delft, Member
    edited March 2018

    Hi @Geraldine_VdAuwera,
    Thank you for the answer. Is my approach OK for GATK3? As described above, the input is a single deduplicated BAM file. The output is multiple BAMs, one per table.

  • Sheila, Broad Institute, Member, Broadie, Moderator, admin
    edited March 2018

    @nahmed
    Hi,

    Assuming you have enough data per chromosome to run BQSR, you can use -L with BaseRecalibrator and then run PrintReads without -L. The reason is that you don't want -L to drop any data when you write the final BAM file. BaseRecalibrator will only run on the -L intervals, but PrintReads with -bqsr and no -L will recalibrate all reads/bases and will also output the reads that fall outside the -L intervals.

    I hope this makes sense.

    -Sheila

    EDIT: Ideally, you want to run the whole process without -L, so the tools have enough data to produce proper models.
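To make the two steps in the reply above concrete, here is a minimal sketch that only assembles the two GATK 3 command lines as argument lists; it does not run GATK. Everything in it beyond the tool names and the BED file name is an assumption for illustration: the jar path, the reference, BAM, known-sites and table file names, and the helper function names are all hypothetical. The flags (-T, -R, -I, -L, -knownSites, -o, -BQSR) are written as they appear in GATK 3 usage examples (the reply above spells the last one -bqsr), so check them against the documentation for your version.

```python
import shlex

# Hypothetical path to the GATK 3 jar; adjust for your installation.
GATK_JAR = "GenomeAnalysisTK.jar"


def base_recalibrator_cmd(ref, bam, known_sites, table, intervals=None):
    """Build a BaseRecalibrator command; -L is added only if intervals are given."""
    cmd = [
        "java", "-jar", GATK_JAR,
        "-T", "BaseRecalibrator",
        "-R", ref,
        "-I", bam,
        "-knownSites", known_sites,
        "-o", table,
    ]
    if intervals is not None:
        # Restrict the recalibration model to the given intervals.
        cmd += ["-L", intervals]
    return cmd


def print_reads_cmd(ref, bam, table, out_bam):
    """Build a PrintReads command that applies the table and deliberately omits -L,
    so reads outside the intervals are still written to the output BAM."""
    return [
        "java", "-jar", GATK_JAR,
        "-T", "PrintReads",
        "-R", ref,
        "-I", bam,
        "-BQSR", table,
        "-o", out_bam,
    ]


if __name__ == "__main__":
    # All file names below are placeholders except the BED file from the question.
    recal = base_recalibrator_cmd(
        "ref.fasta", "dedup.bam", "dbsnp.vcf", "recal.table",
        intervals="Broad.human.exome.b37.bed",
    )
    apply_step = print_reads_cmd("ref.fasta", "dedup.bam", "recal.table", "recal.bam")
    print(shlex.join(recal))
    print(shlex.join(apply_step))
```

Following the EDIT above, calling base_recalibrator_cmd without the intervals argument gives the variant that builds the model from all of the data.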
