-L option with PrintReads

nahmed (TU Delft), Member
edited March 5 in Ask the GATK team

I am using GATK-3.6 to analyze exome sequencing data. To speed up the analysis, I split the BED file provided in the GATK resource bundle (Broad.human.exome.b37.bed) into several files, one for each chromosome. I created the BQSR tables separately (in parallel) for each chromosome using the -L option. Under these circumstances, can I use the -L option with PrintReads to run on each BQSR table separately as well? It is not recommended in [...]. The input to PrintReads is a single deduplicated BAM file. Indel realignment is not performed.

  • nahmed (TU Delft), Member
    edited March 6

    Hi @Geraldine_VdAuwera,
    Thank you for the answer. Is my approach OK for GATK3? As described above, the input is a single deduplicated BAM file. The output is multiple BAMs, one per table.

  • Sheila (Broad Institute), Member, Broadie, Moderator
    edited March 9


    Assuming you have enough data per chromosome to run BQSR, you can use -L with BaseRecalibrator and then run PrintReads without -L. The reason is that you don't want -L to drop any data when you write the final BAM file. BaseRecalibrator will only run on the -L intervals, but PrintReads with -BQSR and no -L will recalibrate all reads/bases and also output the reads that fall outside the -L intervals.

    I hope this makes sense.


    EDIT: Ideally, you want to run the whole process without -L, so the tools have enough data to produce proper models.
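
As a rough illustration of the two-step approach described in this reply, here is a minimal dry-run sketch for GATK3. It only prints the two commands, it does not run them. All file names (the jar, reference, BAM, known-sites VCF, and output names) are placeholders assumed for the example and are not taken from the thread; only the BED file name comes from the original question. The point it shows is that -L appears on the BaseRecalibrator command and is left off the PrintReads command.

```shell
#!/bin/sh
# Placeholder inputs (assumptions for illustration; adjust to your own data).
GATK_JAR="GenomeAnalysisTK.jar"
REF="reference.b37.fasta"
BAM="sample.dedup.bam"
KNOWN="known_sites.b37.vcf"
TARGETS="Broad.human.exome.b37.bed"   # interval file named in the question
TABLE="recal.table"

# Step 1: build the recalibration table, restricted to the -L intervals.
recal_cmd() {
  echo java -jar "$GATK_JAR" -T BaseRecalibrator \
    -R "$REF" -I "$BAM" -knownSites "$KNOWN" \
    -L "$TARGETS" -o "$TABLE"
}

# Step 2: apply the table with PrintReads and no -L,
# so reads outside the intervals are still written to the output BAM.
apply_cmd() {
  echo java -jar "$GATK_JAR" -T PrintReads \
    -R "$REF" -I "$BAM" -BQSR "$TABLE" \
    -o sample.recal.bam
}

# Dry run: print both commands for inspection.
recal_cmd
apply_cmd
```

To execute the commands instead of printing them, remove the `echo` in each function. Dropping `-L "$TARGETS"` from the first command gives the variant mentioned in the EDIT above, where the whole process runs without -L.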

