-L option with PrintReads

nahmed (TU Delft, Member)
edited March 2018 in Ask the GATK team

I am using GATK-3.6 to analyze exome sequencing data. To speed up the analysis, I split the BED file provided in the GATK resource bundle (Broad.human.exome.b37.bed) into several files, one per chromosome. I created the BQSR tables separately (in parallel) for each chromosome using the -L option. Under these circumstances, can I also use the -L option with PrintReads, running it once per BQSR table? This is not recommended in http://gatkforums.broadinstitute.org/gatk/discussion/4133/when-should-i-use-l-to-pass-in-a-list-of-intervals. The input to PrintReads is a single deduplicated BAM file. Indel realignment was not performed.

Best Answer

Answers

  • nahmed (TU Delft, Member)
    edited March 2018

    Hi @Geraldine_VdAuwera,
    Thank you for the answer. Is my approach OK for GATK3? As described above, the input is a single deduplicated BAM file. The output is multiple BAMs, one per BQSR table.

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)
    edited March 2018

    @nahmed
    Hi,

    Assuming you have enough data per chromosome to run BQSR, you can use -L with BaseRecalibrator and then run PrintReads without -L. The reason is that you don't want -L to drop any data when you write the final BAM file. BaseRecalibrator will only look at the -L intervals, but PrintReads with -BQSR and no -L will recalibrate all reads/bases and output every read, including those that fall outside the -L intervals.

    I hope this makes sense.

    -Sheila

    EDIT: Ideally, you want to run the whole process without -L, so the tools have enough data to produce proper models.
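
For illustration, here is a minimal sketch of the two-step pattern described in the answer above, assuming GATK 3.x command-line syntax (-T, -R, -I, -knownSites, -L, -BQSR, -o). It only builds the argument lists and does not run anything. The jar path and all file names (ref.fasta, dedup.bam, dbsnp.vcf, exome.bed, recal.table, recal.bam) are hypothetical placeholders, and the helper function names are made up for this example.

```python
from typing import List

# Assumed location of the GATK 3.x jar; adjust to your installation.
GATK_JAR = "GenomeAnalysisTK.jar"


def base_recalibrator_cmd(reference: str, bam: str, known_sites: str,
                          intervals: str, table: str) -> List[str]:
    """Build a BaseRecalibrator command restricted to `intervals` via -L."""
    return [
        "java", "-jar", GATK_JAR,
        "-T", "BaseRecalibrator",
        "-R", reference,
        "-I", bam,
        "-knownSites", known_sites,
        "-L", intervals,   # model is built only from reads in these intervals
        "-o", table,
    ]


def print_reads_cmd(reference: str, bam: str, table: str,
                    out_bam: str) -> List[str]:
    """Build a PrintReads command that applies the table; deliberately no -L."""
    return [
        "java", "-jar", GATK_JAR,
        "-T", "PrintReads",
        "-R", reference,
        "-I", bam,
        "-BQSR", table,    # apply the recalibration table on the fly
        "-o", out_bam,     # no -L, so no reads are dropped from the output
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    recal = base_recalibrator_cmd("ref.fasta", "dedup.bam", "dbsnp.vcf",
                                  "exome.bed", "recal.table")
    apply_ = print_reads_cmd("ref.fasta", "dedup.bam", "recal.table",
                             "recal.bam")
    print(" ".join(recal))
    print(" ".join(apply_))
```

The point of the sketch is the asymmetry: -L appears only in the BaseRecalibrator step, so the output BAM from PrintReads still contains every read from the input. If you want to actually run the commands, each list can be passed to subprocess.run.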
