Problem using the -L and -XL options in PrintReads

Dear GATK team,

I am using the -L and -XL options together in my PrintReads command so that only reads in my regions of interest are printed and reads in blacklisted regions are excluded.

Below is an example command:

java -Xmx4g -jar GenomeAnalysisTK.jar -T PrintReads -R GRCh37-lite.fa -L 1:1-61913548 -XL all_Enhancers.intervals -XL Blacklist_merged.intervals -I input.bam -nct 8 -BQSR recal_data.wg.table -o output.bam

However, I've noticed that output.bam contains reads that fall within intervals listed in my Blacklist file. For example,

HWI-ST1133:217:D1D4WACXX:8:1206:15633:92032 163 1 19022 10 101M = 19168 247 TCCCCAGACATCCCTGTGGCTGGCTCCTGATGCCCGAGGCCCAAGTGTCTGATGCTTTAAGGCACATCACCCCACTCATGCTTTTCCATGTTCTTTGGCCC 249<ADCDEEEDAADEDABDCEBDDEBEBDFBEBB4B:@CDBCA?FFHFE>DGGDGBBCCDCEFDEE?GEFFGGCHB=EAHIDJJ?M=GAHEGHKD<=@D# X0:i:7 X1:i:0 MC:Z:101M BD:Z:IIMJKNOLLHLLLILMHGMMMMMMMJLLMMKNNNJKLMNNNJNJLNHINKNNLNNNJCKKLNNNHIMMLHNKKOINLMNOOOKDDMOPOPKNONNGRNNNI MD:Z:100G0 RG:Z:0.2 XG:i:0 BI:Z:LLOLLQQMOKOONJNOJINNPPOOPKNOPPOPPOKMONOPPLQMOQKLPNQQPQQQMFOMOOPPKMQQPLPMMRLQMPQRRRNGHQQSRSOPRQPJTPPPL AM:i:0 NM:i:1 SM:i:0 XM:i:1 XO:i:0 MQ:i:18 XT:A:R

My Blacklist file includes the interval: 1:18906-19049, so I thought the above read would not show up in my output .bam.
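One thing worth checking is the full reference span of the read rather than only its start position. The sketch below is plain interval arithmetic, not GATK code. It assumes that interval filtering keeps any read that overlaps an included interval, even partially, and that nothing in all_Enhancers.intervals covers the bases just after 19049 (the contents of that file are not shown here). The helper name `overlaps` is made up for illustration.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two closed 1-based intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end

# Values taken from the example read: POS = 19022, CIGAR = 101M.
read_start = 19022
read_len = 101                          # 101M consumes 101 reference bases
read_end = read_start + read_len - 1    # last reference base covered by the read

blacklist = (18906, 19049)              # excluded interval on contig 1
region_end = 61913548                   # end of the -L interval 1:1-61913548

# Assumption: after subtracting the blacklist interval, the next included
# stretch starts right after it (no enhancer interval is assumed to cover it).
kept_start = blacklist[1] + 1

print("read span:", read_start, read_end)
print("overlaps blacklist:", overlaps(read_start, read_end, *blacklist))
print("overlaps kept region:", overlaps(read_start, read_end, kept_start, region_end))
```

Under those assumptions the read starts inside the blacklisted interval but extends past its end, so it also overlaps a region that is still included, which would be consistent with it appearing in output.bam.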

Do you know where I've made a mistake, and is there a better way to exclude reads in Blacklist regions?
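If the goal is to drop every read that touches a blacklisted interval at all, one option is a post-filter on the SAM records. The following is only a minimal sketch under stated assumptions: it works on SAM text lines, treats the blacklist as a small in-memory list of `(contig, start, end)` tuples with 1-based closed coordinates, and uses hypothetical helper names (`reference_end`, `touches_blacklist`, `strict_filter`). It is not how PrintReads works internally, and a linear scan like this would be slow for a large blacklist.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

def reference_end(pos, cigar):
    """Last 1-based reference position covered by an alignment."""
    ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos + ref_len - 1

def touches_blacklist(sam_line, blacklist):
    """True if the alignment shares at least one base with a blacklist interval."""
    fields = sam_line.rstrip("\n").split("\t")
    contig, pos, cigar = fields[2], int(fields[3]), fields[5]
    if cigar == "*":                      # no alignment information available
        return False
    end = reference_end(pos, cigar)
    return any(c == contig and pos <= e and s <= end for c, s, e in blacklist)

def strict_filter(sam_lines, blacklist):
    """Yield header lines and every read that does not touch the blacklist."""
    for line in sam_lines:
        if line.startswith("@") or not touches_blacklist(line, blacklist):
            yield line

if __name__ == "__main__":
    blacklist = [("1", 18906, 19049)]
    records = [
        "@HD\tVN:1.4",
        "readA\t163\t1\t19022\t10\t101M\t=\t19168\t247\t*\t*",   # starts inside the blacklist
        "readB\t99\t1\t20000\t60\t101M\t=\t20150\t251\t*\t*",    # entirely outside it
    ]
    for kept in strict_filter(records, blacklist):
        print(kept)
```

A filter like this could be fed SAM text (for example from `samtools view -h`) and its output converted back to BAM; whether partially overlapping reads should really be discarded depends on the analysis.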

Thanks a lot for your help!
