Why does PathSeqPipelineSpark not filter all the reads with an AS below the --bwa-score-threshold?

I’m running PathSeqPipelineSpark on paired-end WGS data with a read length of 2x75. While inspecting the final BAM output of PathSeq, I noticed reads with an alignment score (AS) of less than 30, which is below the default filtering threshold. Increasing ‘--bwa-score-threshold’ to 60 did filter out additional reads, but reads with an AS below 60 were still present. Why does PathSeq not filter all reads with an AS below the threshold, and is it fine to delete those reads myself in an additional step?

```

# Command with --bwa-score-threshold set to 60:

! ./gatk-4.1.2.0/gatk PathSeqPipelineSpark \
--input $BAM-file \
--kmer-file /path/to/pathseq_host.bfi \
--filter-bwa-image /path/to/pathseq_host.fa.img \
--microbe-bwa-image /path/to/pathseq_microbe.fa.img \
--fasta /path/to/pathseq_microbe.fa \
--taxonomy-file /path/to/pathseq_taxonomy.db \
--min-clipped-read-length 60 \
--min-score-identity 0.90 \
--identity-margin 0.02 \
--bwa-score-threshold 60 \
--scores-output $scores_pathseq \
--output $output_pathseq \
--filter-metrics $filter_pathseq \
-- --spark-runner LOCAL \
--spark-master local[20]

# Command with the default AS threshold (30):

! ./gatk-4.1.2.0/gatk PathSeqPipelineSpark \
--input $BAM-file \
--kmer-file /path/to/pathseq_host.bfi \
--filter-bwa-image /path/to/pathseq_host.fa.img \
--microbe-bwa-image /path/to/pathseq_microbe.fa.img \
--fasta /path/to/pathseq_microbe.fa \
--taxonomy-file /path/to/pathseq_taxonomy.db \
--min-clipped-read-length 60 \
--min-score-identity 0.90 \
--identity-margin 0.02 \
--scores-output $scores_pathseq \
--output $output_pathseq \
--filter-metrics $filter_pathseq \
-- --spark-runner LOCAL \
--spark-master local[20]

```
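
To make the observation above easier to reproduce, here is a minimal sketch for listing the records whose AS tag falls below a chosen threshold. It works on SAM text lines (for example, the output of `samtools view` on the PathSeq BAM) and uses only the Python standard library. The helper names `alignment_score` and `low_scoring_reads` are hypothetical and not part of GATK. The sketch only reports reads; it does not answer whether removing them is appropriate.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Matches the optional SAM alignment-score tag, e.g. "AS:i:42".
AS_TAG = re.compile(r"^AS:i:(-?\d+)$")


def alignment_score(sam_line: str) -> Optional[int]:
    """Return the AS tag value of one SAM record, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for field in fields[11:]:  # optional tags start at column 12
        match = AS_TAG.match(field)
        if match:
            return int(match.group(1))
    return None


def low_scoring_reads(
    sam_lines: Iterable[str], threshold: int
) -> Iterator[Tuple[str, int, int]]:
    """Yield (read name, FLAG, AS) for records with an AS below the threshold.

    Header lines (starting with "@") and records without an AS tag are skipped.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        score = alignment_score(line)
        if score is not None and score < threshold:
            fields = line.split("\t")
            yield fields[0], int(fields[1]), score
```

Passing the lines of the decoded BAM together with the threshold used in the run (30 or 60) gives the read names, FLAG values, and scores of the records in question. The FLAG column is included so that it is easy to check, for instance, whether the reported records are flagged as mapped or as secondary alignments.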

GitHub issue 6138 by bhanuGandham (state: open)