MarkDuplicatesSpark argument conflict?

devans, Vancouver, BC (Member)

I am trying to set up a processing pipeline for comparison/backup purposes for a commercial all-in-one fq-to-vcf piece of hardware, so I am reusing some of our data from our original verification with bwa/samtools/gatk.

I have aligned some Illumina reads with bwa mem (0.7.17) which I piped through samtools (1.9) to make a BAM file:
bwa mem -t 40 -T 50 -M -R '@RG\tID:3140_01\tSM:3140_01\tLB:ReadGrpLibrary\tPL:ILLUMINA' /genomes/human/GRCh37/GRCh37-lite.fa 3140_01_S1_R1_001.fastq.gz 3140_01_S1_R2_001.fastq.gz | samtools view -b -h -q 30 -O bam -@ 40 -o 3140_01.bam -

This runs without error.

The tooldoc for MarkDuplicatesSpark gives the following usage example:

MarkDuplicatesSpark run locally specifying the removal of sequencing duplicates and tagging OpticalDuplicates

gatk MarkDuplicatesSpark \
    -I input.bam \
    -O marked_duplicates.bam \
    --remove-sequencing-duplicates \
    --duplicate-tagging-policy OpticalOnly

But when the following command runs:

java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar ./gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar MarkDuplicatesSpark -I 3140_01.bam -O 3140_01_markdup.bam --remove-sequencing-duplicates --duplicate-tagging-policy OpticalOnly --conf spark.executor.cores=40 -M 3140_01_dupmetrics.txt

I get the following error:

A USER ERROR has occurred: Argument 'duplicate-tagging-policy' cannot be used in conjunction with argument(s) remove-sequencing-duplicates

What am I missing? Is that error indicative of a preprocessing error on my part, or is there actually a conflict between those two arguments?

Answers

  • devans, Vancouver, BC (Member)

    An additional piece of info: when I take out --duplicate-tagging-policy OpticalOnly it runs without error and creates a BAM file.

  • devans, Vancouver, BC (Member)
    Accepted Answer

    Yes, removing duplicates is necessary for our purposes, so that's what we'll do.

    Thanks for the input! Glad to have a resolution to this.
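
For anyone landing here later, this is a minimal sketch of the resolution, assuming the behaviour reported in this thread for GATK 4.1.2.0: --remove-sequencing-duplicates and --duplicate-tagging-policy are rejected when given together, and the run succeeds with only the former. The helper build_markdup_cmd is hypothetical (it is not part of GATK). It only prints the command as a dry run, and it uses the gatk wrapper from the tooldoc example rather than the java -jar invocation shown above.

```shell
#!/usr/bin/env bash
# Dry-run sketch: assemble a MarkDuplicatesSpark command line and refuse the
# flag combination that produced the USER ERROR in this thread.
# build_markdup_cmd is a hypothetical helper, not a GATK tool.

build_markdup_cmd() {
    local in_bam="$1" out_bam="$2"
    shift 2
    local has_remove=0 has_policy=0 arg
    for arg in "$@"; do
        case "$arg" in
            --remove-sequencing-duplicates) has_remove=1 ;;
            --duplicate-tagging-policy)     has_policy=1 ;;
        esac
    done
    # Mirror the conflict reported by GATK 4.1.2.0 before launching anything.
    if [ "$has_remove" -eq 1 ] && [ "$has_policy" -eq 1 ]; then
        echo "conflict: pass only one of --remove-sequencing-duplicates or --duplicate-tagging-policy" >&2
        return 1
    fi
    # Print the command instead of running it; drop the echo to execute.
    echo gatk MarkDuplicatesSpark -I "$in_bam" -O "$out_bam" "$@"
}

# The variant that ran without error: remove duplicates, no tagging policy.
build_markdup_cmd 3140_01.bam 3140_01_markdup.bam \
    --remove-sequencing-duplicates -M 3140_01_dupmetrics.txt
```

If tagging optical duplicates is what you need instead, the same helper accepts --duplicate-tagging-policy OpticalOnly on its own. Whether that combination suits a given pipeline is not something this thread settles.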
