Error: Argument with name 'drf' isn't defined.

qjehanne (Bordeaux, Member)

Hello,

I'm currently trying to call SNP on several samples (8 bam, 1 pseudoref) and 75.15% of my reads have failed the DuplicateReadFilter. I tried to disable this filter with "-drf DuplicateRead" in the command line but that error came up (drf not defined).

Any idea?

Regards,

Quentin Jehanne

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @qjehanne
    Hi Quentin,

    Can you please post the exact command you ran and tell us the version of GATK you are using?

    Thanks,
    Sheila

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    That argument was only introduced in version 3.5 if I recall correctly.
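
A quick way to check whether an installed GATK 3.x build is recent enough for `-drf` is to compare its version string against 3.5. The sketch below is only an illustration: `supports_drf` is a hypothetical helper name, and it assumes the version string has the usual GATK 3 shape (for example `3.5-0-g36282e4`). It deliberately covers the 3.x series only.

```shell
# Return success (0) if the given GATK 3.x version string is 3.5 or later.
# Assumes strings shaped like "3.5-0-g36282e4"; returns 2 if it cannot parse them.
supports_drf() {
    local version="${1%%-*}"      # drop the build suffix after the first "-"
    local major="${version%%.*}"  # text before the first dot
    local rest="${version#*.}"    # text after the first dot
    local minor="${rest%%.*}"     # keep only the minor number
    case "$major" in ''|*[!0-9]*) return 2 ;; esac
    case "$minor" in ''|*[!0-9]*) return 2 ;; esac
    # Only the 3.x series is considered here.
    [ "$major" -eq 3 ] && [ "$minor" -ge 5 ]
}
```

Assuming the jar reports its version with `--version`, it could be used as `supports_drf "$(java -jar GenomeAnalysisTK.jar --version)" && echo "-drf available"`.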

  • qjehanne (Bordeaux, Member)
    edited April 2016

    @Sheila said:
    @qjehanne
    Hi Quentin,

    Can you please post the exact command you ran and tell us the version of GATK you are using?

    Thanks,
    Sheila

    java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R pseudoref.fa -I [...] -glm BOTH -o res.vcf -drf DuplicateRead --filter_mismatching_base_and_quals -et NO_ET -K file.key

    @Geraldine_VdAuwera said:
    That argument was only introduced in version 3.5 if I recall correctly.

    Any tips to get through this % problem?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    How was the data generated? I mean, what prep method and sequencing technology? If you're working with e.g. amplicon sequencing or RADseq you should just not mark duplicates, or use -drf DuplicateRead to ignore the marking if it's already done. If you're working with exome or genome data then you can't ignore the duplicate marking, and instead you should investigate why you're getting such high levels of duplication in your data. You may need to have a chat with your sequence provider or whoever prepped the samples. If you decide to go ahead and use this data anyway, then there's no workaround -- just allow the program to filter out the duplicates and hope that leaves enough coverage to make useable calls.
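
When investigating duplication levels, it can help to measure the duplicate-flagged fraction directly from the BAM before running GATK. The sketch below parses the text that `samtools flagstat` prints; `dup_percent` is a hypothetical helper name, and the parsing assumes the usual flagstat line layout (`N + M in total ...`, `N + M duplicates`).

```shell
# Read `samtools flagstat` output on stdin and print the percentage of
# QC-passed reads that carry the duplicate flag.
# Exits non-zero if no total read count is found.
dup_percent() {
    awk '
        $4 == "in" && $5 == "total" { total = $1 }  # "N + M in total (...)"
        $4 == "duplicates"          { dups = $1 }   # "N + M duplicates"
        END {
            if (total > 0) printf "%.2f\n", 100 * dups / total
            else           exit 1
        }
    '
}
```

A typical call would be `samtools flagstat sample.bam | dup_percent`, where `sample.bam` is a placeholder file name. The figure may not match GATK's filter summary exactly, since GATK applies its read filters in sequence.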

  • qjehanne (Bordeaux, Member)
    edited May 2016

    How was the data generated? I mean, what prep method and sequencing technology? If you're working with e.g. amplicon sequencing or RADseq you should just not mark duplicates, or use -drf DuplicateRead to ignore the marking if it's already done.

    RAD seq data, Ion Torrent sequencing. What I did is:

    • 8 Fastq files obtained (quality filtered)
    • Contaminants removed with Seqtrimnext
    • Fastq to fasta
    • Pseudo-ref created with CLC (from these 8 fasta files)
    • BWA alignment to pseudo ref
    • SAM to BAM / BAM sorted / Indexing with Samtools
    • AddOrReplaceReadGroups / MarkDuplicates with Picard

    And now I'm trying to call SNPs with GATK. So yes, I suppose I have to use -drf DuplicateRead, since the duplicates are already marked with Picard, but -drf isn't defined.

  • qjehanne (Bordeaux, Member)

    Thanks for your reply!

    I'll take a look at this as soon as possible :smile:

  • qjehanne (Bordeaux, Member)

    GATK is now working well!

    Unfortunately, after 85 h of processing: "81857322 reads (100.00% of total) failing MalformedReadFilter" :-(

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @qjehanne
    Hi Quentin,

    Ah, I am sorry to hear that! From what we've heard, people who have encountered this problem typically revert the BAMs to FASTQ and redo the alignment from scratch. However, we can't guarantee that this will fix all your issues, and we know that some people just decide to use the software provided by Ion Torrent to make variant calls on this datatype.

    Good luck and let us know what you find! :smiley:

    -Sheila
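
The "revert to FASTQ and realign" route mentioned above could look roughly like the following. This is a sketch under assumptions, not a tested recipe: `realign_plan` is a hypothetical helper that only prints the commands for review, the file names are placeholders, single-end reads are assumed, and `bwa mem` is one possible choice (the thread only says "BWA"). File names containing spaces are not handled.

```shell
# Print (without running) the commands for reverting one BAM to FASTQ
# and realigning it against a reference.
# $1 = input BAM, $2 = reference FASTA, $3 = output prefix (placeholders).
realign_plan() {
    local bam="$1" ref="$2" prefix="$3"
    # Picard SamToFastq (legacy KEY=value syntax) extracts the reads.
    printf 'java -jar picard.jar SamToFastq I=%s FASTQ=%s.fastq\n' "$bam" "$prefix"
    # Index the reference, align, then coordinate-sort and index the result.
    printf 'bwa index %s\n' "$ref"
    printf 'bwa mem %s %s.fastq | samtools sort -o %s.sorted.bam -\n' "$ref" "$prefix" "$prefix"
    printf 'samtools index %s.sorted.bam\n' "$prefix"
}
```

For example, `realign_plan sample1.bam pseudoref.fa sample1` prints four commands that can be inspected and then run by hand. Read groups would still need to be added again afterwards (for instance with AddOrReplaceReadGroups, as in the original workflow), and there is no guarantee that this resolves the MalformedReadFilter failures.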

  • Begali (Germany, Member ✭✭)

    hi
    @Geraldine_VdAuwera
    @Sheila

    I would like to ask about MarkDuplicates with Picard, which I did not apply before calling variants. I would like to make sure that I do not need it, and to understand why it is needed, because I am a little confused. Here is the information I received from the experimental group. I am working on RADSeq data for 203 samples.

    Genomic DNA has been digested with KpnI, DNA fragments have been size selected and adaptors with barcode ligated to the DNA fragments, which were subsequently sequenced.

    The few Cardamine samples (5 samples) were treated the same way but originate from individual lines, whereas the A. thaliana samples derive from a cross between the two strains Col-0 and 2251 (197 samples).

    Thanks in advance

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)

    @Begali Marking duplicates is not recommended for RADSeq data, so you're all clear.
