Mutect and ampliseq IonTorrent TES

Hi,

I am trying to use MuTect (v1.1.4) with ultra-deep targeted AmpliSeq Ion Proton data.
The coverage is ~7000x, and here are the MuTect parameters I use:

-rf BadCigar --analysis_type MuTect --reference_sequence $REF --cosmic $COSMIC --dbsnp $DBSNP --input_file:normal $1 --input_file:tumor $2 --out $2.mutect

Now, I have the following problems:

  1. Despite the very high coverage, only a handful of mutations are flagged as KEEP.
  2. If I relax the filtering and select all NOVEL+COVERED mutations that are not in dbSNP, with a coverage of at least 400x (columns 21 and 22) and at least 40 variant reads (column 22), the coverage appears to be "adjusted" to 1000: columns 21 and 22 always sum to at most 1000, even though many positions are covered at >7000x.
  3. Even with this relaxed filtering, I am left with a lot of low-frequency, probably spurious variants that are not present in the original exome run (also analyzed with MuTect).
  4. I looked at the overlap between MuTect and VarScan2, and it is very poor (<40%).

I was wondering if there is an optimal way to use MuTect with AmpliSeq data that avoids all these problems.

Thanks,
Andrea

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Andrea,

    Automatic downsampling is applied to cap coverage at 1000X, which limits the computational requirements of the program. You can raise this cap using the -dcov argument (see the GATK engine documentation for details). Generally we consider very high depth to be more of a problem than a benefit. Perhaps that is not the case for AmpliSeq data, but I'm not familiar enough with that data type to comment on how well MuTect should do on it. I'll ask the developer, @kcibul, to comment.

  • kcibul (Cambridge, MA; Member, Broadie, Dev)

    Thanks Geraldine! In addition to all she said, extremely high coverage data is very challenging to work with. The basic reason is that you're gaining power to find low-level artifacts AND mutations. The former are far more numerous, so you're being swamped! It's like turning up the volume on the radio to get a distant station.

    You might want to try thresholding on allele fraction (tumor_f) and see if that helps.

    Good luck, and let us know what you find!
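
For readers who want to try the allele-fraction thresholding suggested above, here is a minimal Python sketch, not an official MuTect tool. It assumes the MuTect output (the `--out` call-stats file) is a tab-separated table with a header row, possibly preceded by `#` comment lines, and that it has columns named `tumor_f`, `t_ref_count` and `t_alt_count`; check these names against your own file. The function name `filter_call_stats` is hypothetical. The depth and variant-read defaults (400 and 40) are the ones from the question, and the 0.05 `tumor_f` cutoff is only a placeholder to tune for your data.

```python
import csv


def filter_call_stats(lines, min_tumor_f=0.05, min_alt_reads=40, min_depth=400):
    """Yield rows of a MuTect-style call-stats table that pass simple thresholds.

    `lines` is any iterable of text lines (for example an open file).
    Column names are assumptions about the call-stats layout.
    """
    # Drop leading comment lines (assumed to start with '#') so that the
    # first remaining line is the tab-separated column header.
    data = (ln for ln in lines if not ln.startswith("#"))
    reader = csv.DictReader(data, delimiter="\t")
    for row in reader:
        ref_reads = int(row["t_ref_count"])
        alt_reads = int(row["t_alt_count"])
        depth = ref_reads + alt_reads
        if (
            float(row["tumor_f"]) >= min_tumor_f
            and alt_reads >= min_alt_reads
            and depth >= min_depth
        ):
            yield row
```

To use it, open the call-stats file and iterate, for example `with open(path) as fh: kept = list(filter_call_stats(fh, min_tumor_f=0.1))`. Because coverage is downsampled to 1000X by default, the depth seen here is the post-downsampling depth unless -dcov was raised.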
