Mutect and ampliseq IonTorrent TES

Hi,

I am trying to use MuTect (v1.1.4) with ultra-deep targeted AmpliSeq Ion Proton data.
Coverage is around 7000x, and these are the MuTect parameters I use:

-rf BadCigar --analysis_type MuTect --reference_sequence $REF --cosmic $COSMIC --dbsnp $DBSNP --input_file:normal $1 --input_file:tumor $2 --out $2.mutect

Now, I have the following problems:

  1. Despite the very high coverage, only a handful of mutations are flagged as KEEP.
  2. If I relax the filtering and select all NOVEL+COVERED mutations that are not in dbSNP, with a coverage of at least 400x (columns 21 and 22) and at least 40 variant reads (column 22), the coverage appears to be "adjusted" to 1000: columns 21 and 22 always sum to at most 1000, even though many positions are covered at >7000x.
  3. Also, after this last filtering step, I am left with many low-frequency, probably spurious variants that are not present in the original exome run (also analyzed with MuTect).
  4. The overlap between the MuTect and VarScan2 calls is very poor (<40%).
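The post does not say how the overlap in point 4 was measured, and the number depends on the definition. A minimal sketch, assuming each call set has already been reduced to hypothetical (chrom, pos, ref, alt) tuples, reports the shared fraction relative to each caller and the Jaccard index:

```python
def overlap_stats(calls_a, calls_b):
    """Compare two variant call sets given as iterables of (chrom, pos, ref, alt) tuples."""
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        # Fraction of each caller's calls that the other caller also made.
        "frac_of_a": len(shared) / len(a) if a else 0.0,
        "frac_of_b": len(shared) / len(b) if b else 0.0,
        # Shared calls over all distinct calls made by either caller.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }

# Hypothetical toy call sets, not data from this thread.
mutect_calls = [("chr1", 100, "C", "T"), ("chr1", 200, "G", "A"), ("chr2", 300, "A", "G")]
varscan_calls = [("chr1", 100, "C", "T"), ("chr2", 300, "A", "G"),
                 ("chr3", 400, "T", "C"), ("chr3", 500, "G", "T")]
print(overlap_stats(mutect_calls, varscan_calls))
```

MuTect 1 calls point mutations only, so if the VarScan2 set includes indels, restricting it to SNVs first gives a fairer comparison.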

Is there an optimal way to use MuTect with AmpliSeq data that avoids these problems?

Thanks,
Andrea

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi Andrea,

    An automatic downsampling step caps coverage at 1000x to limit the computational requirements of the program. You can raise this limit with the -dcov argument (see the GATK engine documentation for details). Generally we consider very high depth more of a problem than a benefit. That may not be the case for AmpliSeq data, but I'm not familiar enough with that data type to comment on how well MuTect should do on it. I'll ask the developer, @kcibul, to comment.
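    The 1000x cap is why columns 21 and 22 never sum past 1000. The sketch below uses reservoir sampling to keep a uniform random subset of at most max_depth reads at one locus. It is a simplified illustration, not the GATK engine's actual downsampling code; the read labels and the 5% allele fraction are hypothetical. Because the subset is uniform, the allele fraction is roughly preserved while the absolute read counts shrink.

```python
import random

def downsample_reads(reads, max_depth, rng):
    """Reservoir sampling (Algorithm R): return at most max_depth reads chosen uniformly at random."""
    reservoir = []
    for i, read in enumerate(reads):
        if i < max_depth:
            reservoir.append(read)
        else:
            # Replace an existing entry with probability max_depth / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < max_depth:
                reservoir[j] = read
    return reservoir

# Hypothetical locus: 7000 reads, 350 of them (5%) supporting the alternate allele.
reads = ["ALT"] * 350 + ["REF"] * 6650
kept = downsample_reads(reads, max_depth=1000, rng=random.Random(42))
alt = kept.count("ALT")
print(len(kept), alt, alt / len(kept))
```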

  • kcibul (Cambridge, MA; Member, Broadie, Dev)

    Thanks Geraldine! In addition to everything she said, extremely high-coverage data is very challenging to work with. The basic reason is that you're gaining power to find low-level artifacts AND mutations. The former are more numerous, so you're being swamped! It's like turning up the volume on the radio to get a distant station.
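    The radio analogy can be made concrete with a back-of-the-envelope binomial model. Assume, hypothetically, that some systematic artifact puts a wrong base on 0.5% of reads at a given position and that reads are independent. Real artifacts are often neither uniform nor independent, and this is not MuTect's statistical model. Under those assumptions the artifact rarely reaches 5 supporting reads at 100x, but at 7000x it is expected to contribute about 35 and almost always does:

```python
from math import exp, lgamma, log, log1p

def binom_pmf(i, n, p):
    """Binomial(n, p) probability of exactly i successes, computed in log space for large n."""
    log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log1p(-p))
    return exp(log_pmf)

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return 1.0 - sum(binom_pmf(i, n, p) for i in range(k))

artifact_rate = 0.005  # hypothetical per-read artifact rate
for depth in (100, 1000, 7000):
    # depth, expected artifact reads, probability of at least 5 artifact reads
    print(depth, depth * artifact_rate, prob_at_least(5, depth, artifact_rate))
```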

    You might want to try thresholding on allele fraction (tumor_f) and see if that helps.
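    A minimal sketch of such a threshold, assuming the --out file is tab-delimited with a single header row that names a tumor_f column, optionally preceded by lines starting with "##". The example rows and the 0.05 cutoff are hypothetical; a sensible cutoff would have to be chosen against the data, for example by comparison with the exome calls.

```python
import csv
import io

def filter_by_tumor_f(lines, min_tumor_f):
    """Yield call_stats rows (as dicts) whose tumor_f is at least min_tumor_f."""
    # Skip '##' comment lines so DictReader sees the column header first.
    data_lines = (line for line in lines if not line.startswith("##"))
    for row in csv.DictReader(data_lines, delimiter="\t"):
        if float(row["tumor_f"]) >= min_tumor_f:
            yield row

# Tiny in-memory stand-in for a MuTect output file (hypothetical values).
example = io.StringIO(
    "## comment line\n"
    "contig\tposition\ttumor_f\tjudgement\n"
    "chr1\t100\t0.012\tREJECT\n"
    "chr1\t200\t0.150\tKEEP\n"
    "chr2\t300\t0.060\tREJECT\n"
)
kept = list(filter_by_tumor_f(example, min_tumor_f=0.05))
print([(r["contig"], r["position"]) for r in kept])
```

    With a real output file, such as the $2.mutect file from the command in the question, pass an open file handle instead of the StringIO object.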

    Good luck, and let us know what you find!
