Mutect for amplicon based sequencing

mouradovmouradov AustraliaMember
edited June 2015 in MuTect v1

Hi, I am running deep targeted sequencing (1000x+) using amplicon based technology on tumour normal pairs and am seeing strange ref and alt allele counts. I am just wondering if Mutect does any sort of internal duplicated reads removal or any other filtering that might be problematic for data with high percentage of duplicates? Specifically I am seeing low alt_count numbers for a lot of variants. For example, for a certain variant visual IGV inspection and GATK HaplotypeCaller give > 30% of alleles being ALT (with over 100 ALT alleles reported), Mutect reports 2 ALT alleles. The same bam file was used in both instances.

Happy to post some vcf data if needed.

Thanks in advance

muTect-1.1.4:
--analysis_type MuTect --reference_sequence ucsc.hg19.fasta --cosmic CosmicCodingMuts_v68_fixed_sorted.vcf --intervalsregions.bed --input_file:normal.bam --input_file:tumor.bam -vcf f.name.tum.vcf -o f.name.tum,.txt --enable_extended_output --pir_median_threshold 0 --strand_artifact_power_threshold 0

GATK-3.2-2 Haplotype:
-T HaplotypeCaller -R ucsc.hg19.fasta -I tumor.bam --genotyping_mode DISCOVERY --dbsnp dbsnp_138.hg19.vcf -dcov 1000 -L regions.bed -stand_emit_conf 20 -stand_call_conf 30 -o raw_variants.vcf

Post edited by mouradov on

Answers

  • green4swgreen4sw BostonMember

    Any comments on this?

    I have seen the same problem. In the target sequencing data, although our R1 reads have reandom starting site, our R2 reads always have the same start and end. If a mutation seen on many R2 reads, will muTect consider them as duplicates and counts it as one?

    Thanks.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi there, did you run mark duplicates on this data? If not, then MuTect does not consider any reads as duplicates and uses them all. If yes, then it ignores whatever reads were tagged as duplicates by Picard MarkDuplicates.

  • amitmamitm Manchester, UKMember

    hi @mouradov,
    Were you able to sort your issue. I am also analyzing amplicon seq. data and have run both VarScan and MuTect. Both are reporting read-depth that is wildly different from what I see in IGV; i.e. much lesser read-depth.

    ** I have NOT marked duplicates and used BWA for alignment and GATK post-processing.
    But, importantly, the ratio of alt. allele to ref. allele reported by MuTect is more or less similar to what I see in IGV.
    e.g. On IGV I see -
    Ref. allele - ~24,900
    Alt. allele - ~17,200 (~41%)

    MuTect says -
    Ref. - 584
    Alt. - 405 (41%)

    VarScan is going bizarre -
    Ref. - 10
    Alt. - 1733 (~99%)

    @Geraldine_VdAuwera Can you plz. shed light on what other ways could MuTect be filtering reads. I am posting the final lines of the STDOUT for MuTect run -

    INFO 12:41:34,909 MuTect - VERSION INFO: MuTect:1.1.6-4-g69b7a37 Gatk:3.1-0-g72492bb INFO 12:42:04,911 ProgressMeter - 2:198267448 3.52e+02 30.0 s 23.7 h 18.5% 2.7 m 2.2 m INFO 12:42:09,369 MuTect - [MUTECT] Processed 1000910 reads in 8619 ms INFO 12:42:12,169 Walker - [REDUCE RESULT] Traversal result is: 0 INFO 12:42:12,536 ProgressMeter - done 1.69e+03 37.0 s 6.2 h 99.8% 37.0 s 0.0 s INFO 12:42:12,537 ProgressMeter - Total runtime 37.63 secs, 0.63 min, 0.01 hours INFO 12:42:12,538 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 947128 total reads (0.00%) INFO 12:42:12,538 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter INFO 12:42:12,538 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter INFO 12:42:12,539 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter INFO 12:42:12,539 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter INFO 12:42:12,540 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
    Any insight would be much help.

    Thanks

  • I finally went with Freebayes.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    @amitm MuTect applies GATK's read downsampling function in order to use only the amount of data that it needs and not more, which speeds up processing. The downsampling is done randomly so you end up with the same proportions of alt vs ref alleles.

  • amitmamitm Manchester, UKMember

    @Geraldine_VdAuwera Many thanks for the reply. Given VarScan's discrepancy I would have liked to use MuTect but am limited by lack of paired germline for most of my amplicon seq. samples. I would test HaplotypeCaller for these and Freebayes too. Thanks @patidarr for suggesting.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie
  • mouradovmouradov AustraliaMember

    @amitm

    Seems my problem stemmed from a sequencing run that resulted in highly variable quality scores, which played havoc with various cutoffs.

  • amitmamitm Manchester, UKMember

    Ah! Very useful link @Geraldine_VdAuwera . In fact I did ran mutect by removing the parameter for 'normal' and received OK results. Just that the VCF's last col. (designated for the 'normal') had '0' for all tag:pair values. I thought this might not be 'stable' way for MuTect.
    2 179434179 . A G . REJECT . GT:AD:BQ:DP:FA 0/1:0,2:22:2:1.00 0:0,0:.:0:0.00 2 179434201 . T C . PASS SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0/1:874,101:19:975:0.104:2 0:0,0:.:0:0.00:0 2 198267468 . C A . REJECT . GT:AD:BQ:DP:FA 0/1:7,4:18:13:0.364 0:0,0:.:0:0.00 2 198267483 . C T . PASS SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0/1:915,70:25:985:0.071:2 0:0,0:.:0:0.00:0
    Glad to know that MuTect is stable without a 'normal'. Though the post mentions a technical 'normal sample' would be better. Thanks again. Allows me a scope to retain the same algo. for unpaired samples/ amplicon. data.

  • amitmamitm Manchester, UKMember

    @mouradov Thanks for your reply. I too suspect the same. Trying to build quality distrib. plot from mpileup. Would post again if I am able to pin-point something relevant.

  • @Geraldine_VdAuwera said:
    Hi there, did you run mark duplicates on this data? If not, then MuTect does not consider any reads as duplicates and uses them all. If yes, then it ignores whatever reads were tagged as duplicates by Picard MarkDuplicates.

    Hi Geraldine, Could you just confirm that Mutect is automatically ignoring any read marked as a duplicate ? I have read the code (Mutect v1) and did not see such a filter. Is the GATK engine doing this by default ?

    Thanks
    Anthony

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Yes, the engine does this by default.

Sign In or Register to comment.