Holiday Notice:
The Frontline Support team will be slow to respond December 17-18 due to an institute-wide retreat and offline December 22- January 1, while the institute is closed. Thank you for your patience during these next few weeks. Happy Holidays!

Mutect2 bamout depths not matching vcf.

sxvsxv californiaMember

Hi,

After running Mutect2 on tumor/normal paired bam files, I get an output VCF with unusually high depth counts. I understand that the numbers here can differ from the input bam depths due to genotype reassembly within Mutect2. However, these depths are sometimes jumping from around 20 reads to 200 reads, which seems hard to believe. In order to investigate further, I re-ran Mutect2 with the -bamout option, in order to analyze some positions in IGV. The Mutect2 command is posted below. The problem is that when I look at the output bam (bamout) and the output VCF in IGV, the numbers do not match. Note: I ran FilterMutectCalls and selected only PASS calls when deciding which positions to look at. An example position: depth 146 and 192 in tumor and normal, respectively in the VCF, but only 41 reads for the same position in the bamout file.

Questions:
1) What can explain the discrepancy between the bamout depths and the vcf depths?
2) Is there a way to get a bam/bamout that matches the VCF exactly?

Thanks a lot,
Sujay

Mutect2 command:
./gatk-4.0.10.1/gatk --java-options "-Xmx4g" Mutect2 -R /genomes/Hsapiens/hg19/seq/hg19.fa --annotation ClippingRankSumTest --annotation DepthPerSampleHC --annotation MappingQualityRankSumTest --annotation MappingQualityZero --annotation QualByDepth --annotation ReadPosRankSumTest --annotation RMSMappingQuality --annotation FisherStrand --annotation MappingQuality --annotation DepthPerAlleleBySample --annotation Coverage --read-validation-stringency LENIENT -I tumorX.markduplicates.grouped.bam -tumor tumorX -I normalX.markduplicates.grouped.bam -normalX normal -L ./baits.bed --interval-set-rule INTERSECTION --disable-read-filter NotDuplicateReadFilter -ploidy 2 -bamout tumorX.bamout.bam -O tumorX.vcf

Answers

  • bhanuGandhambhanuGandham Member, Administrator, Broadie, Moderator admin

    Hi @sxv

    Would you please share the vcf record for that position and also a snapshot of the IGV for that region. This will help us evaluate this issue.
    Thank you.

    Regards
    Bhanu

  • sxvsxv californiaMember
    edited October 26

    HI Bhanu,

    VCF record is:

    chr1 11167123 . C T . PASS ClippingRankSum=3.937;DP=340;ECNT=1;FS=76.367;MQ=59.97;MQ0=0;MQRankSum=0.596;NLOD=57.71;N_ART_LOD=-2.287e+00;POP_AF=1.000e-06;P_CONTAM=0.00;P_GERMLINE=-6.061e+01;ReadPosRankSum=0.159;TLOD=172.17 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:96,50:0.343:146:51,10:45,40:34,41:223,185:60:36:0.293,0.343,0.342:0.026,0.025,0.949 0/0:192,0:6.199e-03:192:70,0:122,0:39,0:167,0:0:0

    And the IGV screenshots:

    http://cl.ly/31d93e94ac26/Image 2018-10-25 at 11.16.53 PM.png (vcf - normal)
    http://cl.ly/63fa205bf0d6/Image 2018-10-25 at 11.19.19 PM.png (vcf - tumor)
    http://cl.ly/fa39d15f2723/Image 2018-10-25 at 11.17.24 PM.png (bam - tumor)

    Let me know if I can provide any additional data / information to help resolve this. Thanks!

    sujay

  • tsato3tsato3 Cambridge, MAMember, Broadie, Dev

    Hi Sujay,

    My name is Takuto and I'm a developer on the Mutect team. This is a curious case - could you send us a snippet of the tumor and normal bams? You can create a bam snippet with PrintReads with the vcf (a subset of the vcf will do) as the -L argument. We can help you with it if needed - please let us know.

    Best,
    Takuto

  • sxvsxv californiaMember
    edited October 28

    Hi Takuto,

    I created a VCF with only three calls and used this to create a BAM a snippet.

    BAM snippet (<2mb)

    VCF calls:
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT dcis2 normal chr1 11167123 . C T . PASS ClippingRankSum=3.937;DP=340;ECNT=1;FS=76.367;MQ=59.97;MQ0=0;MQRankSum=0.596;NLOD=57.71;N_ART_LOD=-2.287e+00;POP_AF=1.000e-06;P_CONTAM=0.00;P_GERMLINE=-6.061e+01;ReadPosRankSum=0.159;TLOD=172.17 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:96,50:0.343:146:51,10:45,40:34,41:223,185:60:36:0.293,0.343,0.342:0.026,0.025,0.949 0/0:192,0:6.199e-03:192:70,0:122,0:39,0:167,0:0:0 chr1 11174877 . G A . PASS ClippingRankSum=-1.020;DP=302;ECNT=1;FS=10.970;MQ=58.88;MQ0=0;MQRankSum=0.945;NLOD=61.01;N_ART_LOD=-2.309e+00;POP_AF=1.000e-06;P_CONTAM=0.00;P_GERMLINE=-8.163e+01;ReadPosRankSum=2.309;TLOD=9.50 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:93,5:0.051:98:70,0:23,5:37,41:225,158:60:40:0.00,0.051,0.051:0.017,6.168e-03,0.977 0/0:203,0:1.104e-03:203:106,0:97,0:37,0:148,0:0:0 chr1 11175658 . C T . PASS ClippingRankSum=2.734;DP=348;ECNT=2;FS=0.000;MQ=56.81;MQ0=0;MQRankSum=1.752;NLOD=55.25;N_ART_LOD=-2.313e+00;POP_AF=1.000e-06;P_CONTAM=0.00;P_GERMLINE=-8.170e+01;ReadPosRankSum=0.610;TLOD=7.25 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:110,4:0.043:114:60,0:50,4:37,35:216,250:60:22:0.030,0.030,0.035:0.018,4.089e-03,0.978 0/0:184,0:0.108:184:64,0:120,0:35,0:190,0:0:0

    Let me know if this works and if I can provide anything else to help. I appreciate your help.

    Thanks,
    Sujay

  • sxvsxv californiaMember

    @tsato3 Any ideas? Wondering if I should revert to GATK3 pipeline for now.

  • tsato3tsato3 Cambridge, MAMember, Broadie, Dev

    Hi sxv,

    Thanks for sharing the bam snippet. I haven't been able to look at it yet - I hope to review it by the end of the week.

    Takuto

  • tsato3tsato3 Cambridge, MAMember, Broadie, Dev

    Hi Sujay,

    Seems like you're using a different flavor of the hg19 reference (UCSC) than we do. It'll take a while to set up the correct reference. I'll take a look at the output later this afternoon or Monday.

    Takuto

  • tsato3tsato3 Cambridge, MAMember, Broadie, Dev

    Hi Sujay,

    I dug into this a bit but could not reproduce the issue (the bamout containing significantly fewer reads than reported in vcf). None of the mutations were called with the matched normal mode.

    I can look through the code a little more tomorrow and try to come up with a hypothesis as to why vcf and bamout might disagree in the way you're seeing.

    Takuto

  • xiuczxiucz Member
    edited December 3

    hi, @tsato3 ,
    Any more news now ? Thank you!

  • davidbendavidben BostonMember, Broadie, Dev ✭✭

    @sxv This is a strange case but I think Mutect2 is working as intended. There are a couple of subtleties.

    • You're allowing duplicate reads to go through with --disable-read-filter NotDuplicateReadFilter. The depth at chr1:11167123 (in the original bam, that is) is 10 tumor reads and 10 normal reads if you exclude duplicates, as one normally does. Note that IGV by default doesn't show duplicates, which might explain a lot of the discrepancy. You can view duplicates by going to View->Preferences->Alignments. When I do this and run your command line the VCF DPs and ADs match the bamout read counts.
    • Mutect2, like HaplotypeCaller, downsamples reads by default if they exceed a very high coverage. Your bam's depth, including duplicates, is about 5000, which would normally only trigger a bit of downsampling. However, Mutect2 and HaplotypeCaller don't downsample to a desired coverage per se. Rather, they downsample all reads that share a given start position. Since your data consist of a few primary reads each with a few hundred duplicates, your coverage by start position is not uniform, and so downsampling has a huge effect. If you want to retain downsampling but want to overcome the unevenness of your start positions, you can use the --stride argument to downsample over a range of starts. For example --max-reads-per-alignment-start 20 --stride 30 downsamples to 600 = 20 x 30 reads starting in every window of 30 bases.

    To summarize: 1) the bamout and vcf agree when you tell IGV to show duplicates; 2) if you really want to keep duplicates you can fix the unevenness of depth with the -stride argument.

    Out of curiosity, why are you disabling the duplicate read filter, and why are you running Mutect2 with HaplotypeCaller annotations? I'm not saying you're wrong; I just want to understand what seems to be a non-standard use case.

Sign In or Register to comment.