Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

MuTect2: I need some clarification on the output

alaabadrealaabadre FranceMember
edited April 2017 in Ask the GATK team

Hello everyone,

I have recently produced results from MuTect2. This is my command running on default parameters:

java -jar WES/programs/GATK-3.7.jar -T MuTect2 -I:normal WES/new_samples/1N.dedup.srt.coor.bam -I:tumor WES/new_samples/1A.picard.dedup.csort.bam -R WES/reference/withoutSuperContigs/hg19_ref_genome.fa --dbsnp WES/dbsnp/dbsnp_138.hg19.vcf --cosmic WES/cosmic/CosmicCodingMuts_GHCr37.p13.vcf -L WES/AllExonsIntervals/HumanAllExonV6r2/S07604514_Covered.bed --out WES/mutect2/resultForNewSample/1NA.vcf -log WES/mutect2/resultForNewSample/1NA.log

Fast forward to the result of some samples content:
Sample 1NA:

# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT $line
chr1 13557 . G A . alt_allele_in_normal ECNT=1;HCNT=14;MAX_ED=.;MIN_ED=.;NLOD=139.96;TLOD=23.83 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:889,13:0.018:5:8:0.385:31658,477:438:451
chr1 14677 rs201327123 G A . alt_allele_in_normal;germline_risk DB;ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=3.51;TLOD=8.46 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:90,8:0.079:7:1:0.875:3192,255:51:39
chr1 14792 . G A . alt_allele_in_normal;clustered_events;t_lod_fstar ECNT=5;HCNT=8;MAX_ED=223;MIN_ED=23;NLOD=67.30;TLOD=5.16 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:375,9:0.021:4:5:0.444:13126,277:193:182
chr1 14815 . C T . alt_allele_in_normal;clustered_events ECNT=5;HCNT=17;MAX_ED=223;MIN_ED=23;NLOD=72.94;TLOD=7.90 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:458,23:0.042:15:8:0.348:16262,843:234:224
chr1 15015 . G C . alt_allele_in_normal;clustered_events;t_lod_fstar ECNT=5;HCNT=11;MAX_ED=223;MIN_ED=23;NLOD=91.68;TLOD=4.08 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:467,11:0.024:4:7:0.364:16614,385:240:227
chr1 16996 . T C . alt_allele_in_normal ECNT=1;HCNT=2;MAX_ED=.;MIN_ED=.;NLOD=14.76;TLOD=8.89 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:728,62:0.071:28:34:0.452:25702,2179:369:359

Sample 5NA:

# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT $line
chr1 14741 . C A . alt_allele_in_normal ECNT=1;HCNT=9;MAX_ED=.;MIN_ED=.;NLOD=3.52;TLOD=18.82 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:151,15:0.099:7:8:0.533:5399,502:91:60
chr1 14815 . C T . alt_allele_in_normal;clustered_events ECNT=4;HCNT=18;MAX_ED=200;MIN_ED=98;NLOD=49.01;TLOD=14.48 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:388,16:0.028:13:3:0.188:13726,584:198:190
chr1 15000 . G C . alt_allele_in_normal;clustered_events;t_lod_fstar ECNT=4;HCNT=3;MAX_ED=200;MIN_ED=98;NLOD=83.74;TLOD=4.78 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:474,8:0.019:5:3:0.625:16833,273:231:243
chr1 15015 . G C . alt_allele_in_normal;clustered_events ECNT=4;HCNT=7;MAX_ED=200;MIN_ED=98;NLOD=68.29;TLOD=7.17 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:412,7:0.018:4:3:0.571:14567,232:203:209
chr1 664427 . C A . alt_allele_in_normal;clustered_events ECNT=2;HCNT=14;MAX_ED=61;MIN_ED=61;NLOD=61.18;TLOD=7.12 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:431,12:0.032:7:5:0.417:15188,444:225:206
chr1 664488 . G C . alt_allele_in_normal;clustered_events ECNT=2;HCNT=5;MAX_ED=61;MIN_ED=61;NLOD=80.63;TLOD=6.56 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:588,13:0.028:8:5:0.615:21242,473:302:286

Sample 12NA:

# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT $line
chr1 13281 . C G . alt_allele_in_normal ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=25.40;TLOD=10.92 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:336,18:0.059:11:7:0.389:11872,639:165:171
chr1 14792 . G A . alt_allele_in_normal;clustered_events ECNT=3;HCNT=28;MAX_ED=223;MIN_ED=23;NLOD=33.07;TLOD=20.97 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:368,16:0.033:7:9:0.438:12845,559:185:183
chr1 14815 . C T . alt_allele_in_normal;clustered_event s ECNT=3;HCNT=28;MAX_ED=223;MIN_ED=23;NLOD=28.34;TLOD=33.46 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:449,42:0.066:22:20:0.476:15948,1532:235:214
chr1 15015 . G C . alt_allele_in_normal;clustered_events ECNT=3;HCNT=23;MAX_ED=223;MIN_ED=23;NLOD=76.09;TLOD=7.09 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:429,9:0.024:6:3:0.667:15288,310:214:215
chr1 664740 . A G . alt_allele_in_normal ECNT=1;HCNT=2;MAX_ED=.;MIN_ED=.;NLOD=31.83;TLOD=13.56 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:289,13:0.056:6:7:0.538:10396,473:143:146
chr1 665172 . C T . alt_allele_in_normal;t_lod_fstar ECNT=1;HCNT=32;MAX_ED=.;MIN_ED=.;NLOD=28.50;TLOD=4.15 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1 0/0:241,5:0.015:4:1:0.200:8492,154:115:126

So my questions are:

  1. Is the $line in the output VCF header normal to have ? Are we not supposed to get something like Normal Tumor instead ?

  2. This is the direct output of MuTect2, And the number of total variants for each sample are:
    1NA: 4285
    5NA: 4267
    12NA: 7221
    After reading a bit on the forum, I found out that the variants that have only "PASS" in the filter column are only the accept variants as mutated/indels as the other have been filtered out due to different reasons (alt_allele_in_normal, clustered_events, etc...). I simply run an awk command to filter all the lines that have the "PASS" in the filter column and this is what I found:
    1NA: 1
    5NA: 1
    12NA: 1
    Logically speaking, this is not normal to have only one variant accepted for each sample. Is there any post-process of the VCF files that needs to be done and I am not aware of ?

  3. I would like to have a better explanation of what is HCNT please. I tried to look over the forum if there's any clear explanation given but I could not find any. I saw that in the VCF header it says: "Number of haplotypes that support this variant". Any picture or explanation would be greatly appreciated.

Thank you in advance for your time.
Alaa

Post edited by alaabadre on
Tagged:

Best Answers

Answers

  • alaabadrealaabadre FranceMember

    Please ? Someone who can help me ? :(

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @alaabadre
    Hi Alaa,

    Sorry for the delay.

    1) Indeed there should be tumor and normal. Can you please post the BAM headers here. Specifically, I am looking for the read group information.

    2) This is very unexpected and could indicate bad quality data. Have you done any quality control on your data? Can you please post some IGV screenshots of the site that did PASS and a site that did not pass?

    3) Have a look at this document. It is relevant to MuTect2 as well as HaplotypeCaller.

    -Sheila

  • alaabadrealaabadre FranceMember
    edited April 2017

    @Sheila said:
    @alaabadre
    Hi Alaa,

    Sorry for the delay.

    1) Indeed there should be tumor and normal. Can you please post the BAM headers here. Specifically, I am looking for the read group information.

    2) This is very unexpected and could indicate bad quality data. Have you done any quality control on your data? Can you please post some IGV screenshots of the site that did PASS and a site that did not pass?

    3) Have a look at this document. It is relevant to MuTect2 as well as HaplotypeCaller.

    -Sheila

    Good morning Sheila,

    Thanks for your reply !

    1. Please see the following BAM header for one of the samples:
      @HD VN:1.3 SO:coordinate
      @SQ SN:chrM LN:16571
      @SQ SN:chr1 LN:249250621
      @SQ SN:chr2 LN:243199373
      @SQ SN:chr3 LN:198022430
      @SQ SN:chr4 LN:191154276
      @SQ SN:chr5 LN:180915260
      @SQ SN:chr6 LN:171115067
      @SQ SN:chr7 LN:159138663
      @SQ SN:chr8 LN:146364022
      @SQ SN:chr9 LN:141213431
      @SQ SN:chr10 LN:135534747
      @SQ SN:chr11 LN:135006516
      @SQ SN:chr12 LN:133851895
      @SQ SN:chr13 LN:115169878
      @SQ SN:chr14 LN:107349540
      @SQ SN:chr15 LN:102531392
      @SQ SN:chr16 LN:90354753
      @SQ SN:chr17 LN:81195210
      @SQ SN:chr18 LN:78077248
      @SQ SN:chr19 LN:59128983
      @SQ SN:chr20 LN:63025520
      @SQ SN:chr21 LN:48129895
      @SQ SN:chr22 LN:51304566
      @SQ SN:chrX LN:155270560
      @SQ SN:chrY LN:59373566
      @RG ID:NG8653 SM:$line
      @PG ID:bwa PN:bwa VN:0.7.12-r1039 CL:/labo6/gronemeyer1/common/Ashick_backup/softwares/bwa-0.7.12/bwa mem -t 10 -R @RG\tID:NG8653\tSM:$line -M /labo6/gronemeyer1/common/Ashick_backup/genomes/hg/hg19_bwa_7.10_index NG-8653_1A_lib95899_4332_7_1.fastq NG-8653_1A_lib95899_4332_7_2.fastqFFFFFFFFFF NM:i:1 MD:Z:63C39 AS:i:98 XS:i:79 RG:Z:NG8653 XA:Z:chr17,+22020704,124M2S,9;
      If I guessed correctly, the problem is right here in the header of the BAM files in the last @RG line. Normally this should be tumor. This is also present in the other normal sample. So this is why MuTect2 can't differentiate between the two of them. Should we use samtools to edit the header or is there a better way, Sheila ?

    2. I have not done any quality control over the data except that we were told that the received data are good. So we did not bother to do any quality control over them. Please see in the attachments for some IGV screenshots.

    3. Thank you for the provided link ! I will read it in depth.

    Again, Thank you for your support Sheila !
    Best regards,
    Alaa

  • alaabadrealaabadre FranceMember

    Hello Sheila !

    Thank you for your reply.

    I will get back to you shortly after I make the necessary modifications. I just wanted to ask you if you managed to see any interesting differences between the IGV screenshots that I provided.

    Thanks Sheila and Happy Easter to all GATK Team <3

    Best regards,
    Alaa

  • mmokrejsmmokrejs Czech RepublicMember
    edited April 2017

    @alaabadre
    BTW, you would do better if aligning your reads to hg38dh in ALT-aware mode (GATK has nice documents for ALT-aware mapping), which requires recent enough bwa and some extra files. Anyway, even for hg19-based mapping 0.7.12 is way too old. I have 0.7.15-r1140. You should take to to properly tag other fields, like LB, PU. You could have ruined some MAPQ values due to that in in IndelRealigner and BQSR steps I guess.

  • alaabadrealaabadre FranceMember

    @mmokrejs said:
    @alaabadre
    BTW, you would do better if aligning your reads to hg38dh in ALT-aware mode (GATK has nice documents for ALT-aware mapping), which requires recent enough bwa and some extra files. Anyway, even for hg19-based mapping 0.7.12 is way too old. I have 0.7.15-r1140. You should take to to properly tag other fields, like LB, PU. You could have ruined some MAPQ values due to that in in IndelRealigner and BQSR steps I guess.

    Hello @mmokrejs !

    Thank you for your suggestion ! Please check out my latest thread here. I have corrected the problem and I will also check the ALT-aware mode. I am also using the latest version of BWA which is 0.7.15-r1142. Apparently, the BAM files were missing read groups. I did a recalibration step for my BAM files and everthing's fine now.
    Thank you again !

Sign In or Register to comment.