Mutect and 'Lexicographically sorted human genome sequence detected in cosmic' error

guitib, France (Member)

Hi GATK team and community!

I'm working on a pool of five tumor/normal pairs of BAM samples and I'm looking for variants (hg19 reference). I completed the pre-processing steps successfully and now want to run the variant calling step with MuTect. I merged the tumor samples into a single BAM file, and did the same for the normal alignments.

For MuTect I used the dbSNP VCF file provided on the Broad Institute FTP, dbsnp_138.hg19.vcf, and the COSMIC VCF file from the COSMIC FTP, /cosmic/grch37/cosmic/v74/CosmicCodingMuts.vcf.gz

First I renamed the COSMIC contigs to match the hg19 reference and reordered the file with the Picard SortVcf tool. The resulting VCF file is ordered like this:

> awk -F "\t" '{ print $1 }' /home/data/src/cosmic/hg19/CosmicCodingMuts_sorted.vcf | grep "^chr" | uniq
chrM
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr20
chr21
chr22
chrX
chrY
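
The check above only looks at record lines that start with "chr". A slightly broader sketch, assuming a plain (uncompressed) VCF, lists the contigs in record order and in header order so the two can be compared side by side. The function names are made up for illustration and are not part of GATK or Picard.

```shell
# Hypothetical helpers (names are illustrative only).

# Print contig names in the order they first appear in the VCF records.
vcf_record_contigs() {
    grep -v '^#' "$1" | cut -f1 | uniq
}

# Print contig IDs in the order declared by the ##contig header lines.
vcf_header_contigs() {
    sed -n 's/^##contig=<ID=\([^,>]*\).*/\1/p' "$1"
}
```

Running both on the same file and comparing the two lists shows whether the header and the records agree. It does not say anything about what a separate index file next to the VCF contains.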

The trouble comes when I launch MuTect:

> mutect --analysis_type MuTect --reference_sequence /home/data/src/broadinstitute/ucsc.hg19.fasta --cosmic /home/data/src/cosmic/hg19/CosmicCodingMuts_sorted.vcf --dbsnp /home/data/src/broadinstitute/dbsnp_138.hg19.vcf --input_file:normal workspace/pn_merge/pn_merge_recal_reads.bam --input_file:tumor workspace/pt_merge/pt_merge_recal_reads.bam --out call_stats.out
INFO  19:36:51,977 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  19:36:51,979 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.1-0-g72492bb, Compiled 2015/01/21 17:10:56 
INFO  19:36:51,979 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  19:36:51,979 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  19:36:51,982 HelpFormatter - Program Args: --analysis_type MuTect --reference_sequence /home/data/src/broadinstitute/ucsc.hg19.fasta --cosmic /home/data/src/cosmic/hg19/CosmicCodingMuts_sorted.vcf --dbsnp /home/data/src/broadinstitute/dbsnp_138.hg19.vcf --input_file:normal workspace/pn_merge/pn_merge_recal_reads.bam --input_file:tumor workspace/pt_merge/pt_merge_recal_reads.bam --out call_stats.out 
INFO  19:36:51,987 HelpFormatter - Executing as guillaume@Tibioputer on Linux 3.19.0-33-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_85-b01. 
INFO  19:36:51,987 HelpFormatter - Date/Time: 2015/11/22 19:36:51 
INFO  19:36:51,987 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  19:36:51,987 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  19:36:52,061 GenomeAnalysisEngine - Strictness is SILENT 
INFO  19:36:52,272 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
INFO  19:36:52,282 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  19:36:52,378 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.09 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.1-0-g72492bb): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Lexicographically sorted human genome sequence detected in cosmic.
##### ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.
##### ERROR This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.
##### ERROR You can use the ReorderSam utility to fix this problem: http://gatkforums.broadinstitute.org/discussion/58/companion-utilities-reordersam
##### ERROR   cosmic contigs = [chr1, chr10, chr11, chr11_gl000202_random, chr12, chr13, chr14, chr15, chr16, chr17, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18, chr18_gl000207_random, chr19, chr19_gl000208_random, chr19_gl000209_random, chr1_gl000191_random, chr1_gl000192_random, chr2, chr20, chr21, chr21_gl000210_random, chr22, chr3, chr4, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr5, chr6, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7, chr7_gl000195_random, chr8, chr8_gl000196_random, chr8_gl000197_random, chr9, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chrM, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249, chrX, chrY]
##### ERROR ------------------------------------------------------------------------------------------

If I run the same command without the COSMIC file, it works. What surprises me is that the contig names and ordering are the same as in the dbSNP VCF file. I don't understand what's going wrong...

I have always found topics covering my previous issues on the GATK forum, but it seems I'm the only one to hit this problem with a COSMIC VCF file. It sounds like a minor problem, but I can't pin it down... Any help would be much appreciated :\

thx

Guillaume

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Are you sure the ordering is the same? Normally the _gl contigs are at the end.

  • guitib, France (Member)

    Hi @Geraldine_VdAuwera thx for your answer

    @Geraldine_VdAuwera said:
    Are you sure the ordering is the same?

    About the ordering: I checked it with the awk command in my previous post. The contig order looks karyotypic, and it is confirmed by the VCF header as follows:

    > awk -F "\t" '{ print $1 }' /home/data/src/cosmic/hg19/CosmicCodingMuts_sorted.vcf | grep "##contig"
    ##contig=<ID=chrM,length=16571>
    ##contig=<ID=chr1,length=249250621>
    ##contig=<ID=chr2,length=243199373>
    ##contig=<ID=chr3,length=198022430>
    ##contig=<ID=chr4,length=191154276>
    ##contig=<ID=chr5,length=180915260>
    ##contig=<ID=chr6,length=171115067>
    ##contig=<ID=chr7,length=159138663>
    ##contig=<ID=chr8,length=146364022>
    ##contig=<ID=chr9,length=141213431>
    ##contig=<ID=chr10,length=135534747>
    ##contig=<ID=chr11,length=135006516>
    ##contig=<ID=chr12,length=133851895>
    ##contig=<ID=chr13,length=115169878>
    ##contig=<ID=chr14,length=107349540>
    ##contig=<ID=chr15,length=102531392>
    ##contig=<ID=chr16,length=90354753>
    ##contig=<ID=chr17,length=81195210>
    ##contig=<ID=chr18,length=78077248>
    ##contig=<ID=chr19,length=59128983>
    ##contig=<ID=chr20,length=63025520>
    ##contig=<ID=chr21,length=48129895>
    ##contig=<ID=chr22,length=51304566>
    ##contig=<ID=chrX,length=155270560>
    ##contig=<ID=chrY,length=59373566>
    ##contig=<ID=chr1_gl000191_random,length=106433>
    ##contig=<ID=chr1_gl000192_random,length=547496>
    ##contig=<ID=chr4_ctg9_hap1,length=590426>
    ##contig=<ID=chr4_gl000193_random,length=189789>
    ##contig=<ID=chr4_gl000194_random,length=191469>
    ##contig=<ID=chr6_apd_hap1,length=4622290>
    ##contig=<ID=chr6_cox_hap2,length=4795371>
    ##contig=<ID=chr6_dbb_hap3,length=4610396>
    ##contig=<ID=chr6_mann_hap4,length=4683263>
    ##contig=<ID=chr6_mcf_hap5,length=4833398>
    ##contig=<ID=chr6_qbl_hap6,length=4611984>
    ##contig=<ID=chr6_ssto_hap7,length=4928567>
    ##contig=<ID=chr7_gl000195_random,length=182896>
    ##contig=<ID=chr8_gl000196_random,length=38914>
    ##contig=<ID=chr8_gl000197_random,length=37175>
    ##contig=<ID=chr9_gl000198_random,length=90085>
    ##contig=<ID=chr9_gl000199_random,length=169874>
    ##contig=<ID=chr9_gl000200_random,length=187035>
    ##contig=<ID=chr9_gl000201_random,length=36148>
    ##contig=<ID=chr11_gl000202_random,length=40103>
    ##contig=<ID=chr17_ctg5_hap1,length=1680828>
    ##contig=<ID=chr17_gl000203_random,length=37498>
    ##contig=<ID=chr17_gl000204_random,length=81310>
    ##contig=<ID=chr17_gl000205_random,length=174588>
    ##contig=<ID=chr17_gl000206_random,length=41001>
    ##contig=<ID=chr18_gl000207_random,length=4262>
    ##contig=<ID=chr19_gl000208_random,length=92689>
    ##contig=<ID=chr19_gl000209_random,length=159169>
    ##contig=<ID=chr21_gl000210_random,length=27682>
    ##contig=<ID=chrUn_gl000211,length=166566>
    ##contig=<ID=chrUn_gl000212,length=186858>
    ##contig=<ID=chrUn_gl000213,length=164239>
    ##contig=<ID=chrUn_gl000214,length=137718>
    ##contig=<ID=chrUn_gl000215,length=172545>
    ##contig=<ID=chrUn_gl000216,length=172294>
    ##contig=<ID=chrUn_gl000217,length=172149>
    ##contig=<ID=chrUn_gl000218,length=161147>
    ##contig=<ID=chrUn_gl000219,length=179198>
    ##contig=<ID=chrUn_gl000220,length=161802>
    ##contig=<ID=chrUn_gl000221,length=155397>
    ##contig=<ID=chrUn_gl000222,length=186861>
    ##contig=<ID=chrUn_gl000223,length=180455>
    ##contig=<ID=chrUn_gl000224,length=179693>
    ##contig=<ID=chrUn_gl000225,length=211173>
    ##contig=<ID=chrUn_gl000226,length=15008>
    ##contig=<ID=chrUn_gl000227,length=128374>
    ##contig=<ID=chrUn_gl000228,length=129120>
    ##contig=<ID=chrUn_gl000229,length=19913>
    ##contig=<ID=chrUn_gl000230,length=43691>
    ##contig=<ID=chrUn_gl000231,length=27386>
    ##contig=<ID=chrUn_gl000232,length=40652>
    ##contig=<ID=chrUn_gl000233,length=45941>
    ##contig=<ID=chrUn_gl000234,length=40531>
    ##contig=<ID=chrUn_gl000235,length=34474>
    ##contig=<ID=chrUn_gl000236,length=41934>
    ##contig=<ID=chrUn_gl000237,length=45867>
    ##contig=<ID=chrUn_gl000238,length=39939>
    ##contig=<ID=chrUn_gl000239,length=33824>
    ##contig=<ID=chrUn_gl000240,length=41933>
    ##contig=<ID=chrUn_gl000241,length=42152>
    ##contig=<ID=chrUn_gl000242,length=43523>
    ##contig=<ID=chrUn_gl000243,length=43341>
    ##contig=<ID=chrUn_gl000244,length=39929>
    ##contig=<ID=chrUn_gl000245,length=36651>
    ##contig=<ID=chrUn_gl000246,length=38154>
    ##contig=<ID=chrUn_gl000247,length=36422>
    ##contig=<ID=chrUn_gl000248,length=39786>
    ##contig=<ID=chrUn_gl000249,length=38502>
    

    @Geraldine_VdAuwera said:
    Normally the _gl contigs are at the end.

    It looks like the COSMIC contig list in the error message doesn't correspond to my contig order.

  • guitib, France (Member)

    @Geraldine_VdAuwera said:
    Did you update the index file after editing the file?

    .... Noooope :neutral: Okay, I did it (with igvtools), and it works. Minor issue.

    Sorry and thank you
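
For anyone landing here with the same error: the contig list in the error message is consistent with an index file that still described the VCF as it was before renaming and re-sorting. A minimal sketch for spotting that situation follows. It assumes the index sits next to the VCF as "<vcf>.idx", and the function name is hypothetical.

```shell
# Hypothetical helper: remove an index that is older than its VCF.
# Assumption: the index file is named "<vcf>.idx" and lives beside the VCF.
remove_stale_idx() {
    vcf="$1"
    idx="${vcf}.idx"
    # "-ot" is true when the index was last modified before the VCF.
    if [ -f "$idx" ] && [ "$idx" -ot "$vcf" ]; then
        rm -f -- "$idx"
        echo "removed stale index: $idx"
    else
        echo "index missing or not older than VCF: $idx"
    fi
}

# Example call (path taken from the original post):
# remove_stale_idx /home/data/src/cosmic/hg19/CosmicCodingMuts_sorted.vcf
```

After removing a stale index, it has to be rebuilt from the edited file. In this thread that was done with igvtools (most likely its `index` command on the VCF), which was enough to make the run succeed.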
