BaseRecalibrator: contig order in knownSites and referenceis not the same

Hi.
I have the following problem using BaseRecalibrator:

`INFO 11:29:34,714 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:29:34,717 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56
INFO 11:29:34,717 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 11:29:34,717 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 11:29:34,721 HelpFormatter - Program Args: -T BaseRecalibrator -R /gporq1_1M/usr/aprea/bio/exome_analysys_test/ref/GRCh38.p2/GRCh38.p2.fa -I ID36155FROM31976_S1.merge.sorted.realigned.bam -knownSites /gporq1_1M/usr/aprea/bio/exome_analysys_test/dbsnp/All_20160407.sorted.vcf -o ID36155FROM31976_S1.recal_data.table
INFO 11:29:34,732 HelpFormatter - Executing as [email protected] on Linux 2.6.32-220.7.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18.
INFO 11:29:34,732 HelpFormatter - Date/Time: 2016/05/16 11:29:34
INFO 11:29:34,733 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:29:34,733 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:29:35,894 GenomeAnalysisEngine - Strictness is SILENT
INFO 11:29:36,151 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 11:29:36,170 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 11:29:36,271 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.10
INFO 11:29:41,989 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.5-0-g36282e4):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Input files knownSites and reference have incompatible contigs. Please see http://gatkforums.broadinstitute.org/discussion/63/input-files-have-incompatible-contigsfor more information. Error details: The contig order in knownSites and referenceis not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files..
ERROR knownSites contigs = [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr20, chr21, chr22, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrM, chrX, chrY]
ERROR reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrM, chrX, chrY]
ERROR ------------------------------------------------------------------------------------------`

The error message says "contig order in knownSites and referenceis not the same" and also shows contigs order in both reference and knownSites:

knownSites contigs = [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr20, chr21, chr22, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrM, chrX, chrY]

reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrM, chrX, chrY]

But that's what I get from looking at those two files:

> awk ' /^[^#]/ {print $1}' All_20160407.sorted.vcf |uniq chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrM chrX chrY

and

`> egrep "^>" GRCh38.p2.fa

chr1 gi|568815597|ref|NC_000001.11| Homo sapiens chromosome 1, GRCh38.p2 Primary Assembly
chr2 gi|568815596|ref|NC_000002.12| Homo sapiens chromosome 2, GRCh38.p2 Primary Assembly
chr3 gi|568815595|ref|NC_000003.12| Homo sapiens chromosome 3, GRCh38.p2 Primary Assembly
chr4 gi|568815594|ref|NC_000004.12| Homo sapiens chromosome 4, GRCh38.p2 Primary Assembly
chr5 gi|568815593|ref|NC_000005.10| Homo sapiens chromosome 5, GRCh38.p2 Primary Assembly
chr6 gi|568815592|ref|NC_000006.12| Homo sapiens chromosome 6, GRCh38.p2 Primary Assembly
chr7 gi|568815591|ref|NC_000007.14| Homo sapiens chromosome 7, GRCh38.p2 Primary Assembly
chr8 gi|568815590|ref|NC_000008.11| Homo sapiens chromosome 8, GRCh38.p2 Primary Assembly
chr9 gi|568815589|ref|NC_000009.12| Homo sapiens chromosome 9, GRCh38.p2 Primary Assembly
chr10 gi|568815588|ref|NC_000010.11| Homo sapiens chromosome 10, GRCh38.p2 Primary Assembly
chr11 gi|568815587|ref|NC_000011.10| Homo sapiens chromosome 11, GRCh38.p2 Primary Assembly
chr12 gi|568815586|ref|NC_000012.12| Homo sapiens chromosome 12, GRCh38.p2 Primary Assembly
chr13 gi|568815585|ref|NC_000013.11| Homo sapiens chromosome 13, GRCh38.p2 Primary Assembly
chr14 gi|568815584|ref|NC_000014.9| Homo sapiens chromosome 14, GRCh38.p2 Primary Assembly
chr15 gi|568815583|ref|NC_000015.10| Homo sapiens chromosome 15, GRCh38.p2 Primary Assembly
chr16 gi|568815582|ref|NC_000016.10| Homo sapiens chromosome 16, GRCh38.p2 Primary Assembly
chr17 gi|568815581|ref|NC_000017.11| Homo sapiens chromosome 17, GRCh38.p2 Primary Assembly
chr18 gi|568815580|ref|NC_000018.10| Homo sapiens chromosome 18, GRCh38.p2 Primary Assembly
chr19 gi|568815579|ref|NC_000019.10| Homo sapiens chromosome 19, GRCh38.p2 Primary Assembly
chr20 gi|568815578|ref|NC_000020.11| Homo sapiens chromosome 20, GRCh38.p2 Primary Assembly
chr21 gi|568815577|ref|NC_000021.9| Homo sapiens chromosome 21, GRCh38.p2 Primary Assembly
chr22 gi|568815576|ref|NC_000022.11| Homo sapiens chromosome 22, GRCh38.p2 Primary Assembly
chrM gi|251831106|ref|NC_012920.1| Homo sapiens mitochondrion, complete genome
chrX gi|568815575|ref|NC_000023.11| Homo sapiens chromosome X, GRCh38.p2 Primary Assembly
chrY gi|568815574|ref|NC_000024.10| Homo sapiens chromosome Y, GRCh38.p2 Primary Assembly`

The order seems the same to me; actually I used picard SortVcf to get All_20160407.sorted.vcf. Could you please help me to understand what I am missing here?

Thanks in advance,

Giuseppe

Issue · Github
by Sheila

Issue Number
883
State
closed
Last Updated
Assignee
Array
Milestone
Array
Closed By
vdauwera

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi @linsson, we have had reports that Picard SortVcf does not correctly update the VCF index after sorting, leading to errors like this one. Try deleting the VCF index; GATK will regenerate one for you, which should now be correct and avoid this error.

  • Hi Geraldine and thanks a lot for your reply.

    Actually before you wrote it I used option " -U ALLOW_SEQ_DICT_INCOMPATIBILITY " and mutect2 run completed without errors. Is that ok?

    By the way, the software could not detect anything. My commanline is

    gatk -T MuTect2 -U ALLOW_SEQ_DICT_INCOMPATIBILITY -R /gporq1_1M/usr/aprea/bio/exome_analysys_test/ref/GRCh38.p2/GRCh38.p2.fa -I:tumor ID36155FROM31976_S1.merge.sorted.detailsAdded.realigned.reordered.recal.bam --dbsnp /gporq1_1M/usr/aprea/bio/exome_analysys_test/dbsnp/All_20160407.sorted.vcf --cosmic /gporq1_1M/usr/aprea/bio/exome_analysys_test/cosmic/AllVariants.sorted.vcf -o ID36155FROM31976_S1.somaticVariant_2.vcf -m2debug -nct ${THREADS}

    In stdout I can read million times messages like

    Dropping read NB501072:48:HYKWTBGXX:3:23508:20962:19823 1/2 129b aligned read. due to overlapping read fragment rules

    and

    Discarding lower quality read of overlapping pair NB501072:48:HYKWTBGXX:2:23111:15168:17504 and allele C*

    It sounds like mutect2 default options are too strict for my data. Any comment about that?

    g

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @linsson
    Hi,

    Are you using version 3.5? There was a bug in 3.5 that output empty VCFs. Can you try with the latest version of GATK (3.6)?

    Thanks,
    Sheila

  • Thanks a lot for your reply Sheila!

    I was using gatk v 3.5. Now I will try version 3.6. Shall I re-run all the RealignerTargetCreator-IndelRealigner-BaseRecalibrator-PrintReads-MuTect2 machinery or running again MuTect2 only is enough?

    Thanks.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @linsson
    Hi,

    You can simply run MuTect2 on the files.

    -Sheila

  • Ok thank you Sheila.

    I used mutect2 from gatk v. 3.6-0. Unfortunately now it looks like I have the opposite problem with 615764 called variants. Does it sounds normal to you? I looked at vcf and bam files in igv and I noticed that some variants were called with very low coverage:

    chrY 56881044 rs2641124 T C . clustered_events;t_lod_fstar DB;ECNT=2;HCNT=1;MAX_ED=20;MIN_ED=20;NLOD=0.00;TLOD=4.50 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:PGT:PID:QSS:REF_F1R2:REF_F2R1 0/1:0,2:1.00:0:2:0.00:0|1:56881044_T_C:0,59:0:0
    chrY 56881064 rs2527423 A G . clustered_events;t_lod_fstar DB;ECNT=2;HCNT=1;MAX_ED=20;MIN_ED=20;NLOD=0.00;TLOD=4.50 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:PGT:PID:QSS:REF_F1R2:REF_F2R1 0/1:0,2:1.00:0:2:1.00:0|1:56881044_T_C:0,68:0:0

    Is there any suggested filtering?

    Thanks again.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi @linsson,

    The records you posted are filtered; that means the program considers that they are probably false positives, but is emitting them to the VCF in case you want to adjust filters to boost sensitivity. To count how many variants are really called you need to count only variant records where the FILTER field contains only . or PASS.

  • Dear Geraldine, thanks for your reply.

    Here is the complete set of FILTER fields for may sample

    egrep "^[^#]" ID36155FROM31976_S1.somaticVariant_3.vcf |awk 'BEGIN{FS="\t"} {print $7}' |sort |uniq -c
    68224 clustered_events
    170660 clustered_events;homologous_mapping_event
    817 clustered_events;homologous_mapping_event;str_contraction
    246 clustered_events;homologous_mapping_event;str_contraction;t_lod_fstar
    6 clustered_events;homologous_mapping_event;str_contraction;t_lod_fstar;triallelic_site
    27 clustered_events;homologous_mapping_event;str_contraction;triallelic_site
    69457 clustered_events;homologous_mapping_event;t_lod_fstar
    541 clustered_events;homologous_mapping_event;t_lod_fstar;triallelic_site
    1462 clustered_events;homologous_mapping_event;triallelic_site
    852 clustered_events;str_contraction
    234 clustered_events;str_contraction;t_lod_fstar
    8 clustered_events;str_contraction;t_lod_fstar;triallelic_site
    41 clustered_events;str_contraction;triallelic_site
    25243 clustered_events;t_lod_fstar
    130 clustered_events;t_lod_fstar;triallelic_site
    378 clustered_events;triallelic_site
    1044 homologous_mapping_event
    1 homologous_mapping_event;str_contraction;t_lod_fstar
    365 homologous_mapping_event;t_lod_fstar
    1 homologous_mapping_event;t_lod_fstar;triallelic_site
    5 homologous_mapping_event;triallelic_site
    204838 PASS
    2635 str_contraction
    839 str_contraction;t_lod_fstar
    25 str_contraction;t_lod_fstar;triallelic_site
    99 str_contraction;triallelic_site
    66696 t_lod_fstar
    221 t_lod_fstar;triallelic_site
    669 triallelic_site

    204838 valid variants. Do you think that is feasible?

    thanks.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie
    Do you have a matched normal or is this tumor only?
  • Hi,

    Unfortunately this is a tumor only run.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie
    Ah, that would explain it. There will be a lot of calls that are germline that MuTect can't exclude. Tumor-only calling is currently not supported. We have some work underway to determine a methodology to do this properly but it's not ready yet. In the meantime I'm afraid you're on your own. I would recommend looking at the literature to see what others have done for this type of problem.
  • Ok, I see.

    Just one more question about mutect2. I am using both --dbsnp and --cosmic options but I can see only dbsnp IDs (3rd field in vcf format) in the results. Is there any way to have both dbsnp and cosmic keys in the final results?

    Thank you very much again!

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie
    Not directly -- but you can use VariantAnnotator to add cosmic IDs to the vcf after calling.
Sign In or Register to comment.