
BaseRecalibrator: Lexicographically sorted human genome sequence detected in knownSites

Hello,

I've tried everything I can think of, but I still get an error when I run:

java -jar /data/GATK/GenomeAnalysisTK.jar -T BaseRecalibrator -R hg19.fasta -I reordered.bam -knownSites hg19.dbsnp.sorted.vcf -o recalibration_report.grp

ERROR MESSAGE: Lexicographically sorted human genome sequence detected in knownSites. Please see https://software.broadinstitute.org/gatk/documentation/article?id=1328for more information. Error details: knownSites contigs = [chr1, chr10, chr11, chr11_gl000202_random, chr12, chr13, chr14, chr15, chr16, chr17, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18, chr18_gl000207_random, chr19, chr19_gl000208_random, chr19_gl000209_random, chr1_gl000191_random, chr1_gl000192_random, chr2, chr20, chr21, chr21_gl000210_random, chr22, chr3, chr4, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr5, chr6, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7, chr7_gl000195_random, chr8, chr8_gl000196_random, chr8_gl000197_random, chr9, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chrM, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249, chrX, chrY]

ERROR ------------------------------------------------------------------------------------------
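
The contig list in the message is in plain string-sort order (chr1, chr10, chr11, ... chr2), which is what "lexicographically sorted" refers to. As a rough sketch, one way to spot that pattern in any list of contig names is to test whether chr10 appears before chr2. The function name `looks_lexicographic` is hypothetical, and the heuristic is an assumption for illustration, not GATK's own validation code.

```shell
#!/bin/sh
# Heuristic check (an illustrative assumption, not GATK's implementation):
# read contig names, one per line, on stdin and succeed when chr10 appears
# before chr2, which is the order a plain string sort produces.
looks_lexicographic() {
    awk '
        $1 == "chr2"  { pos2 = NR }     # remember where chr2 was seen
        $1 == "chr10" { pos10 = NR }    # remember where chr10 was seen
        END {
            # Exit status 0 only if both contigs were seen and chr10 came first.
            exit !(pos2 && pos10 && pos10 < pos2)
        }
    '
}
```

For example, piping the record contigs in file order into it, as in `awk '!/^#/ && !seen[$1]++ {print $1}' hg19.dbsnp.sorted.vcf | looks_lexicographic && echo "lexicographic order"`, would flag a file ordered like the list in the error message.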

I made the bam from a fastq and used ucsc.hg19.fasta as the reference. Made the dictionary file, sorted and indexed bam, ran MarkDuplicates and AddOrReplaceReadGroups. Next, I used RealignerTargetCreator followed by the IndelRealigner. This all worked without errors.

I downloaded the latest version of dbSNP (build 150) from
ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/VCF/00-All.vcf.gz
and followed these steps to prepare the file:

  1. gunzip 00-All.vcf.gz

  2. awk '/^#/ {print $0}' 00-All.vcf > head.txt

  3. sed -i 's/chrMT/chrM/g' head.txt

  4. awk '/^#/ {next}{print $0}' 00-All.vcf | sed 's/^/chr/' > 1.vcf

  5. sed -i 's/chrMT/chrM/g' 1.vcf

  6. cat head.txt 1.vcf > hg19.dbsnp.vcf

  7. IGVTools/igvtools index hg19.dbsnp.vcf

  8. awk '/^#/ {next}{print $1}' hg19.dbsnp.vcf | sort |uniq
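
The last step above lists the contig names, but `sort | uniq` discards the order in which they occur in the file. A minimal sketch of a check that keeps the file order and compares it against the reference FASTA might look like the following. The function name `check_vcf_contig_order` is hypothetical, and the sketch assumes an uncompressed VCF and a FASTA whose header lines start with `>` followed by the contig name.

```shell
#!/bin/sh
# Sketch: verify that VCF records follow the contig order of a reference FASTA.
# Usage: check_vcf_contig_order <vcf> <reference.fasta>
# Returns 0 when every record's contig exists in the reference and contigs
# never go backwards relative to the reference order; returns 1 otherwise.
check_vcf_contig_order() {
    awk '
        FNR == 1 { file++ }                 # count which input file we are reading
        file == 1 {                         # first input: the reference FASTA
            # Header lines look like ">chr1 ..."; record their order of appearance.
            if (substr($1, 1, 1) == ">") rank[substr($1, 2)] = ++n
            next
        }
        /^#/ { next }                       # second input: skip VCF header lines
        !($1 in rank) {
            print "contig not in reference: " $1
            status = 1; exit
        }
        rank[$1] < last {
            print "contig out of reference order: " $1
            status = 1; exit
        }
        { last = rank[$1] }                 # remember the rank of the current contig
        END { exit status + 0 }
    ' "$2" "$1"
}
```

With the file names used in this post, the call would be `check_vcf_contig_order hg19.dbsnp.vcf hg19.fasta`. Scanning a whole-genome FASTA this way is slow, so it is meant as a one-off sanity check.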

Next I ran BaseRecalibrator:

java -jar /data/GATK/GenomeAnalysisTK.jar -T BaseRecalibrator -R hg19.fasta -I initial.bam -knownSites hg19.dbsnp.vcf -o recalibration_report.grp

When I got an error message about the contigs not being ordered the same way, I ran Picard ReorderSam on the initial.bam file and SortVcf on hg19.dbsnp.vcf.

After that, I ran BaseRecalibrator again:

java -jar /data/GATK/GenomeAnalysisTK.jar -T BaseRecalibrator -R hg19.fasta -I reordered.bam -knownSites hg19.dbsnp.sorted.vcf -o recalibration_report.grp

**Lexicographically sorted human genome sequence detected in knownSites**.

I'm not sure what the problem is. Could someone please suggest a fix?

Thanks,

Lena

Comments

  • Here are the first few lines from the fasta file:
    grep '>' hg19.fasta | more

    chrM
    chr1
    chr2
    chr3
    chr4
    chr5
    chr6
    chr7
    chr8
    chr9
    chr10
    chr11
    chr12
    chr13
    chr14
    chr15
    chr16
    chr17
    chr18
    chr19
    chr20
    chr21
    chr22
    chrX

    --More--
    and here is the reordered bam file:
    samtools view reordered.bam | cut -f 3 | grep chr | uniq -c | more
    1699 chrM
    1056500 chr1
    77084 chr2
    314511 chr3
    36983 chr4
    62576 chr5
    30587 chr6
    1052909 chr7
    235395 chr8
    161193 chr9
    48468 chr10
    52081 chr11
    200495 chr12
    310 chr13
    25201 chr14
    19100 chr15
    7364 chr16
    492468 chr17
    93 chr18
    580901 chr19
    1665 chr20
    186184 chr21
    159828 chr22
    --More--

    Please advise.

    Thanks,

    Lena

  • and here is the sorted dbsnp vcf:

    contig=<ID=chrM,length=16571>

    contig=<ID=chr1,length=249250621>

    contig=<ID=chr2,length=243199373>

    contig=<ID=chr3,length=198022430>

    contig=<ID=chr4,length=191154276>

    contig=<ID=chr5,length=180915260>

    contig=<ID=chr6,length=171115067>

    contig=<ID=chr7,length=159138663>

    contig=<ID=chr8,length=146364022>

    contig=<ID=chr9,length=141213431>

    contig=<ID=chr10,length=135534747>

    contig=<ID=chr11,length=135006516>

    contig=<ID=chr12,length=133851895>

    contig=<ID=chr13,length=115169878>

    contig=<ID=chr14,length=107349540>

    contig=<ID=chr15,length=102531392>

    contig=<ID=chr16,length=90354753>

    contig=<ID=chr17,length=81195210>

    contig=<ID=chr18,length=78077248>

    contig=<ID=chr19,length=59128983>

    contig=<ID=chr20,length=63025520>

    contig=<ID=chr21,length=48129895>

    contig=<ID=chr22,length=51304566>

    contig=<ID=chrX,length=155270560>

    contig=<ID=chrY,length=59373566>

    contig=<ID=chr1_gl000191_random,length=106433>

    contig=<ID=chr1_gl000192_random,length=547496>

    contig=<ID=chr4_ctg9_hap1,length=590426>

    contig=<ID=chr4_gl000193_random,length=189789>

    contig=<ID=chr4_gl000194_random,length=191469>

    contig=<ID=chr6_apd_hap1,length=4622290>

    contig=<ID=chr6_cox_hap2,length=4795371>

    contig=<ID=chr6_dbb_hap3,length=4610396>

    contig=<ID=chr6_mann_hap4,length=4683263>

    contig=<ID=chr6_mcf_hap5,length=4833398>

    contig=<ID=chr6_qbl_hap6,length=4611984>

    contig=<ID=chr6_ssto_hap7,length=4928567>

    contig=<ID=chr7_gl000195_random,length=182896>

    contig=<ID=chr8_gl000196_random,length=38914>

    contig=<ID=chr8_gl000197_random,length=37175>

    contig=<ID=chr9_gl000198_random,length=90085>

    contig=<ID=chr9_gl000199_random,length=169874>

    contig=<ID=chr9_gl000200_random,length=187035>

    contig=<ID=chr9_gl000201_random,length=36148>

    contig=<ID=chr11_gl000202_random,length=40103>

    contig=<ID=chr17_ctg5_hap1,length=1680828>

    contig=<ID=chr17_gl000203_random,length=37498>

    contig=<ID=chr17_gl000204_random,length=81310>

    contig=<ID=chr17_gl000205_random,length=174588>

    contig=<ID=chr17_gl000206_random,length=41001>

    contig=<ID=chr18_gl000207_random,length=4262>

    contig=<ID=chr19_gl000208_random,length=92689>

    contig=<ID=chr19_gl000209_random,length=159169>

    contig=<ID=chr21_gl000210_random,length=27682>

    contig=<ID=chrUn_gl000211,length=166566>

    contig=<ID=chrUn_gl000212,length=186858>

    contig=<ID=chrUn_gl000213,length=164239>

    contig=<ID=chrUn_gl000214,length=137718>

    contig=<ID=chrUn_gl000215,length=172545>

    contig=<ID=chrUn_gl000216,length=172294>

    contig=<ID=chrUn_gl000217,length=172149>

    contig=<ID=chrUn_gl000218,length=161147>

    contig=<ID=chrUn_gl000219,length=179198>

    contig=<ID=chrUn_gl000220,length=161802>

    contig=<ID=chrUn_gl000221,length=155397>

    contig=<ID=chrUn_gl000222,length=186861>

    contig=<ID=chrUn_gl000223,length=180455>

    contig=<ID=chrUn_gl000224,length=179693>

    contig=<ID=chrUn_gl000225,length=211173>

    contig=<ID=chrUn_gl000226,length=15008>

    contig=<ID=chrUn_gl000227,length=128374>

    contig=<ID=chrUn_gl000228,length=129120>

    contig=<ID=chrUn_gl000229,length=19913>

    contig=<ID=chrUn_gl000230,length=43691>

    contig=<ID=chrUn_gl000231,length=27386>

    contig=<ID=chrUn_gl000232,length=40652>

    contig=<ID=chrUn_gl000233,length=45941>

    contig=<ID=chrUn_gl000234,length=40531>

    contig=<ID=chrUn_gl000235,length=34474>

    contig=<ID=chrUn_gl000236,length=41934>

    contig=<ID=chrUn_gl000237,length=45867>

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @yelekley
    Hi Lena,

    Can you try deleting the VCF index and running the tool again? There was a bug in SortVcf that may be causing this issue.

    Thanks
    Sheila

    Issue · Github (by Sheila): Issue Number 2174, State: closed, Closed By: vdauwera
  • Yes, it worked. I wish it was mentioned in the documentation for BaseRecalibrator. Thanks

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    @yelekley The issue didn't have anything to do with BaseRecalibrator; it was SortVcf failing to generate a new index. This has been fixed, so if you use the latest version of Picard to do the sorting, it should work normally now.
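
The workaround from this thread, deleting the stale VCF index so that a fresh one is built on the next run, can be sketched as below. The thread does not name the index file, so the `<vcf>.idx` naming is an assumption, and the function name `remove_vcf_index` is hypothetical.

```shell
#!/bin/sh
# Sketch: delete the index next to a VCF so the next tool run has to rebuild it.
# Assumes the index is named "<vcf>.idx"; adjust the suffix if yours differs.
# Usage: remove_vcf_index <vcf>
remove_vcf_index() {
    idx="$1.idx"
    if [ -e "$idx" ]; then
        # Only the index is removed; the VCF itself is left untouched.
        rm -f -- "$idx" && echo "removed $idx"
    else
        echo "no index found at $idx"
    fi
}
```

For the file in this thread that would be `remove_vcf_index hg19.dbsnp.sorted.vcf`, followed by rerunning the BaseRecalibrator command from the original post.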
