The reference fasta for hg19 from your resource bundle is not readable by PicardTools reorder

I just downloaded GATK 3.2 and am using PicardTools 1.109.

I am trying to get a TopHat-generated BAM into a format suitable for GATK to digest.

I began with these PicardTools commands:
java -Xmx16g -jar ${PICARDPATH}/SortSam.jar SO=coordinate INPUT=${FILE} OUTPUT=${FILE%%.sam}.bam CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT
java -Xmx16g -jar ${PICARDPATH}/MarkDuplicates.jar INPUT=${FILE%%.sam}.bam OUTPUT=${FILE%%.sam}.mrkdup.bam METRICS_FILE=metrics CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT

But GATK did not like the files and complained like this:

ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reads.
ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.
ERROR This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.
ERROR You can use the ReorderSam utility to fix this problem: http://gatkforums.broadinstitute.org/discussion/58/companion-utilities-reordersam
ERROR reads contigs = [1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 3, 4, 5, 6, 7, 8, 9, MT, X, Y]
ERROR ------------------------------------------------------------------------------------------
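The difference between the lexicographic order reported in the error and the karyotypic order GATK expects can be illustrated with a short sketch. The function names `karyotypic_key` and `is_lexicographic` are hypothetical helpers written for this illustration, not part of GATK or Picard, and the key places the mitochondrial contig last (the error message allows it to lead or trail). This only shows the ordering; it does not reorder a BAM file.

```python
def _base(name):
    """Strip an optional 'chr' prefix so '1' and 'chr1' compare alike."""
    return name[3:] if name.startswith("chr") else name


def karyotypic_key(name):
    """Sort key: 1..22 numerically, then X, Y, then M/MT, then anything else."""
    base = _base(name)
    if base.isdigit():
        return (0, int(base))
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    return (special.get(base, 4), 0)


def is_lexicographic(contigs):
    """True if the numeric contigs appear in string order rather than numeric order."""
    numeric = [c for c in contigs if _base(c).isdigit()]
    return (
        len(numeric) > 1
        and numeric == sorted(numeric)
        and numeric != sorted(numeric, key=karyotypic_key)
    )


# Contig order copied from the "reads contigs" line of the error message.
reads_contigs = [
    "1", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
    "2", "20", "21", "22", "3", "4", "5", "6", "7", "8", "9",
    "MT", "X", "Y",
]

print(is_lexicographic(reads_contigs))
print(sorted(reads_contigs, key=karyotypic_key))
```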

When I attempt to use that reference to reorder my BAM, e.g.

java -jar ${PICARDPATH}/ReorderSam.jar INPUT=Placebo__727_accepted_hits.bam.mrkdup.ARG.bam OUTPUT=Placebo__727_accepted_hits.bam.mrkdup.ARG.Karyotype REFERENCE=ucsc.hg19.fasta

I get this error:

INFO 2014-07-17 10:45:26 ReorderSam SN=%s LN=%d%nchrUn_gl00024736422
INFO 2014-07-17 10:45:26 ReorderSam SN=%s LN=%d%nchrUn_gl00024839786
INFO 2014-07-17 10:45:26 ReorderSam SN=%s LN=%d%nchrUn_gl00024938502
INFO 2014-07-17 10:45:26 ReorderSam Reordering SAM/BAM file:
[Thu Jul 17 10:45:26 PDT 2014] net.sf.picard.sam.ReorderSam done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=251658240
To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
Exception in thread "main" net.sf.picard.PicardException: New reference sequence does not contain a matching contig for 1
at net.sf.picard.sam.ReorderSam.buildSequenceDictionaryMap(ReorderSam.java:217)
at net.sf.picard.sam.ReorderSam.doWork(ReorderSam.java:98)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:179)
at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:120)
at net.sf.picard.sam.ReorderSam.main(ReorderSam.java:77)

The reference fasta for hg19 from your resource bundle is sorted like this:

chrM
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr20
chr21
chr22
chrX
chrY

How can I get a proper reference FASTA to reorder my bam so that your GATK will accept it?

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie; admin)

    The problem is not our reference, it's that you're trying to reorder a b37-aligned bam against hg19. You need to use our b37 reference instead of the hg19 one.

  • yuegeorge (hk; Member)
    edited November 2014

    @Geraldine_VdAuwera said:
    The problem is not our reference, it's that you're trying to reorder a b37-aligned bam against hg19. You need to use our b37 reference instead of the hg19 one.

    Hi, do you mean this FTP site: ftp://ftp.broadinstitute.org/bundle/2.8/b37/
    I didn't find any b37 genomic reference there.

  • yuegeorge (hk; Member)
    edited November 2014

    @Geraldine_VdAuwera said:
    The problem is not our reference, it's that you're trying to reorder a b37-aligned bam against hg19. You need to use our b37 reference instead of the hg19 one.

    Hi Geraldine_VdAuwera,

    java -jar /usr/local/software/picard-tools-1.105/ReorderSam.jar I=./aCGH5875.sam O=./Reorder.sam REFERENCE=/home/george/alignment/gatk_resource/ucsc.hg19.fasta

    After aligning against the FASTA file, I run into the same problem in the reorder step.

    If I sort the BAM file and remove duplicates, then use GATK to call SNVs, everything seems OK. But if I use this file with BaseRecalibrator, the same error occurs. Why? Thanks.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie; admin)

    @yuegeorge What is the exact error? You may be using a resource file that does not match your reference.
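
Both answers in this thread come down to contig names that do not match between files: a BAM with b37-style names (1, 2, ..., MT) against a reference with hg19-style names (chr1, chr2, ..., chrM), or a resource file that does not match the reference. A minimal sketch of how such a mismatch could be spotted from the `@SQ` header lines is shown below. The helper names and the two sample headers are hypothetical and purely illustrative; in practice the header text would come from the BAM header and from the reference's sequence dictionary.

```python
def parse_sq_names(header_text):
    """Return the SN values from the @SQ lines of a SAM header or .dict file."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def missing_contigs(query_contigs, ref_contigs):
    """Contigs named in the query file that the reference does not contain."""
    ref = set(ref_contigs)
    return [c for c in query_contigs if c not in ref]


def naming_style(contigs):
    """Rough guess at the naming convention based on the 'chr' prefix."""
    prefixed = sum(c.startswith("chr") for c in contigs)
    return "chr-prefixed" if prefixed > len(contigs) / 2 else "unprefixed"


# Illustrative headers only (assumed content, trimmed to two contigs each).
bam_header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:MT\tLN:16569\n"
ref_dict = "@HD\tVN:1.0\n@SQ\tSN:chrM\tLN:16571\n@SQ\tSN:chr1\tLN:249250621\n"

bam_contigs = parse_sq_names(bam_header)
ref_contigs = parse_sq_names(ref_dict)

print("BAM naming:", naming_style(bam_contigs))
print("Reference naming:", naming_style(ref_contigs))
print("Missing from reference:", missing_contigs(bam_contigs, ref_contigs))
```

If the BAM contigs are unprefixed and the reference contigs are chr-prefixed, every BAM contig shows up as missing, which is consistent with the "does not contain a matching contig for 1" exception in the question. The same comparison can be applied to the contig names of a resource file against the reference.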
