
Tutorial: error running examples

On the forum page there are two examples. The first runs fine; the second generates this error:

MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'

But the input files are the same; I only changed "Reads" to "Loci" in the command. I am running Unix, so I did not need to retype the entire command. This command works fine:

java -jar GenomeAnalysisTK.jar -T CountReads -R exampleFASTA.fasta -I exampleBAM.bam

This command produces the error:

java -jar GenomeAnalysisTK.jar -T CountLoci -R exampleFASTA.fasta -I exampleBAM.bam -o output.txt

Any suggestions?
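
Since the error points at the reference, one quick check is to scan the FASTA for anything outside the IUPAC nucleotide alphabet. The sketch below is not part of GATK: the function name `find_non_iupac` is hypothetical, and the accepted alphabet (the IUPAC nucleotide letters, upper or lower case) is an assumption about what the tool tolerates. It also reports blank lines, which are easy to miss by eye. If the '10' in the message is a byte value rather than a literal base, it would correspond to a line feed (ASCII 10), so unexpected line breaks may be worth looking for; that reading is only a guess.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of GATK.
# Prints "LINE_NUMBER:CONTENT" for every sequence line that is blank or
# contains a character outside the IUPAC nucleotide letters (assumed
# alphabet: ACGTURYSWKMBDHVN, upper or lower case).
# Exit status: 0 if nothing was flagged, 1 otherwise.
find_non_iupac() {
    awk '
        /^>/ { next }    # skip FASTA header lines
        $0 == "" || /[^ACGTURYSWKMBDHVNacgturyswkmbdhvn]/ {
            printf "%d:%s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Running `find_non_iupac exampleFASTA.fasta` prints nothing and exits with status 0 when no line is flagged.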



  • Carneiro (Charlestown, MA; Member)
    edited May 2013

    I definitely cannot replicate that error, maybe your FASTA file is corrupted?

    $ java -jar ../../dist/GenomeAnalysisTK.jar -T CountLoci -R exampleFASTA.fasta -I exampleBAM.bam -o output.txt                                                    [17:28:15]
    INFO  17:28:21,280 HelpFormatter - ---------------------------------------------------------------------------------
    INFO  17:28:21,282 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-76-gf39bc59, Compiled 2013/05/21 17:23:44
    INFO  17:28:21,282 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO  17:28:21,282 HelpFormatter - For support and documentation go to
    INFO  17:28:21,286 HelpFormatter - Program Args: -T CountLoci -R exampleFASTA.fasta -I exampleBAM.bam -o output.txt
    INFO  17:28:21,286 HelpFormatter - Date/Time: 2013/05/21 17:28:21
    INFO  17:28:21,286 HelpFormatter - ---------------------------------------------------------------------------------
    INFO  17:28:21,286 HelpFormatter - ---------------------------------------------------------------------------------
    INFO  17:28:21,409 GenomeAnalysisEngine - Strictness is SILENT
    INFO  17:28:21,498 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO  17:28:21,524 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO  17:28:21,541 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
    INFO  17:28:21,642 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files
    INFO  17:28:21,652 GenomeAnalysisEngine - Done creating shard strategy
    INFO  17:28:21,653 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining
    INFO  17:28:21,768 ProgressMeter -            done        2.05e+03    0.0 s       55.0 s     97.3%         0.0 s     0.0 s
    INFO  17:28:21,769 ProgressMeter - Total runtime 0.12 secs, 0.00 min, 0.00 hours
    INFO  17:28:21,851 MicroScheduler - 0 reads were filtered out during traversal out of 33 total (0.00%)
    INFO  17:28:22,438 GATKRunReport - Uploaded run statistics report to AWS S3
  • Strange. What kind of corruption would let the data run through -T CountReads but not through -T CountLoci?

  • Carneiro (Charlestown, MA; Member)
    edited May 2013

    Great question. Both tools should visit exactly the same locations in the reference.

    I'm afraid I can't explain what you are observing. The error reports an invalid base in the reference FASTA. Can you run an md5 checksum on the reference, dict, and index? For comparison, mine are:

    $ md5sum exampleFASTA.fasta exampleFASTA.fasta.fai exampleFASTA.dict
    36880691cf9e4178216f7b52e8d85fbe  exampleFASTA.fasta
    c50494fca6bb42ae02f26e9f0c585ee6  exampleFASTA.fasta.fai
    852fa68dbe31f42743c060ad2913279c  exampleFASTA.dict
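
The comparison requested above can be automated with `md5sum -c`, which reads a list of checksums and reports each file as OK or FAILED. The sketch below assumes GNU coreutils `md5sum`; the function name `verify_md5` and the `POSTED_MD5` variable are hypothetical, and the checksum values are the ones posted in this thread for one copy of the example files, so a mismatch only shows that your files differ from that copy.

```shell
#!/usr/bin/env bash
# Checksums as posted in the thread (format: HASH, two spaces, FILENAME).
POSTED_MD5='36880691cf9e4178216f7b52e8d85fbe  exampleFASTA.fasta
c50494fca6bb42ae02f26e9f0c585ee6  exampleFASTA.fasta.fai
852fa68dbe31f42743c060ad2913279c  exampleFASTA.dict'

# Hypothetical helper: verify_md5 [DIR] [CHECKSUM_LIST]
# Checks the files in DIR (default: current directory) against
# CHECKSUM_LIST (default: the posted values above).
# Exit status is 0 only if every listed file exists and matches.
verify_md5() {
    local dir="${1:-.}"
    local list="${2:-$POSTED_MD5}"
    # Subshell so the caller's working directory is left unchanged.
    ( cd "$dir" && printf '%s\n' "$list" | md5sum -c - )
}
```

Calling `verify_md5` from the directory that holds the example files prints one `OK` or `FAILED` line per file.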