Tutorial: error running examples

On the forum page there are two examples. The first runs fine. The second generates this error:

MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '10'

but the input files are the same. I only changed "Reads" to "Loci" in the command. I am running Unix, so I did not need to retype the entire command. This command works fine:

java -jar GenomeAnalysisTK.jar -T CountReads -R exampleFASTA.fasta -I exampleBAM.bam

This command produces the error:

java -jar GenomeAnalysisTK.jar -T CountLoci -R exampleFASTA.fasta -I exampleBAM.bam -o output.txt

Any suggestions?



    edited May 2013

    I definitely cannot replicate that error. Maybe your FASTA file is corrupted?

    $ java -jar ../../dist/GenomeAnalysisTK.jar -T CountLoci -R exampleFASTA.fasta -I exampleBAM.bam -o output.txt
    INFO  17:28:21,280 HelpFormatter - ---------------------------------------------------------------------------------
    INFO  17:28:21,282 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-76-gf39bc59, Compiled 2013/05/21 17:23:44
    INFO  17:28:21,282 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO  17:28:21,282 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO  17:28:21,286 HelpFormatter - Program Args: -T CountLoci -R exampleFASTA.fasta -I exampleBAM.bam -o output.txt
    INFO  17:28:21,286 HelpFormatter - Date/Time: 2013/05/21 17:28:21
    INFO  17:28:21,286 HelpFormatter - ---------------------------------------------------------------------------------
    INFO  17:28:21,286 HelpFormatter - ---------------------------------------------------------------------------------
    INFO  17:28:21,409 GenomeAnalysisEngine - Strictness is SILENT
    INFO  17:28:21,498 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO  17:28:21,524 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO  17:28:21,541 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
    INFO  17:28:21,642 GenomeAnalysisEngine - Creating shard strategy for 1 BAM files
    INFO  17:28:21,652 GenomeAnalysisEngine - Done creating shard strategy
    INFO  17:28:21,653 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining
    INFO  17:28:21,768 ProgressMeter -            done        2.05e+03    0.0 s       55.0 s     97.3%         0.0 s     0.0 s
    INFO  17:28:21,769 ProgressMeter - Total runtime 0.12 secs, 0.00 min, 0.00 hours
    INFO  17:28:21,851 MicroScheduler - 0 reads were filtered out during traversal out of 33 total (0.00%)
    INFO  17:28:22,438 GATKRunReport - Uploaded run statistics report to AWS S3

    Strange thing. What kind of corruption would let the data run through -T CountReads but not through -T CountLoci?

    Great question. It should visit exactly the same locations in the reference.

    I'm afraid I don't have an answer for what you are observing. The error reports an invalid base in the reference FASTA. Can you run an md5 checksum on the reference, dict, and index?

    $ md5sum exampleFASTA.fasta exampleFASTA.fasta.fai exampleFASTA.dict
    36880691cf9e4178216f7b52e8d85fbe  exampleFASTA.fasta
    c50494fca6bb42ae02f26e9f0c585ee6  exampleFASTA.fasta.fai
    852fa68dbe31f42743c060ad2913279c  exampleFASTA.dict
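
If the checksums match, one more thing that could be checked is the reference itself. The sketch below is only an illustration: find_non_iupac is a hypothetical helper (not part of GATK or samtools) that prints every sequence line containing a character outside the IUPAC nucleotide codes. It assumes a plain-text FASTA and treats gap characters such as "-" as invalid, which may or may not match what GATK accepts. As a further assumption, the '10' in the error message may be a raw byte value rather than literal text. Byte 10 is a line feed, which would hint at a .fai index that no longer matches the line layout of the FASTA rather than at bad sequence data.

```shell
# find_non_iupac: hypothetical helper, not part of GATK or samtools.
# Prints "file:line: content" for every sequence line of a FASTA file that
# contains a character outside the IUPAC nucleotide codes.
# Returns 0 if nothing was flagged, 1 otherwise.
find_non_iupac() {
    awk '
        # Skip sequence header lines.
        /^>/ { next }

        # Flag any line holding a character that is not an IUPAC code.
        # Assumption: gap characters are treated as invalid here.
        /[^ACGTURYSWKMBDHVNacgturyswkmbdhvn]/ {
            printf "%s:%d: %s\n", FILENAME, NR, $0
            bad = 1
        }

        # Report the result through the exit status.
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Calling find_non_iupac exampleFASTA.fasta prints nothing and returns 0 when every sequence line is clean. Lines with Windows line endings are reported too, because the trailing carriage return is not an IUPAC code. If the scan comes back clean, the index and dict files would be the next thing to look at.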
