Error running CollectAlignmentSummaryMetrics on a bam generated from .maf file

SDFfASF Member Posts: 5

Hello,
I recently ran an alignment with the LAST tool (http://last.cbrc.jp/ - a fasta aligner for long-read alignment). It produces a .maf file, which I converted to SAM (with http://last.cbrc.jp/doc/maf-convert.html) and then to BAM (with Picard). Up to this point everything looked fine, but when I run Picard CollectAlignmentSummaryMetrics it throws this error:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.collectQualityData(AlignmentSummaryMetricsCollector.java:329)
at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.addRecord(AlignmentSummaryMetricsCollector.java:195)
at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:127)
at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:93)
at picard.metrics.MultiLevelCollector$AllReadsDistributor.acceptRecord(MultiLevelCollector.java:192)
at picard.metrics.MultiLevelCollector.acceptRecord(MultiLevelCollector.java:315)
at picard.analysis.AlignmentSummaryMetricsCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:89)
at picard.analysis.CollectAlignmentSummaryMetrics.acceptRead(CollectAlignmentSummaryMetrics.java:147)
at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:138)
at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:77)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

I am adding the head of the bam file:

0034a196-edbc-429f-89c4-b5280a486760_Basecall_2D_2d 0 burn-in 1 100 4H21=1D18=.....1D17=2D1X6=6D1X11=2I1X31=19H * 0 0 GGGCGGCGACCTCGCGGGT.....AGCATGCCACG * NM:i:152 AS:i:10909
06c0ff36-09df-4bb3-b952-146fca6f60ae_Basecall_2D_2d 0 burn-in 1 100 8H21=1D3=......2D1=1I57=2D68=1D1=2D42=29H * 0 0 GGGCGGCGACCTCGCGGG...........GCAAGCGTGA * NM:i:402 AS:i:33419

I deleted values in the middle of SEQ and CIGAR strings because they are very long.

Running ValidateSamFile on this BAM file shows no relevant problems:

HISTOGRAM java.lang.String

Error Type Count
ERROR:MISSING_READ_GROUP 1
WARNING:RECORD_MISSING_READ_GROUP 2441

For the same sequencing run I had fastq files, which I aligned with bwa, and when I ran CollectAlignmentSummaryMetrics on the BAM file from that workflow it worked fine. Here is the head of the BAM from that workflow (alignment with bwa using fastq):

0034a196-edbc-429f-89c4-b5280a486760_Basecall_2D_2d 0 burn-in 1 60 4S18M1D1....M6D32M19S * 0 0 TGCTGG...TGTTTGA /)6-,(-.../9/)0,*, MD:Z:18^T..A11G31 NM:i:138 AS:i:1920 XS:i:0
06c0ff36-09df-4bb3-b952-146fca6f60ae_Basecall_2D_2d 0 burn-in 1 60 8S18M1D1...D1M2D42M29S * 0 0 GTATTGC...ATGTGTTTC =.01-)**)./....'-.+*+ MD:Z:18^.^A1^AA42 NM:i:371 AS:i:5836 XS:i:0

Same as before, I removed the characters in the middle of the long strings.

Hope you could help me with my problems.

Thanks and have a great day.

Answers

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,163 admin
    The ArrayIndexOutOfBoundsException error suggests that you may have some malformed reads where the alignment information does not make sense, e.g. a read that maps off the end of a contig or something like that. That could be a bug in the aligner you're using. This seems especially likely considering the BWA alignment appears to be healthy.

    Geraldine Van der Auwera, PhD

  • SDFfASF Member Posts: 5
    edited December 2016

    Well, it seems that the only difference between the two BAM files (one from aligning with fastq and one from aligning with fasta; two example records from each are posted in the first post) is that one file has phred scores in the quality field and the other has a single "*" in that place.
    I'm trying to figure out how to work with that, but if anybody has a suggestion I will try it.

    BTW, I'm using the latest version of picard (2.8.1)
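A quick way to check how widespread the missing quality field is would be to count the records whose QUAL column (the 11th tab-separated SAM field) is a bare "*". The following is a minimal sketch that works on SAM text lines; the function name `count_missing_quals` is hypothetical and not part of Picard or any other tool mentioned in this thread.

```python
def count_missing_quals(sam_lines):
    """Return (records_without_quals, total_records) for SAM text lines."""
    missing = 0
    total = 0
    for line in sam_lines:
        # Skip blank lines and header lines (header lines start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record (11 mandatory fields)
        total += 1
        if fields[10] == "*":  # QUAL is the 11th mandatory field
            missing += 1
    return missing, total


# Made-up records for illustration only.
demo = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tburn-in\t1\t60\t4M\t*\t0\t0\tACGT\t*",
    "read2\t0\tburn-in\t1\t60\t4M\t*\t0\t0\tACGT\t!!!!",
]
print(count_missing_quals(demo))
```

For a real file, the lines could come from `open("aln.sam")` (a hypothetical path) after converting the BAM to SAM text.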

  • shlee Cambridge Member, Administrator, Broadie, Moderator, Dev Posts: 437 admin

    Hi @SDFfASF,

    If what you post is indeed the top of the BAM file, then you are missing an actual header. Also, your error messages are saying that the BAM is missing read group information, which also indicates a missing header.

    To start, take a look at this FAQ to see what a BAM header should look like. To add such a header you can use, for example, Picard's ReplaceSamHeader.

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,163 admin
    I'm pretty sure the problem is because of that asterisk. What program generated your alignments?

    Geraldine Van der Auwera, PhD

  • SDFfASF Member Posts: 5

    @shlee It had a header; I just left it out of the post by mistake. I will read the FAQ for sure, thanks.

    @Geraldine_VdAuwera I'm pretty sure too, and when I put some phred values in place of the asterisk Picard worked fine. But I thought I saw somewhere in the documentation that Picard does not require quality scores in the BAM/SAM and can work with files where they are replaced with a "*". The tool was LAST; I linked to it in the first post.

    Issue · Github (by Sheila): Issue Number 1625, State: closed, Closed By: vdauwera
  • shlee Cambridge Member, Administrator, Broadie, Moderator, Dev Posts: 437 admin
    edited January 13

    @SDFfASF,

    Thanks for the feedback. I examined your two sets of records carefully and noticed one interesting difference. The first set (which gives you problems with CollectAlignmentSummaryMetrics) uses extended CIGAR nomenclature (1D17=2D1X6=6D1X11=2I1X31=19H), while the second set (which works fine) does not (M6D32M19S). Would it be possible for you to attach a file of 100 such extended CIGAR SAM records in a valid BAM file, i.e. with header, so that we can test whether this is the problem or whether something else is causing the issue? Can you make sure this snippet still gives you the error before attaching it here in this thread? Thanks.
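One way to test the extended-CIGAR hypothesis in isolation would be to rewrite the `=` (sequence match) and `X` (sequence mismatch) operators as plain `M` and rerun the tool on the result. The sketch below does that for a single CIGAR string and merges adjacent runs; the helper name `collapse_extended_cigar` is hypothetical, and nothing in this thread says any tool does this conversion itself.

```python
import re

# One CIGAR element: a length followed by a SAM operator character.
_CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def collapse_extended_cigar(cigar):
    """Rewrite '=' and 'X' operators as 'M' and merge adjacent identical operators."""
    if cigar == "*":
        return cigar  # CIGAR unavailable; nothing to convert
    merged = []  # list of [length, operator] pairs
    for length, op in _CIGAR_ELEMENT.findall(cigar):
        if op in ("=", "X"):
            op = "M"
        if merged and merged[-1][1] == op:
            merged[-1][0] += int(length)  # extend the previous run
        else:
            merged.append([int(length), op])
    return "".join(f"{length}{op}" for length, op in merged)


# The CIGAR fragment quoted in the post above.
print(collapse_extended_cigar("1D17=2D1X6=6D1X11=2I1X31=19H"))
```

The conversion loses the match/mismatch distinction but leaves the aligned length unchanged, so it is only useful as a diagnostic.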

  • SDFfASF Member Posts: 5
    edited January 15

    Lately I have been using another tool whose output causes similar problems with Picard. It doesn't use extended CIGAR but still has the asterisk problem, so I will post some test file snippets from this tool's output. The attached files were originally .sam files, but I renamed them to .txt in order to upload them.

    Important to note: Picard throws "Error parsing SAM header. @RG line missing SM tag" with those files, but I read that the SM tag is not essential for Picard and that you can ignore this error by adding "VALIDATION_STRINGENCY=SILENT", which I did.
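As an alternative to silencing validation, the missing SM tag could be added to the @RG header lines before running the tool. This is a minimal sketch on SAM header text lines; the function name `add_sample_to_read_groups` and the sample name `sample1` are placeholders invented for illustration.

```python
def add_sample_to_read_groups(header_lines, sample="sample1"):
    """Append an SM tag to every @RG line that does not already have one."""
    fixed = []
    for line in header_lines:
        stripped = line.rstrip("\n")
        if stripped.startswith("@RG"):
            tags = stripped.split("\t")[1:]
            if not any(tag.startswith("SM:") for tag in tags):
                stripped += "\tSM:" + sample  # header fields are tab-separated
        fixed.append(stripped)
    return fixed


# Header lines modelled on the test files in this post (tabs restored).
header = ["@HD\tVN:1.0\tSO:unsorted", "@SQ\tSN:burn-in\tLN:48502", "@RG\tID:1"]
for line in add_sample_to_read_groups(header):
    print(line)
```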

    First file [testWithAsterisk.txt] - Original sam file which had asterisk instead of qscore:
    @HD VN:1.0 SO:unsorted
    @SQ SN:burn-in LN:48502
    @RG ID:1
    @PG ID:6 PN:minialign
    8915e658-528c-4677-88a8-c2eba6c58fc5_Basecall_2D_2d 16 burn-in
    8915e658-528c-4677-88a8-c2eba6c58fc5_Basecall_2D_template 4 * 0 0 * * 0 0 TTGGCAGATAACATATTTTATCTTTTGCTCACCAGTTCGATGATTAACGGAAGTTCATCTGCTTTATGGG * RG:Z:1
    8da715a9-3717-4f04-9667-e7e0c2792104_Basecall_2D_2d 16 burn-in

    And the command and the error it produced (it didn't output any file):

    $ java -jar ~/tools/picard.jar CollectAlignmentSummaryMetrics R=../LambdaRefGenome.fa I=test2.sam O=testSummary4.txt VALIDATION_STRINGENCY=SILENT
    [Sun Jan 15 10:53:12 IST 2017] picard.analysis.CollectAlignmentSummaryMetrics REFERENCE_SEQUENCE=../LambdaRefGenome.fa INPUT=test2.sam OUTPUT=testSummary4.txt VALIDATION_STRINGENCY=SILENT MAX_INSERT_SIZE=100000 EXPECTED_PAIR_ORIENTATIONS=[FR] ADAPTER_SEQUENCE=[AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG] METRIC_ACCUMULATION_LEVEL=[ALL_READS] IS_BISULFITE_SEQUENCED=false ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
    [Sun Jan 15 10:53:12 IST 2017] Executing as artemd@nshomron.tau.ac.il on Linux 2.6.32-642.1.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_66-b17; Picard version: 2.8.1-SNAPSHOT
    WARNING 2017-01-15 10:53:12 SinglePassSamProgram File reports sort order 'unsorted', assuming it's coordinate sorted anyway.
    [Sun Jan 15 10:53:12 IST 2017] picard.analysis.CollectAlignmentSummaryMetrics done. Elapsed time: 0.00 minutes.
    Runtime.totalMemory()=504889344
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.collectQualityData(AlignmentSummaryMetricsCollector.java:323)
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.addRecord(AlignmentSummaryMetricsCollector.java:189)
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:121)
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:87)
    at picard.metrics.MultiLevelCollector$AllReadsDistributor.acceptRecord(MultiLevelCollector.java:192)
    at picard.metrics.MultiLevelCollector.acceptRecord(MultiLevelCollector.java:315)
    at picard.analysis.AlignmentSummaryMetricsCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:83)
    at picard.analysis.CollectAlignmentSummaryMetrics.acceptRead(CollectAlignmentSummaryMetrics.java:147)
    at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:138)
    at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:77)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

    Now the second file [testWithQscore.txt] - the only change is that (fake) qscore values were added in place of the asterisk:
    @HD VN:1.0 SO:unsorted
    @SQ SN:burn-in LN:48502
    @RG ID:1
    @PG ID:6 PN:minialign
    8915e658-528c-4677-88a8-c2eba6c58fc5_Basecall_2D_2d 16 burn-in
    8915e658-528c-4677-88a8-c2eba6c58fc5_Basecall_2D_template 4 * 0 0 * * 0 0 TTGGCAGATAACATATTTTATCTTTTGCTCACCAGTTCGATGATTAACGGAAGTTCATCTGCTTTATGGG 1111111111111111111111111111111111111111111111111111111111111111111111 RG:Z:1
    8da715a9-3717-4f04-9667-e7e0c2792104_Basecall_2D_2d 16 burn-in

    And the command for this one is: (it produced a normal AlignmentSummaryMetrics file)

    $ java -jar ~/tools/picard.jar CollectAlignmentSummaryMetrics R=../LambdaRefGenome.fa I=test.sam O=testSummary2.txt VALIDATION_STRINGENCY=SILENT
    [Sun Jan 15 10:53:01 IST 2017] picard.analysis.CollectAlignmentSummaryMetrics REFERENCE_SEQUENCE=../LambdaRefGenome.fa INPUT=test.sam OUTPUT=testSummary2.txt VALIDATION_STRINGENCY=SILENT MAX_INSERT_SIZE=100000 EXPECTED_PAIR_ORIENTATIONS=[FR] ADAPTER_SEQUENCE=[AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG] METRIC_ACCUMULATION_LEVEL=[ALL_READS] IS_BISULFITE_SEQUENCED=false ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
    [Sun Jan 15 10:53:01 IST 2017] Executing as artemd@nshomron.tau.ac.il on Linux 2.6.32-642.1.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_66-b17; Picard version: 2.8.1-SNAPSHOT
    WARNING 2017-01-15 10:53:01 SinglePassSamProgram File reports sort order 'unsorted', assuming it's coordinate sorted anyway.
    [Sun Jan 15 10:53:01 IST 2017] picard.analysis.CollectAlignmentSummaryMetrics done. Elapsed time: 0.00 minutes.
    Runtime.totalMemory()=504889344

    Hope this helps.

    ArtemD.

    Attachments: testWithQscore.txt (12K), testWithAsterisk.txt (7K)
  • Geraldine_VdAuwera Administrator, Dev Posts: 11,163 admin

    Hi @SDFfASF,

    I can confirm it's the asterisk that causes a problem. The error stack trace shows that this is the function that's choking on your read:

    IndividualAlignmentSummaryMetricsCollector.collectQualityData
    

    This function looks up the quality scores by the index position of the corresponding base, so if the quality field is just a single asterisk, there is no per-base score to look up. The exception message reports index 0, which suggests the lookup already fails at the first base. That's why you get an ArrayIndexOutOfBoundsException, as explained here.
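The failure mode can be reproduced with a toy model. The sketch below is not Picard's code: it assumes that a "*" quality field is parsed into an empty list, and it uses an illustrative threshold of 20 for a "high quality" base. Python raises `IndexError` where Java raises `ArrayIndexOutOfBoundsException`.

```python
def parse_quals(qual_field):
    """Toy parser: '*' becomes an empty list, otherwise Phred+33 scores."""
    if qual_field == "*":
        return []  # assumption: no per-base scores are stored at all
    return [ord(ch) - 33 for ch in qual_field]


def count_high_quality_bases(bases, quals, threshold=20):
    """Toy per-base loop that reads the score at the same index as each base."""
    high_quality = 0
    for i, _base in enumerate(bases):
        if quals[i] >= threshold:  # fails at i == 0 when quals is empty
            high_quality += 1
    return high_quality


print(count_high_quality_bases("ACGT", parse_quals("II+I")))

try:
    count_high_quality_bases("ACGT", parse_quals("*"))
except IndexError as err:
    print("lookup failed:", err)
```

A guard such as checking `len(quals) == len(bases)` before the loop would avoid the crash, at the cost of skipping the quality-based metrics for that read.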

    The tricky thing is that many Picard tools have requirements that are different from the majority of tools and are often not documented. The metrics collection tools tend to have the most exhaustive requirements for records being complete, because they access most if not all of the properties of the data. We'll try to document these things more clearly in future.

    Geraldine Van der Auwera, PhD

  • SDFfASF Member Posts: 5

    OK, thanks @Geraldine_VdAuwera, that clears up a whole lot of confusion. I now understand that CollectAlignmentSummaryMetrics requires the qscore values in order to calculate a few metrics "for high quality bases". Can I somehow turn this off so that Picard collects all the other metrics that are not related to quality? Or can I ask Picard to assume that all bases have the same qscore?

    If there is no solution on Picard's end, I guess I would need to either loop over each read in the SAM file and "fake" qscore values of the same length as the read, or (what might be more troublesome to write) go to the original fastq file for each read and copy its qscore values onto the corresponding bases of that read in the SAM file.
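The first workaround described above (faking a constant qscore of the same length as the read) can be sketched as follows. It works on one SAM text line at a time; the function name `fill_missing_quals` is hypothetical, and the default placeholder "1" is the character used in testWithQscore.txt, which corresponds to Q16 under Phred+33 encoding.

```python
def fill_missing_quals(sam_line, placeholder="1"):
    """Replace a '*' QUAL field with a constant string as long as SEQ."""
    line = sam_line.rstrip("\n")
    if not line or line.startswith("@"):
        return line  # blank lines and header lines pass through unchanged
    fields = line.split("\t")
    # SEQ is field 10 and QUAL is field 11 (indices 9 and 10).
    if len(fields) >= 11 and fields[10] == "*" and fields[9] != "*":
        fields[10] = placeholder * len(fields[9])
    return "\t".join(fields)


# Made-up unmapped record for illustration only.
record = "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*\tRG:Z:1"
print(fill_missing_quals(record))
```

Any metric that depends on base quality will reflect the placeholder rather than the data. The second workaround (copying scores from the fastq) would need extra care that this sketch does not attempt: for reverse-strand records the stored qualities must be reversed, and hard-clipped bases are absent from SEQ.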
