Letting CountReads implement TreeReducible results in ClassCastException?

kwestbrooks, Mountain View, CA (Member)

Hello GATK gurus:

I am new to using GATK. In an effort to understand how to implement my own Walker subclass, I wanted to implement a variant of CountReads that implements the TreeReducible interface:

@Requires(DataSource.READS)
public class TreeReducibleCountReads extends ReadWalker<Integer, Long>
    implements NanoSchedulable, TreeReducible<Long> {

  @Override
  public Integer map(ReferenceContext context, GATKSAMRecord record, RefMetaDataTracker tracker) {
    return 1;
  }

  @Override
  public Long reduce(Integer value, Long sum) {
    return treeReduce(value.longValue(), sum);
  }

  @Override
  public Long reduceInit() {
    return 0L;
  }

  @Override
  public Long treeReduce(Long lhs, Long rhs) {
    return lhs + rhs;
  }
}

I invoked GATK with the following arguments:

java -jar GenomeAnalysisTK.jar -T TreeReducibleCountReads -R myreference.fasta -I myreads.bam -nt 2 -nct 2

And I received the following stacktrace:

java.lang.ClassCastException: org.broadinstitute.gatk.engine.datasources.providers.LocusShardDataProvider cannot be cast to org.broadinstitute.gatk.engine.datasources.providers.ReadShardDataProvider
    at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:1)
    at org.broadinstitute.gatk.engine.executive.ShardTraverser.call(ShardTraverser.java:98)
    at java.util.concurrent.FutureTask.run(FutureTask.java:237)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1113)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:588)
    at java.lang.Thread.run(Thread.java:745)

I think that GATK is confused about how to shard the data. I expected that the data would be sharded by reads, but I think GATK is trying to shard by locus. Can anyone point out my mistake?
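For reference, the map/reduce/treeReduce contract the walker above relies on can be sketched without GATK at all. This is only a minimal illustration, assuming each worker thread folds its own shard with reduce starting from reduceInit, and that the per-shard partial results are then merged with treeReduce. The class name TreeReduceSketch, the helpers reduceShard and countReads, and the use of plain strings as stand-ins for reads are all hypothetical and not part of the GATK API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not GATK code.
public class TreeReduceSketch {

    // Emit 1 for every read, as in the walker's map().
    static Integer map(String read) {
        return 1;
    }

    // Starting value for each per-shard accumulation.
    static Long reduceInit() {
        return 0L;
    }

    // Fold one mapped value into the running sum for a shard.
    static Long reduce(Integer value, Long sum) {
        return value + sum;
    }

    // Merge two partial sums produced by different shards.
    static Long treeReduce(Long lhs, Long rhs) {
        return lhs + rhs;
    }

    // Assumed per-thread behaviour: reduce over a single shard.
    static Long reduceShard(List<String> shard) {
        Long sum = reduceInit();
        for (String read : shard) {
            sum = reduce(map(read), sum);
        }
        return sum;
    }

    // Assumed engine behaviour: combine the per-shard results.
    static Long countReads(List<List<String>> shards) {
        Long total = reduceInit();
        for (List<String> shard : shards) {
            total = treeReduce(reduceShard(shard), total);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> shards = Arrays.asList(
                Arrays.asList("read1", "read2", "read3"),
                Arrays.asList("read4", "read5"));
        System.out.println(countReads(shards)); // prints 5
    }
}
```

Under these assumptions the total does not depend on how the reads are split into shards, which is the property a correct treeReduce has to preserve.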

Answers

  • kwestbrooks, Mountain View, CA (Member)

    Friendly ping! :-)

    A bit more diagnostic information: the problem appears only when the -nt 2 command-line argument is present. If I omit that argument, everything runs fine.

  • kwestbrooks, Mountain View, CA (Member)

    Thanks for your response!

    I'm just trying to learn how GATK works. If this is a limitation of the software, then I can accept that.
