Letting CountReads implement TreeReducible results in ClassCastException?

kwestbrooks, Mountain View, CA (Member)

Hello GATK gurus:

I am new to GATK. To understand how to write my own Walker subclass, I tried writing a variant of CountReads that implements the TreeReducible interface:

@Requires(DataSource.READS)
public class TreeReducibleCountReads extends ReadWalker<Integer, Long>
    implements NanoSchedulable, TreeReducible<Long> {

  // Each read contributes a count of 1.
  @Override
  public Integer map(ReferenceContext context, GATKSAMRecord record, RefMetaDataTracker tracker) {
    return 1;
  }

  // Fold one mapped value into the running sum.
  @Override
  public Long reduce(Integer value, Long sum) {
    return treeReduce(value.longValue(), sum);
  }

  // Initial value of the running sum.
  @Override
  public Long reduceInit() {
    return 0L;
  }

  // Combine two partial sums.
  @Override
  public Long treeReduce(Long lhs, Long rhs) {
    return lhs + rhs;
  }
}
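
The contract these four methods are meant to satisfy can be illustrated without GATK. The following is a minimal plain-Java sketch, not GATK code: the class name TreeReduceSketch, the helper countShard, and the two lists of read names are hypothetical, and it makes no claim about how the GATK engine actually schedules or shards work. It only shows that per-shard sums built with reduce can be merged with treeReduce.

```java
import java.util.Arrays;
import java.util.List;

public class TreeReduceSketch {

    // map: each read contributes a count of 1 (mirrors map() in the walker).
    static int map(String read) {
        return 1;
    }

    // reduce: fold one mapped value into the running total for a shard.
    static long reduce(int value, long sum) {
        return treeReduce((long) value, sum);
    }

    // reduceInit: the starting total for every shard.
    static long reduceInit() {
        return 0L;
    }

    // treeReduce: combine the totals of two independently processed shards.
    static long treeReduce(long lhs, long rhs) {
        return lhs + rhs;
    }

    // Hypothetical helper: sequentially map and reduce all reads in one shard.
    static long countShard(List<String> shard) {
        long sum = reduceInit();
        for (String read : shard) {
            sum = reduce(map(read), sum);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two made-up shards of read names; in a multi-threaded run each
        // shard could be handled by a different thread.
        List<String> shardA = Arrays.asList("read1", "read2", "read3");
        List<String> shardB = Arrays.asList("read4", "read5");

        long total = treeReduce(countShard(shardA), countShard(shardB));
        System.out.println("Total reads: " + total);
    }
}
```

Because addition is associative and commutative, the order in which partial sums are merged does not affect the total, which is why a plain sum is a natural fit for a tree reduction.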

I invoked GATK with the following arguments:

java -jar GenomeAnalysisTK.jar -T TreeReducibleCountReads -R myreference.fasta -I myreads.bam -nt 2 -nct 2

And I received the following stacktrace:

java.lang.ClassCastException: org.broadinstitute.gatk.engine.datasources.providers.LocusShardDataProvider cannot be cast to org.broadinstitute.gatk.engine.datasources.providers.ReadShardDataProvider
    at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:1)
    at org.broadinstitute.gatk.engine.executive.ShardTraverser.call(ShardTraverser.java:98)
    at java.util.concurrent.FutureTask.run(FutureTask.java:237)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1113)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:588)
    at java.lang.Thread.run(Thread.java:745)

Judging by the failed cast from LocusShardDataProvider to ReadShardDataProvider, GATK seems to be sharding the data by locus, whereas I expected the input of a ReadWalker to be sharded by reads. Can anyone point out my mistake?

Answers

  • kwestbrooks, Mountain View, CA (Member)

    Friendly ping! :-)

    Some more diagnostic information: the problem appears only when the -nt 2 command-line argument is present. If I omit that argument, everything runs fine.

  • kwestbrooks, Mountain View, CA (Member)

    Thanks for your response!

    I'm just trying to learn how GATK works. If this is a limitation of the software, then I can accept that.
