Writing unit / regression tests for QScripts

Geraldine_VdAuwera, Administrator, GATK Developer
edited February 3 in Pipelining with Queue

In addition to testing walkers individually, you may also want to run integration tests for your QScript pipelines.

1. Brief comparison to the Walker integration tests

  • Pipeline tests should use the standard location for testing data.
  • Pipeline tests use the same test dependencies.
  • Pipeline tests that generate MD5 results will have the results stored in the MD5 database.
  • Pipeline tests, like QScripts, are written in Scala.
  • Pipeline tests dry-run under the ant target pipelinetest and run under pipelinetestrun.
  • Pipeline test class names must end in PipelineTest to run under the ant target.
  • Pipeline tests should instantiate a PipelineTestSpec and then run it via PipelineTest.executeTest().

2. PipelineTestSpec

When building up a pipeline test spec, set the following variables for your test:

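Variable           Type                Description
name               String              Name of the test. Used in log messages and for the pipelinetests/<name>/ temp and run directories.
args               String              The arguments to pass to the Queue test, e.g. -S scala/qscript/examples/HelloWorld.scala
jobQueue           String              Job queue to run the test in. Default is null, which means the hour queue is used.
fileMD5s           Map[Path, MD5]      Expected MD5 result for each output file path.
expectedException  classOf[Exception]  Expected exception thrown by the test, if any.
parameterize       Boolean             If true, MD5 mismatches are reported but do not fail the test.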
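The real PipelineTestSpec lives in the Sting sources. Purely as an illustration of the fields in the table, the hypothetical SpecSketch below models them as plain Scala values. It is not the Sting class: paths and MD5s are simplified to String, "no expected exception" is modelled as None rather than null, and the null-means-hour rule for jobQueue is taken from the table above rather than from the Sting code.

```scala
// Hypothetical stand-in for PipelineTestSpec; NOT the Sting class.
// Simplifications: file paths and MD5s are plain Strings, and an
// absent value is modelled as None instead of null.
final case class SpecSketch(
    name: String,
    args: String,
    jobQueue: Option[String] = None,               // None plays the role of null
    fileMD5s: Map[String, String] = Map.empty,     // output path -> expected MD5
    expectedException: Option[Class[_ <: Exception]] = None,
    parameterize: Boolean = false                  // true: report mismatches only
) {
  // Per the table, an unset job queue means the "hour" queue.
  def effectiveJobQueue: String = jobQueue.getOrElse("hour")
}

// The countloci example expressed with the sketch.
val countLoci = SpecSketch(
  name = "countloci",
  args = " -S scala/qscript/examples/ExampleCountLoci.scala -o count.out",
  fileMD5s = Map("count.out" -> "67823e4722495eb10a5e4c42c267b3a6")
)
```

A case class keeps the sketch immutable; a variant of a spec (for example with a different queue) is just a `copy(...)`.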

3. Example PipelineTest

The following example runs the ExampleCountLoci QScript on a small BAM and verifies that the MD5 result is as expected.

It is checked into the Sting repository under scala/test/org/broadinstitute/sting/queue/pipeline/examples/ExampleCountLociPipelineTest.scala.

package org.broadinstitute.sting.queue.pipeline.examples

import org.testng.annotations.Test
import org.broadinstitute.sting.queue.pipeline.{PipelineTest, PipelineTestSpec}
import org.broadinstitute.sting.BaseTest

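class ExampleCountLociPipelineTest {
  @Test
  def testCountLoci(): Unit = {
    // Output file written by the QScript; its MD5 is checked after the run.
    val testOut = "count.out"
    val spec = new PipelineTestSpec
    // Test name; also used for the pipelinetests/countloci/ temp and run directories.
    spec.name = "countloci"
    // Queue command line: the QScript, the reference, the input BAM and the output file.
    spec.args = Array(
      " -S scala/qscript/examples/ExampleCountLoci.scala",
      " -R " + BaseTest.hg18Reference,
      " -I " + BaseTest.validationDataLocation + "small_bam_for_countloci.bam",
      " -o " + testOut).mkString
    // Expected MD5 of the output file.
    spec.fileMD5s += testOut -> "67823e4722495eb10a5e4c42c267b3a6"
    PipelineTest.executeTest(spec)
  }
}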

4. Running Pipeline Tests

Dry Run

To check that the script at least compiles with your arguments, run ant pipelinetest, passing the name of your test class to -Dsingle:

ant pipelinetest -Dsingle=ExampleCountLociPipelineTest

Sample output:

   [testng] --------------------------------------------------------------------------------
   [testng] Executing test countloci with Queue arguments: -S scala/qscript/examples/ExampleCountLoci.scala -R /seq/references/Homo_sapiens_assembly18/v0/Homo_sapiens_assembly18.fasta -I /humgen/gsa-hpprojects/GATK/data/Validation_Data/small_bam_for_countloci.bam -o count.out -bsub -l WARN -tempDir pipelinetests/countloci/temp/ -runDir pipelinetests/countloci/run/ -jobQueue hour
   [testng]   => countloci PASSED DRY RUN
   [testng] PASSED: testCountLoci
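The "Executing test" line shows the full Queue command line the harness builds: your spec.args, followed by -bsub -l WARN, per-test -tempDir and -runDir directories under pipelinetests/<name>/, and the job queue. The sketch below reconstructs that string from the log output alone. buildQueueCommandLine is a hypothetical helper, not part of PipelineTest, and the real harness may assemble its arguments differently. Passing run = true appends -run, matching the logs of actual (non-dry) runs.

```scala
// Hypothetical reconstruction of the Queue command line printed in the
// [testng] "Executing test ..." log lines; not the Sting implementation.
def buildQueueCommandLine(
    name: String,
    args: String,
    jobQueue: String = null, // null means "hour", as in the spec table
    run: Boolean = false     // false = dry run, true = append -run
): String = {
  val queue = Option(jobQueue).getOrElse("hour")
  val extra = Seq(
    "-bsub",
    "-l WARN",
    s"-tempDir pipelinetests/$name/temp/",
    s"-runDir pipelinetests/$name/run/",
    s"-jobQueue $queue"
  ) ++ (if (run) Seq("-run") else Seq.empty)
  // spec.args in the example starts with a space, so trim before joining.
  (args.trim +: extra).mkString(" ")
}
```

With the example's arguments, the dry-run string ends in "-jobQueue hour", and the run string ends in "-jobQueue hour -run".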

Run

As of July 2011, the pipeline tests run against LSF 7.0.6 and Grid Engine 6.2u5. To include these two packages in your environment, use the hidden dotkit .combined_LSF_SGE:

reuse .combined_LSF_SGE

Once you are satisfied that the dry run completes without errors, actually run the pipeline test with ant pipelinetestrun:

ant pipelinetestrun -Dsingle=ExampleCountLociPipelineTest

Sample output:

   [testng] --------------------------------------------------------------------------------
   [testng] Executing test countloci with Queue arguments: -S scala/qscript/examples/ExampleCountLoci.scala -R /seq/references/Homo_sapiens_assembly18/v0/Homo_sapiens_assembly18.fasta -I /humgen/gsa-hpprojects/GATK/data/Validation_Data/small_bam_for_countloci.bam -o count.out -bsub -l WARN -tempDir pipelinetests/countloci/temp/ -runDir pipelinetests/countloci/run/ -jobQueue hour -run
   [testng] ##### MD5 file is up to date: integrationtests/67823e4722495eb10a5e4c42c267b3a6.integrationtest
   [testng] Checking MD5 for pipelinetests/countloci/run/count.out [calculated=67823e4722495eb10a5e4c42c267b3a6, expected=67823e4722495eb10a5e4c42c267b3a6]
   [testng]   => countloci PASSED
   [testng] PASSED: testCountLoci

Generating initial MD5s

If you don't know the MD5s yet, you can either run the command yourself on the command line and compute the MD5s of the outputs yourself, or set the MD5s in your test to "" and run the pipeline.
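If you compute the MD5s yourself, any standard MD5 tool will do. As a Scala sketch, java.security.MessageDigest produces the same 32-character lowercase hex digest format that appears in the test logs. md5Hex and md5OfFile are illustrative helpers, not Sting functions.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest

// Lowercase hex MD5 of a byte array, in the same format as the logs
// (e.g. "67823e4722495eb10a5e4c42c267b3a6").
def md5Hex(bytes: Array[Byte]): String =
  MessageDigest.getInstance("MD5")
    .digest(bytes)
    .map(b => f"${b & 0xff}%02x") // mask to 0..255 so bytes print unsigned
    .mkString

// MD5 of a whole output file. Reads the file fully into memory, which
// is fine for small test outputs such as count.out.
def md5OfFile(path: Path): String = md5Hex(Files.readAllBytes(path))

// Usage, e.g.:
// md5OfFile(java.nio.file.Paths.get("pipelinetests/countloci/run/count.out"))
```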

If you leave the MD5s blank, as in:

spec.fileMD5s += testOut -> ""

then run:

ant pipelinetest -Dsingle=ExampleCountLociPipelineTest -Dpipeline.run=run

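The output will look like this. The calculated MD5 is reported on the PARAMETERIZATION line, from which you can copy it into spec.fileMD5s: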

   [testng] --------------------------------------------------------------------------------
   [testng] Executing test countloci with Queue arguments: -S scala/qscript/examples/ExampleCountLoci.scala -R /seq/references/Homo_sapiens_assembly18/v0/Homo_sapiens_assembly18.fasta -I /humgen/gsa-hpprojects/GATK/data/Validation_Data/small_bam_for_countloci.bam -o count.out -bsub -l WARN -tempDir pipelinetests/countloci/temp/ -runDir pipelinetests/countloci/run/ -jobQueue hour -run
   [testng] ##### MD5 file is up to date: integrationtests/67823e4722495eb10a5e4c42c267b3a6.integrationtest
   [testng] PARAMETERIZATION[countloci]: file pipelinetests/countloci/run/count.out has md5 = 67823e4722495eb10a5e4c42c267b3a6, stated expectation is , equal? = false
   [testng]   => countloci PASSED
   [testng] PASSED: testCountLoci

Checking MD5s

When a pipeline test fails due to an MD5 mismatch, you can use the MD5 database to diff the results:

   [testng] --------------------------------------------------------------------------------
   [testng] Executing test countloci with Queue arguments: -S scala/qscript/examples/ExampleCountLoci.scala -R /seq/references/Homo_sapiens_assembly18/v0/Homo_sapiens_assembly18.fasta -I /humgen/gsa-hpprojects/GATK/data/Validation_Data/small_bam_for_countloci.bam -o count.out -bsub -l WARN -tempDir pipelinetests/countloci/temp/ -runDir pipelinetests/countloci/run/ -jobQueue hour -run
   [testng] ##### Updating MD5 file: integrationtests/67823e4722495eb10a5e4c42c267b3a6.integrationtest
   [testng] Checking MD5 for pipelinetests/countloci/run/count.out [calculated=67823e4722495eb10a5e4c42c267b3a6, expected=67823e4722495eb10a5e0000deadbeef]
   [testng] ##### Test countloci is going fail #####
   [testng] ##### Path to expected   file (MD5=67823e4722495eb10a5e0000deadbeef): integrationtests/67823e4722495eb10a5e0000deadbeef.integrationtest
   [testng] ##### Path to calculated file (MD5=67823e4722495eb10a5e4c42c267b3a6): integrationtests/67823e4722495eb10a5e4c42c267b3a6.integrationtest
   [testng] ##### Diff command: diff integrationtests/67823e4722495eb10a5e0000deadbeef.integrationtest integrationtests/67823e4722495eb10a5e4c42c267b3a6.integrationtest
   [testng] FAILED: testCountLoci
   [testng] java.lang.AssertionError: 1 of 1 MD5s did not match.
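Judging from this log, each MD5 database entry is a copy of an output stored as integrationtests/<md5>.integrationtest, so comparing two results is a plain diff of two such files. The helpers below only reproduce the paths and diff command printed above. The integrationtests directory is taken from the log, and md5DbPath and diffCommand are hypothetical names.

```scala
// Hypothetical helpers mirroring the MD5 database paths printed in the
// failure log; not part of the Sting test framework.
val md5DbDir = "integrationtests"

// Path of the stored copy of an output whose contents hash to `md5`.
def md5DbPath(md5: String): String = s"$md5DbDir/$md5.integrationtest"

// The diff suggested by the log: expected result first, calculated second.
def diffCommand(expected: String, calculated: String): String =
  s"diff ${md5DbPath(expected)} ${md5DbPath(calculated)}"

// To actually run it from Scala, e.g.:
// scala.sys.process.Process(diffCommand(expectedMd5, calculatedMd5)).!
```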

If you need to examine a number of MD5s that may have changed, you can temporarily turn off MD5 mismatch failures by setting parameterize = true:

spec.parameterize = true
spec.fileMD5s += testOut -> "67823e4722495eb10a5e4c42c267b3a6"

Then run:

ant pipelinetest -Dsingle=ExampleCountLociPipelineTest -Dpipeline.run=run

If the MD5s match, the output will resemble:

   [testng] --------------------------------------------------------------------------------
   [testng] Executing test countloci with Queue arguments: -S scala/qscript/examples/ExampleCountLoci.scala -R /seq/references/Homo_sapiens_assembly18/v0/Homo_sapiens_assembly18.fasta -I /humgen/gsa-hpprojects/GATK/data/Validation_Data/small_bam_for_countloci.bam -o count.out -bsub -l WARN -tempDir pipelinetests/countloci/temp/ -runDir pipelinetests/countloci/run/ -jobQueue hour -run
   [testng] ##### MD5 file is up to date: integrationtests/67823e4722495eb10a5e4c42c267b3a6.integrationtest
   [testng] PARAMETERIZATION[countloci]: file pipelinetests/countloci/run/count.out has md5 = 67823e4722495eb10a5e4c42c267b3a6, stated expectation is 67823e4722495eb10a5e4c42c267b3a6, equal? = true
   [testng]   => countloci PASSED
   [testng] PASSED: testCountLoci

For a mismatch, it looks like this. Note that the test still passes, because parameterize is set:

   [testng] --------------------------------------------------------------------------------
   [testng] Executing test countloci with Queue arguments: -S scala/qscript/examples/ExampleCountLoci.scala -R /seq/references/Homo_sapiens_assembly18/v0/Homo_sapiens_assembly18.fasta -I /humgen/gsa-hpprojects/GATK/data/Validation_Data/small_bam_for_countloci.bam -o count.out -bsub -l WARN -tempDir pipelinetests/countloci/temp/ -runDir pipelinetests/countloci/run/ -jobQueue hour -run
   [testng] ##### MD5 file is up to date: integrationtests/67823e4722495eb10a5e4c42c267b3a6.integrationtest
   [testng] PARAMETERIZATION[countloci]: file pipelinetests/countloci/run/count.out has md5 = 67823e4722495eb10a5e4c42c267b3a6, stated expectation is 67823e4722495eb10a5e0000deadbeef, equal? = false
   [testng]   => countloci PASSED
   [testng] PASSED: testCountLoci
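Taken together, the logs suggest this rule: with parameterize = true, or when an expected MD5 is blank, each output is only reported on a PARAMETERIZATION line and the test passes; otherwise any mismatch fails the test with an AssertionError such as "1 of 1 MD5s did not match." The sketch below encodes only that observed behaviour. checkMD5s is a hypothetical function, and the real PipelineTest may differ in details (it also updates the MD5 database, for instance, and how it counts mismatches across mixed blank and non-blank entries is an assumption here).

```scala
// Hypothetical sketch of the MD5 check behaviour visible in the logs;
// not the Sting implementation.
// calculated: output path -> MD5 actually produced by the run
// expected:   output path -> MD5 stated in spec.fileMD5s ("" if unknown)
def checkMD5s(
    testName: String,
    calculated: Map[String, String],
    expected: Map[String, String],
    parameterize: Boolean
): Seq[String] = {
  val lines = Seq.newBuilder[String]
  var mismatches = 0
  for ((path, exp) <- expected) {
    val calc = calculated(path) // a missing output throws NoSuchElementException
    if (parameterize || exp.isEmpty) {
      // Report only; never fail.
      lines += s"PARAMETERIZATION[$testName]: file $path has md5 = $calc, " +
        s"stated expectation is $exp, equal? = ${calc == exp}"
    } else {
      lines += s"Checking MD5 for $path [calculated=$calc, expected=$exp]"
      if (calc != exp) mismatches += 1
    }
  }
  // Message format copied from the log; the denominator is an assumption.
  if (mismatches > 0)
    throw new AssertionError(s"$mismatches of ${expected.size} MD5s did not match.")
  lines.result()
}
```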

Geraldine Van der Auwera, PhD
