Unable to execute QScript: CNVDiscoveryPipeline.script() threw the following exception: java.lang.N

mehar Member ✭✭
edited June 2015 in GenomeSTRiP

Hi,

I am using the GenomeSTRiP version extracted from "svtoolkit_2.00.1572.tar.gz". Following the example at
http://www.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_CNVDiscoveryPipeline.html, I executed the following command:

classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"

java -Xmx4g -cp ${classpath} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/discovery/cnv/CNVDiscoveryPipeline.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-cp ${classpath} \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
-configFile ${SV_DIR}/conf/genstrip_parameters.txt \
-R canFam3.fa \
-genomeMaskFile mask.canFam3.fa \
-I BC_Bams.list \
-md meta_directory \
-runDirectory CNV \
-jobLogDir CNV/logs \
-tilingWindowSize 1000 \
-tilingWindowOverlap 500 \
-maximumReferenceGapLength 1000 \
-boundaryPrecision 100 \
-minimumRefinedLength 500 \
-run

I get the following error:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
org.broadinstitute.gatk.utils.exceptions.UserException$CannotExecuteQScript: Unable to execute QScript: CNVDiscoveryPipeline.script() threw the following exception: java.lang.NullPointerException
    at org.broadinstitute.gatk.queue.QCommandLine$$anonfun$execute$5.apply(QCommandLine.scala:159)
    at org.broadinstitute.gatk.queue.QCommandLine$$anonfun$execute$5.apply(QCommandLine.scala:147)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at org.broadinstitute.gatk.queue.QCommandLine.execute(QCommandLine.scala:147)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.queue.QCommandLine$.main(QCommandLine.scala:62)
    at org.broadinstitute.gatk.queue.QCommandLine.main(QCommandLine.scala)
Caused by: java.lang.NullPointerException
    at CNVDiscoveryPipeline.sequenceIntervalMap$lzycompute(CNVDiscoveryPipeline.q:59)
    at CNVDiscoveryPipeline.sequenceIntervalMap(CNVDiscoveryPipeline.q:55)
    at CNVDiscoveryPipeline.createStages(CNVDiscoveryPipeline.q:213)
    at CNVDiscoveryPipeline.script(CNVDiscoveryPipeline.q:97)
    at org.broadinstitute.gatk.queue.QCommandLine$$anonfun$execute$5.apply(QCommandLine.scala:156)
    ... 10 more
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version <unknown>):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Unable to execute QScript: CNVDiscoveryPipeline.script() threw the following exception: java.lang.NullPointerException
##### ERROR ------------------------------------------------------------------------------------------
INFO  11:56:34,380 QCommandLine - Shutting down jobs. Please wait... 

Could someone help me fix this? Thanks.


Answers

  • bhandsaker Member, Broadie, Moderator admin

    This is a bug in the argument defaulting - we'll fix this in the next release.
    In the meantime, you should be able to work around this problem by explicitly passing
    -intervalList canFam.interval.list
    and put in the canFam.interval.list file a list of the chromosomes on which you want to run CNV discovery.
    For human, we include chromosomes 1-22, X and Y, but we do not call by default on the other miscellaneous contigs.

    Also, if you have not read it, you should consult this page: http://www.broadinstitute.org/software/genomestrip/node_ReferenceMetadata.html
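
A minimal sketch of how such an interval list could be produced, assuming the reference has a samtools-style index (for example `canFam3.fa.fai`) whose first tab-separated column is the sequence name, and assuming the list takes one sequence name per line, as in the file posted further down in this thread. The function name `make_interval_list` and the file names are hypothetical and not part of Genome STRiP.

```shell
# Hypothetical helper (not part of Genome STRiP): print the sequence names
# from a FASTA index, skipping any names passed as extra arguments.
make_interval_list() {
    local fai="$1"
    shift
    # Pad with spaces so that whole names are matched, not substrings.
    local skip=" $* "
    # Column 1 of a .fai file holds the sequence name.
    awk -F'\t' -v skip="$skip" 'index(skip, " " $1 " ") == 0 { print $1 }' "$fai"
}

# Example (file names are assumptions):
#   make_interval_list canFam3.fa.fai chrMT > canFam3_chromosomes.list
```

Excluding names explicitly, rather than listing the wanted ones by hand, keeps the list in the same order and spelling as the reference index.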

  • mehar Member ✭✭
    edited June 2015

    I tried passing canFam3.interval.list and got the error below:
    ERROR MESSAGE: Unable to execute QScript: CNVDiscoveryPipeline.script() threw the following exception:
    java.lang.RuntimeException: canFam3.intervals.list is an invalid sequence
    ##### ERROR ------------------------------------------------------------------------------------------
    INFO 21:30:01,031 QCommandLine - Shutting down jobs. Please wait...

    The file is as follows:

    chr1
    chr2
    chr3
    chr4
    chr5
    chr6
    chr7
    chr8
    chr9
    chr10
    chr11
    chr12
    chr13
    chr14
    chr15
    chr16
    chr17
    chr18
    chr19
    chr20
    chr21
    chr22
    chr23
    chr24
    chr25
    chr26
    chr27
    chr28
    chr29
    chr30
    chr31
    chr32
    chr33
    chr34
    chr35
    chr36
    chr37
    chr38
    chrX
    

    The chromosome names are extracted from the .fai file. The only difference is "chrMT", which is not included in the canFam3.interval.list file. I can't see what is wrong with the file. Could you help me fix this?

  • mehar Member ✭✭

    Hi, the problem might also be with the reference metadata directory for my genome. The article says "If you are using a human reference genome that is not listed, please let us know. We may be able to create a reference metadata bundle for your reference genome or give you advice on how to make simple modifications to a close-enough reference metadata bundle to get acceptable results." I am working on the dog genome, and so far I have only managed to build the "-genomeMaskFile mask.canFam3.fa" file using the ComputeGenomeMask utility. However, several more metadata files seem to be needed, and I couldn't find any utilities to generate them. If this is the cause of the above error, could you help us make it work for our genome?

  • mehar Member ✭✭

    I managed to create the gcmask, rdmask, lcmask, svmask, dict and mdversion files. That leaves the ploidymap and gendermask files. The dog has a diploid genome with human-like X and Y chromosomes. I am not sure about the start and end positions in the ploidymap and gendermask files described on this page:
    http://www.broadinstitute.org/software/genomestrip/node_ReferenceMetadata.html

    Could you help me understand them better and create these two files for the dog genome?

  • bhandsaker Member, Broadie, Moderator admin

    I'm not sure how the canFam reference was constructed.
    For human (hg19), we use a ploidy map that looks like this:

    ```
    X 2699521 154931043 F 2
    X 2699521 154931043 M 1
    Y 1 59373566 F 0
    Y 1 59373566 M 1
    * * * * 2
    ```

    The fields are CHR, START, END, GENDER, PLOIDY.
    The last line matches anything else in the genome and thus indicates that human is diploid.
    The map is the way it is because in the hg19 reference used for alignment (at least the one used at Broad), the PARs in Y are masked, so they appear to be diploid in both males and females. The rest of X is diploid in females, haploid in males, etc.

    The gendermask.bed file is used only for gender determination and should be optional. For human, we use it to mask out the PARs and X and some repetitive regions on Y and this helps to give better gender calls. Here is what we use for hg19:

    X 2699520 154931043
    Y 0 10000000
    Y 13900000 28784000
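
A rough sanity check for a hand-written ploidy map can be sketched as follows. It assumes only what is stated in this post: five whitespace-separated fields (CHR, START, END, GENDER, PLOIDY), with `*` allowed as a wildcard. The function name `check_ploidy_map` is hypothetical, and this is not the validation Genome STRiP itself performs.

```shell
# Hypothetical helper (not part of Genome STRiP): report lines of a ploidy map
# that do not look like "CHR START END GENDER PLOIDY". Returns non-zero if any
# problem is found.
check_ploidy_map() {
    awk '
        NF == 0 { next }                      # ignore blank lines
        NF != 5 {
            printf "line %d: expected 5 fields, got %d\n", NR, NF
            bad = 1
            next
        }
        $4 != "F" && $4 != "M" && $4 != "*" {
            printf "line %d: unexpected gender %s\n", NR, $4
            bad = 1
        }
        $5 !~ /^[0-9]+$/ {
            printf "line %d: ploidy is not a non-negative integer\n", NR
            bad = 1
        }
        $2 != "*" && $3 != "*" && ($2 + 0) > ($3 + 0) {
            printf "line %d: start is greater than end\n", NR
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

The check is purely structural. It cannot tell whether the coordinates are appropriate for a given reference, so the intervals themselves still need to be chosen by hand.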

  • bhandsaker Member, Broadie, Moderator admin

    Also, after you are satisfied it is working, if you would like to package up the rmd bundle for canFam3, we would be happy to host it (or post a pointer to it if you want to host it yourself).

  • mehar Member ✭✭

    The dog reference was generated from a female dog, so we do not have Y in the reference. Would it be fine to use a ploidy map file like this:

     X 0 123869142 F 2
     X 0 123869142 M 1
     * * * * 2

    where the third column specifies the length of chrX,
    and a gendermask file like this:

    X 0 123869142



  • mehar Member ✭✭
    edited June 2015

    Yes, I can provide the bundle for hosting once the issue described here is completely solved:
    http://gatkforums.broadinstitute.org/discussion/3639/cn2-mask-file-for-non-human-genome#latest

    Thanks

  • bhandsaker Member, Broadie, Moderator admin

    Are you sure your chr names are matching between the bed file and the reference fasta?
    I think the person in that other post was having a different problem.
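
One way to check the names mechanically, sketched under the assumption that the reference has a samtools-style `.fai` index and that the file being checked (a BED file or an interval list) carries the sequence name in its first whitespace-separated column. The function name `missing_sequences` is hypothetical.

```shell
# Hypothetical helper (not part of Genome STRiP): print each sequence name that
# appears in column 1 of a BED or list file but not in the FASTA index.
# Usage: missing_sequences <bed-or-list-file> <reference.fa.fai>
missing_sequences() {
    local names="$1"
    local fai="$2"
    # First pass (the .fai, assumed non-empty): remember every known name.
    # Second pass: report each unknown name once.
    awk '
        NR == FNR { known[$1] = 1; next }
        NF > 0 && !($1 in known) && !seen[$1]++ { print $1 }
    ' "$fai" "$names"
}
```

Empty output means every name in the file also exists in the reference index. A name such as `X` printed here would point to a `chr` prefix mismatch.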

  • mehar Member ✭✭

    Yes, the chr names are consistent between the bed file and the reference fasta. The sample lines from both files use the same notation, i.e. both follow the chr convention.

  • skashin Member ✭✭

    Hi,

    Could you please post the full exception for the error?
    CNVDiscoveryPipeline.script() threw the following exception:
    java.lang.RuntimeException: canFam3.intervals.list is an invalid sequence

    Seva

  • mehar Member ✭✭

    This issue was solved just by renaming the file to "canFam3_chromosomes.list". It is odd that, although both file names have the same ".list" extension, the original name produced the error and the new one does not.
