NullPointerException in PhaseByTransmission

nodrogluap Posts: 3 Member
edited January 2013 in Ask the GATK team

Hi all,

Has anyone else gotten the following:

java.lang.NullPointerException
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.phaseTrioGenotypes(PhaseByTransmission.java:242)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:306)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:35)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:78)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:225)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:122)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:149)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

My command line was: java -jar GenomeAnalysisTK.jar -T PhaseByTransmission -V w01.sorted.vcf -o w01.phased.vcf -f "mom+dad=child" -R hg19.fa

Cheers,

Paul

Answers

  • Geraldine_VdAuwera Posts: 6,820 Administrator, GATK Developer admin

    Hi Paul, can you specify which version of GATK you are using? We would also prefer to have the full stack trace, not just an excerpt of the error message.

    Geraldine Van der Auwera, PhD

  • nodrogluap Posts: 3 Member

    Hi Geraldine,

    The version is v1.2-29-g0acaf2d.

    Here is the full output:

    INFO 17:34:16,946 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 17:34:16,949 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.2-29-g0acaf2d, Compiled 2011/09/29 00:21:43
    INFO 17:34:16,949 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 17:34:16,950 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
    INFO 17:34:16,950 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
    INFO 17:34:16,950 HelpFormatter - Program Args: -T PhaseByTransmission -V w01.sorted.vcf -o w01.phased.vcf -f dad+mom=child -R /home/gordonp/ngs_utils/dbs/hg19_revised.fa
    INFO 17:34:16,950 HelpFormatter - Date/Time: 2012/09/05 17:34:16
    INFO 17:34:16,951 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 17:34:16,951 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 17:34:16,959 RodBindingArgumentTypeDescriptor - Dynamically determined type of w01.sorted.vcf to be VCF
    INFO 17:34:16,965 GenomeAnalysisEngine - Strictness is SILENT
    INFO 17:34:17,076 RMDTrackBuilder - Loading Tribble index from disk for file w01.sorted.vcf

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.NullPointerException
        at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.phaseTrioGenotypes(PhaseByTransmission.java:242)
        at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:306)
        at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:35)
        at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:78)
        at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:225)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:122)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:149)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 1.2-29-g0acaf2d):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------
  • Geraldine_VdAuwera Posts: 6,820 Administrator, GATK Developer admin

    Thanks Paul, we'll look into it.

    Geraldine Van der Auwera, PhD

  • ebanks Posts: 684 GATK Developer mod

    Hi Paul,

    You'll need to update to the latest version of the GATK. You are using an extremely old version.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • nodrogluap Posts: 3 Member

    Hi,

    I still get an NPE (albeit a different one) with version 2.1-8:

    INFO 08:55:05,825 HelpFormatter - --------------------------------------------------------------------------------
    INFO 08:55:05,827 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.1-8-g5efb575, Compiled 2012/08/30 14:22:17
    INFO 08:55:05,827 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 08:55:05,828 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 08:55:05,828 HelpFormatter - Program Args: -T PhaseByTransmission -V w01.sorted.vcf -o w01.phased.vcf -ped trio_male.ped -R /home/gordonp/ngs_utils/dbs/hg19_revised.fa
    INFO 08:55:05,828 HelpFormatter - Date/Time: 2012/09/06 08:55:05
    INFO 08:55:05,829 HelpFormatter - --------------------------------------------------------------------------------
    INFO 08:55:05,829 HelpFormatter - --------------------------------------------------------------------------------
    INFO 08:55:05,838 ArgumentTypeDescriptor - Dynamically determined type of w01.sorted.vcf to be VCF
    INFO 08:55:05,843 GenomeAnalysisEngine - Strictness is SILENT
    INFO 08:55:05,948 RMDTrackBuilder - Loading Tribble index from disk for file w01.sorted.vcf
    WARN 08:55:06,005 VCFStandardHeaderLines$Standards - Repairing standard header line for field GQ because -- type disagree; header has Float but standard is Integer -- descriptions disagree; header has 'Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype' but standard is 'Genotype Quality'
    WARN 08:55:06,005 VCFStandardHeaderLines$Standards - Repairing standard header line for field AC because -- count types disagree; header has UNBOUNDED but standard is A -- descriptions disagree; header has 'Allele count in genotypes' but standard is 'Allele count in genotypes, for each ALT allele, in the same order as listed'
    INFO 08:55:06,015 PedReader - Reading PED file trio_male.ped with missing fields: []
    INFO 08:55:06,137 PedReader - Phenotype is other? false
    INFO 08:55:07,709 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.NullPointerException
        at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.getLikelihoodsAsMapSafeNull(PhaseByTransmission.java:551)
        at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.phaseTrioGenotypes(PhaseByTransmission.java:585)
        at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:799)
        at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:77)
        at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:65)
        at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.1-8-g5efb575):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------
  • ebanks Posts: 684 GATK Developer mod

    Ah, I see what's going on. The tool is expecting genotype likelihoods (PLs preferably, but if not then GLs) associated with each sample but your VCF must not have them. I'll patch the tool for the next major release to skip such sites in the future, but I expect that in the meantime you'll need to regenerate your VCF if there are no PLs at all or remove those records that don't have them.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • ebanks Posts: 684 GATK Developer mod

    Actually, quick correction: in the next release these sites won't be skipped, but they'll be treated as having flat likelihoods.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT
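
For anyone who wants to check whether their VCF has records without genotype likelihoods before running PhaseByTransmission, here is a minimal Python sketch. It is not part of GATK: the function name `records_missing_likelihoods` is hypothetical, and it assumes a plain-text (uncompressed) VCF with tab-separated columns and a `#CHROM` header line. It reports every record where at least one sample has neither a PL nor a GL value.

```python
def records_missing_likelihoods(lines):
    """Yield (chrom, pos, sample_names) for records where at least one
    sample carries neither a PL nor a GL value.

    `lines` is any iterable of VCF text lines, e.g. an open file handle.
    Hypothetical helper for illustration; not a GATK function.
    """
    sample_names = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample_names = fields[9:]  # sample columns start after FORMAT
            continue
        if len(fields) < 10:
            continue  # sites-only record, no genotype columns
        format_keys = fields[8].split(":")
        missing = []
        for name, sample in zip(sample_names, fields[9:]):
            # Trailing sample fields may be dropped, so pair keys with
            # whatever values are present and treat the rest as missing.
            values = dict(zip(format_keys, sample.split(":")))
            has_likelihoods = any(
                values.get(key, ".") not in (".", "") for key in ("PL", "GL")
            )
            if not has_likelihoods:
                missing.append(name)
        if missing:
            yield fields[0], int(fields[1]), missing
```

It could be called as `records_missing_likelihoods(open("w01.sorted.vcf"))`, printing each tuple it yields. If every record shows up, the VCF has no likelihoods at all and probably needs to be regenerated; if only a few do, those are the records to remove.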

  • monsun Posts: 6 Member

    Hello,

    I get the same error with GATK version 3.1.1; the VCF was produced by GATK. My call looks like:

    java -Xmx4g -jar modules/GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar -T PhaseByTransmission -R GRCh37.fasta -V trio.vcf -ped trio.ped -o trio.phtr.vcf

    And the error:

    INFO 12:20:04,069 HelpFormatter - --------------------------------------------------------------------------------
    INFO 12:20:04,071 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.1-1-g07a4bf8, Compiled 2014/03/18 06:09:21
    INFO 12:20:04,071 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 12:20:04,071 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 12:20:04,075 HelpFormatter - Program Args: -T PhaseByTransmission -R GRCh37.fasta -V trio.vcf -ped trio.ped -o trio.phtr.vcf
    INFO 12:20:04,079 HelpFormatter - Executing as mans@milou1.uppmax.uu.se on Linux 2.6.32-431.17.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15.
    INFO 12:20:04,079 HelpFormatter - Date/Time: 2014/06/06 12:20:04
    INFO 12:20:04,079 HelpFormatter - --------------------------------------------------------------------------------
    INFO 12:20:04,080 HelpFormatter - --------------------------------------------------------------------------------
    INFO 12:20:04,646 GenomeAnalysisEngine - Strictness is SILENT
    INFO 12:20:04,772 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 12:20:04,862 PedReader - Reading PED file trio.ped with missing fields: []
    INFO 12:20:04,949 PedReader - Phenotype is other? false
    INFO 12:20:05,009 GenomeAnalysisEngine - Preparing for traversal
    INFO 12:20:05,025 GenomeAnalysisEngine - Done preparing for traversal
    INFO 12:20:05,025 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 12:20:05,025 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
    INFO 12:20:15,215 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.NullPointerException
        at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission$TrioPhase.getPhasedGenotype(PhaseByTransmission.java:431)
        at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission$TrioPhase.getPhasedGenotypes(PhaseByTransmission.java:389)
        at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.phaseTrioGenotypes(PhaseByTransmission.java:762)
        at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:862)
        at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:126)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
        at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:121)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:107)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------

    Thanks for the help!

    Måns

  • aaronchu Posts: 9 Member

    Hi there, I ran into exactly the same problem with PhaseByTransmission in 3.1.1. It generated a truncated VCF containing the first few lines of the phased VCF and then stopped with the same error as reported above. Could you take a look at this? Thanks!

    aaron

  • aaronchu Posts: 9 Member

    By the way, the input VCF was generated by GenotypeGVCFs (3.1.1). When I used PhaseByTransmission in 2.8.1 to phase a jointly called trio VCF, it worked without any problem.

  • Geraldine_VdAuwera Posts: 6,820 Administrator, GATK Developer admin

    Hi Aaron,

    This sounds like a potential bug. Could you please share a snippet of your file with us so we can debug this locally? Instructions are here: http://www.broadinstitute.org/gatk/guide/article?id=1894

    Geraldine Van der Auwera, PhD

  • aaronchu Posts: 9 Member

    Hi Geraldine, sorry, I only just saw your response. I usually get an email when someone responds to my question, but this time I didn't. I tried to make a snippet of the VCF file (covering the problematic site where the job stops), but when I run PhaseByTransmission on the snippet I cannot reproduce the error. This looks really weird. Do you have any idea what is going on, or can I provide the whole VCF file? Thanks!

  • aaronchu Posts: 9 Member

    I think I can avoid this problem. The problematic trio VCF file was extracted with SelectVariants from a master VCF file jointly called for many samples. Now I just run PhaseByTransmission with --pedigreeValidationType SILENT on the master VCF file, which completes without interruption. Afterwards I extract the trio VCF from the phased master VCF file. This works, at the cost of the extra time needed to process the much larger master VCF. So I guess the problem might be due to how PhaseByTransmission processes SelectVariants' output?
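
The workaround above (phase the master VCF first, then pull out the trio) amounts to two GATK calls, which can be sketched in Python as below. This is only a sketch: the helper `phase_then_extract_commands` and all file and sample names are hypothetical. The PhaseByTransmission arguments are the ones quoted in this thread; the `-sn` sample selection for SelectVariants is an assumption, since the thread does not show the SelectVariants call that was actually used.

```python
def phase_then_extract_commands(gatk_jar, reference, master_vcf, ped,
                                phased_vcf, trio_samples, trio_vcf):
    """Build the two command lines for the workaround: phase the whole
    master VCF, then extract the trio from the phased output.

    Hypothetical helper for illustration; it only builds argument lists
    and does not run anything.
    """
    # Step 1: phase the jointly called master VCF. SILENT pedigree
    # validation is the setting mentioned in the thread.
    phase = [
        "java", "-jar", gatk_jar,
        "-T", "PhaseByTransmission",
        "-R", reference,
        "-V", master_vcf,
        "-ped", ped,
        "--pedigreeValidationType", "SILENT",
        "-o", phased_vcf,
    ]
    # Step 2: extract the trio from the phased master VCF.
    # ASSUMPTION: samples are selected with one -sn argument per sample.
    select = [
        "java", "-jar", gatk_jar,
        "-T", "SelectVariants",
        "-R", reference,
        "-V", phased_vcf,
        "-o", trio_vcf,
    ]
    for sample in trio_samples:
        select += ["-sn", sample]
    return phase, select
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order, so the extraction only starts if phasing finished without an error.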

  • Sheila Broad Institute Posts: 754 Member, GATK Developer, Broadie, Moderator admin

    @aaronchu

    Hi,

    Yes, that sounds reasonable.

    -Sheila
