NullPointerException in PhaseByTransmission

nodrogluap (Posts: 3; Member)
edited January 2013 in Ask the GATK team

Hi all,

Has anyone else gotten the following:

java.lang.NullPointerException
at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.phaseTrioGenotypes(PhaseByTransmission.java:242)
at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:306)
at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:35)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:78)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:225)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:122)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:149)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

My command line was:
java -jar GenomeAnalysisTK.jar -T PhaseByTransmission -V w01.sorted.vcf -o w01.phased.vcf -f "mom+dad=child" -R hg19.fa

Cheers,

Paul

Post edited by Geraldine_VdAuwera on

Answers

  • Geraldine_VdAuwera (Posts: 8,271; Administrator, GATK Dev, admin)

    Hi Paul, can you tell us which version of GATK you are using? Ideally we would like the full stack trace, not just an excerpt of the error message.

    Geraldine Van der Auwera, PhD

  • nodrogluap (Posts: 3; Member)

    Hi Geraldine,

    The version is v1.2-29-g0acaf2d.

    Here is the full output:

    INFO 17:34:16,946 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 17:34:16,949 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.2-29-g0acaf2d, Compiled 2011/09/29 00:21:43
    INFO 17:34:16,949 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 17:34:16,950 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
    INFO 17:34:16,950 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
    INFO 17:34:16,950 HelpFormatter - Program Args: -T PhaseByTransmission -V w01.sorted.vcf -o w01.phased.vcf -f dad+mom=child -R /home/gordonp/ngs_utils/dbs/hg19_revised.fa
    INFO 17:34:16,950 HelpFormatter - Date/Time: 2012/09/05 17:34:16
    INFO 17:34:16,951 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 17:34:16,951 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 17:34:16,959 RodBindingArgumentTypeDescriptor - Dynamically determined type of w01.sorted.vcf to be VCF
    INFO 17:34:16,965 GenomeAnalysisEngine - Strictness is SILENT
    INFO 17:34:17,076 RMDTrackBuilder - Loading Tribble index from disk for file w01.sorted.vcf

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.NullPointerException
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.phaseTrioGenotypes(PhaseByTransmission.java:242)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:306)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:35)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:78)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:225)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:122)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:149)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 1.2-29-g0acaf2d):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------

  • Geraldine_VdAuwera (Posts: 8,271; Administrator, GATK Dev, admin)

    Thanks Paul, we'll look into it.

    Geraldine Van der Auwera, PhD

  • ebanks, Broad Institute (Posts: 689; Member, Administrator, GATK Dev, Broadie, Moderator, DSDE Dev, GP Member, admin)

    Hi Paul,

    You'll need to update to the latest version of the GATK. You are using an extremely old version.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • nodrogluap (Posts: 3; Member)

    Hi,

    I still get an NPE (albeit a different one) with version 2.1-8:

    INFO 08:55:05,825 HelpFormatter - --------------------------------------------------------------------------------
    INFO 08:55:05,827 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.1-8-g5efb575, Compiled 2012/08/30 14:22:17
    INFO 08:55:05,827 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 08:55:05,828 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 08:55:05,828 HelpFormatter - Program Args: -T PhaseByTransmission -V w01.sorted.vcf -o w01.phased.vcf -ped trio_male.ped -R /home/gordonp/ngs_utils/dbs/hg19_revised.fa
    INFO 08:55:05,828 HelpFormatter - Date/Time: 2012/09/06 08:55:05
    INFO 08:55:05,829 HelpFormatter - --------------------------------------------------------------------------------
    INFO 08:55:05,829 HelpFormatter - --------------------------------------------------------------------------------
    INFO 08:55:05,838 ArgumentTypeDescriptor - Dynamically determined type of w01.sorted.vcf to be VCF
    INFO 08:55:05,843 GenomeAnalysisEngine - Strictness is SILENT
    INFO 08:55:05,948 RMDTrackBuilder - Loading Tribble index from disk for file w01.sorted.vcf
    WARN 08:55:06,005 VCFStandardHeaderLines$Standards - Repairing standard header line for field GQ because -- type disagree; header has Float but standard is Integer -- descriptions disagree; header has 'Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype' but standard is 'Genotype Quality'
    WARN 08:55:06,005 VCFStandardHeaderLines$Standards - Repairing standard header line for field AC because -- count types disagree; header has UNBOUNDED but standard is A -- descriptions disagree; header has 'Allele count in genotypes' but standard is 'Allele count in genotypes, for each ALT allele, in the same order as listed'
    INFO 08:55:06,015 PedReader - Reading PED file trio_male.ped with missing fields: []
    INFO 08:55:06,137 PedReader - Phenotype is other? false
    INFO 08:55:07,709 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.NullPointerException
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.getLikelihoodsAsMapSafeNull(PhaseByTransmission.java:551)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.phaseTrioGenotypes(PhaseByTransmission.java:585)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:799)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:77)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:65)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.1-8-g5efb575):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------

  • ebanks, Broad Institute (Posts: 689; Member, Administrator, GATK Dev, Broadie, Moderator, DSDE Dev, GP Member, admin)

    Ah, I see what's going on. The tool is expecting genotype likelihoods (PLs preferably, but if not then GLs) associated with each sample but your VCF must not have them. I'll patch the tool for the next major release to skip such sites in the future, but I expect that in the meantime you'll need to regenerate your VCF if there are no PLs at all or remove those records that don't have them.
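    Before rerunning PhaseByTransmission, you could drop every record where some sample has no PL (or GL) value. The Python sketch below does this, assuming an uncompressed, tab-delimited VCF read line by line. It is not a GATK tool, and the function names `has_likelihoods` and `filter_records` are hypothetical.

```python
from typing import Iterable, Iterator, List


def has_likelihoods(format_keys: List[str], sample_field: str) -> bool:
    """Return True if the sample carries a non-missing PL or, failing that, GL value."""
    values = sample_field.split(":")
    for key in ("PL", "GL"):
        if key in format_keys:
            idx = format_keys.index(key)
            # Trailing FORMAT fields may be dropped entirely or written as "."
            if idx < len(values) and values[idx] not in ("", "."):
                return True
    return False


def filter_records(lines: Iterable[str]) -> Iterator[str]:
    """Pass header lines through; keep only records where every sample has likelihoods."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        format_keys = cols[8].split(":")
        # cols[9:] are the per-sample columns
        if all(has_likelihoods(format_keys, sample) for sample in cols[9:]):
            yield line
```

    For example, with `src` opened on `w01.sorted.vcf` and `dst` opened for writing, `dst.writelines(filter_records(src))` writes a filtered copy. Removing records drops those sites from phasing entirely. If PLs are missing everywhere, regenerating the VCF with PLs (as suggested above) is the cleaner fix.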

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • ebanks, Broad Institute (Posts: 689; Member, Administrator, GATK Dev, Broadie, Moderator, DSDE Dev, GP Member, admin)

    Actually, quick correction: in the next release these sites won't be skipped, but they'll be treated as having flat likelihoods.
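    "Flat likelihoods" means every possible genotype gets the same likelihood, so the missing data favours no genotype over another. The Python sketch below illustrates the idea for a diploid, biallelic site with three genotypes and Phred-scaled PLs. `pl_to_probs` is a hypothetical helper, not GATK's internal code.

```python
from typing import List, Optional


def pl_to_probs(pl: Optional[List[int]], n_genotypes: int = 3) -> List[float]:
    """Convert Phred-scaled PLs to normalized genotype likelihoods.

    A missing PL (None) is treated as flat: every genotype gets 1/n_genotypes.
    """
    if pl is None:
        return [1.0 / n_genotypes] * n_genotypes
    # PL = -10 * log10(likelihood), so likelihood = 10^(-PL/10)
    likelihoods = [10 ** (-p / 10.0) for p in pl]
    total = sum(likelihoods)
    return [lk / total for lk in likelihoods]
```

    For example, `pl_to_probs([0, 64, 825])` puts nearly all the weight on hom-ref (about 0.9999996). `pl_to_probs(None)` returns three equal values, the same result as a PL of `0,0,0`.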

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • monsun (Posts: 6; Member)

    Hello,

    I get the same error with GATK version 3.1.1; the VCF was produced by GATK. My command line is:

    java -Xmx4g -jar modules/GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar -T PhaseByTransmission -R GRCh37.fasta -V trio.vcf -ped trio.ped -o trio.phtr.vcf
    

    And the error:

    INFO 12:20:04,069 HelpFormatter - --------------------------------------------------------------------------------
    INFO 12:20:04,071 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.1-1-g07a4bf8, Compiled 2014/03/18 06:09:21
    INFO 12:20:04,071 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 12:20:04,071 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 12:20:04,075 HelpFormatter - Program Args: -T PhaseByTransmission -R GRCh37.fasta -V trio.vcf -ped trio.ped -o trio.phtr.vcf
    INFO 12:20:04,079 HelpFormatter - Executing as mans@milou1.uppmax.uu.se on Linux 2.6.32-431.17.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15.
    INFO 12:20:04,079 HelpFormatter - Date/Time: 2014/06/06 12:20:04
    INFO 12:20:04,079 HelpFormatter - --------------------------------------------------------------------------------
    INFO 12:20:04,080 HelpFormatter - --------------------------------------------------------------------------------
    INFO 12:20:04,646 GenomeAnalysisEngine - Strictness is SILENT
    INFO 12:20:04,772 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 12:20:04,862 PedReader - Reading PED file trio.ped with missing fields: []
    INFO 12:20:04,949 PedReader - Phenotype is other? false
    INFO 12:20:05,009 GenomeAnalysisEngine - Preparing for traversal
    INFO 12:20:05,025 GenomeAnalysisEngine - Done preparing for traversal
    INFO 12:20:05,025 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 12:20:05,025 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
    INFO 12:20:15,215 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.NullPointerException
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission$TrioPhase.getPhasedGenotype(PhaseByTransmission.java:431)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission$TrioPhase.getPhasedGenotypes(PhaseByTransmission.java:389)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.phaseTrioGenotypes(PhaseByTransmission.java:762)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:862)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:126)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:107)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------

    Thanks for help!

    Måns

  • aaronchu (Posts: 9; Member)

    Hi there, I ran into exactly the same problem with PhaseByTransmission in 3.1.1. It generated a truncated VCF containing the first few lines of phased output and then stopped with the same error. Could you take a look at this? Thanks!

    aaron

  • aaronchu (Posts: 9; Member)

    By the way, the input VCF was generated by GenotypeGVCFs (3.1.1). When I used PhaseByTransmission in 2.8.1 to phase a jointly called trio VCF, it worked without any problem.

  • Geraldine_VdAuwera (Posts: 8,271; Administrator, GATK Dev, admin)

    Hi Aaron,

    This sounds like a potential bug. Could you please share a snippet of your file with us so we can debug this locally? Instructions are here: http://www.broadinstitute.org/gatk/guide/article?id=1894

    Geraldine Van der Auwera, PhD

  • aaronchu (Posts: 9; Member)

    Hi Geraldine, sorry, I only just saw your response. I usually get an email notification when someone replies, but this time I didn't. I tried to make a snippet of the VCF file (covering the problematic site where the job stops), but running on the snippet does not reproduce the error. This is really strange. Do you have any ideas, or could I provide the whole VCF file instead? Thanks!

  • aaronchu (Posts: 9; Member)

    I think I have found a way to avoid this problem. The problematic trio VCF was extracted with SelectVariants from a master VCF jointly called across many samples. Instead, I now run PhaseByTransmission with --pedigreeValidationType SILENT on the master VCF, and it completes without interruption. Afterwards I extract the trio from the phased master VCF. This works, although processing the much larger master VCF takes longer. So I suspect the problem lies in how PhaseByTransmission handles SelectVariants output?

  • Sheila, Broad Institute (Posts: 1,928; Member, GATK Dev, Broadie, Moderator, DSDE Dev, admin)

    @aaronchu

    Hi,

    Yes, that sounds reasonable.

    -Sheila

  • andrewo (Posts: 10; Member)

    Hi,

    I have a similar problem with PhaseByTransmission in GATK 3.4. Here's the error message I see:

    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace 
    java.lang.NullPointerException
        at org.broadinstitute.gatk.tools.walkers.phasing.PhaseByTransmission$TrioPhase.getPhasedGenotype(PhaseByTransmission.java:445)
        at org.broadinstitute.gatk.tools.walkers.phasing.PhaseByTransmission$TrioPhase.getPhasedGenotypes(PhaseByTransmission.java:403)
        at org.broadinstitute.gatk.tools.walkers.phasing.PhaseByTransmission.phaseTrioGenotypes(PhaseByTransmission.java:776)
        at org.broadinstitute.gatk.tools.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:876)
        at org.broadinstitute.gatk.tools.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:141)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
        at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.4-0-g7e26428):
    ##### ERROR
    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: Code exception (see stack trace for error itself)
    ##### ERROR ------------------------------------------------------------------------------------------
    

    I also ran ValidateVariants as suggested here, which gave no errors.

    I made a subset VCF file and ran PBT through a Queue script with 20 scatters. After several rounds of narrowing, I traced the error to this variant:

    #CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  1781NIH 1782NIH 1783NIH
    21  20525547    .   TAC T   1560.74 PASS    AC=0;AF=0.00;AN=8;BaseQRankSum=0.592;CCC=7538;CSQ=-||||intergenic_variant|||||||1||||||||||||||||||||||||||||||||||||||||||||;ClippingRankSum=0.978;DP=100;FS=0.000;GQ_MEAN=37.75;GQ_STDDEV=25.60;HWP=0.1201;InbreedingCoeff=0.5204;MQ=60.04;MQ0=0;MQRankSum=0.401;NCC=24;QD=6.24;ReadPosRankSum=-5.780e-01;VQSLOD=0.554;culprit=FS GT:AD:DP:GQ:PL  0/0:0,40:40:0   0/0:21,0:21:64:0,64,825 0/0:0,24:24:0
    

    Interestingly, it only produces the error about half of the time. The other half of the time it completes successfully. It's really hit or miss -- sometimes it fails many times in a row.

    I noticed that the FORMAT column has 5 fields, but some of the sample columns have only 4. I checked whether this was common, and it turns out this is the only line in the file with this pattern (forgive the one-liner):

    zcat test.vcf.gz | grep -v "^#" | perl -ane'my $count = scalar(split(/:/, $F[8])); print "$count"; foreach my $gt (@F[9..11]){my $f = scalar(split(/:/, $gt)); print "\t$f"; } print "\n";' | sort | uniq -c
         75 3   3   3   3
          5 5   3   3   3
         21 5   3   3   5
         26 5   3   5   3
         42 5   3   5   5
          1 5   4   5   4    *** This one
         40 5   5   3   3
        124 5   5   3   5
        140 5   5   5   3
          2 5   5   5   4
      23234 5   5   5   5
    
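    The Perl one-liner above can be rewritten as a short Python sketch. It tallies how many colon-separated fields the FORMAT column and each sample column contain in every record. It assumes the file is read as text; for the gzipped file, `gzip.open("test.vcf.gz", "rt")` from the standard library works. `field_count_patterns` is a hypothetical name.

    When reading the output, note that the VCF specification allows trailing sample fields to be dropped; only GT must be present if FORMAT lists it. A 4-field sample under a 5-field FORMAT is therefore legal, which may explain why ValidateVariants reported no errors.

```python
from collections import Counter
from typing import Iterable


def field_count_patterns(lines: Iterable[str]) -> Counter:
    """Count records by (FORMAT field count, sample 1 count, sample 2 count, ...)."""
    patterns: Counter = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        # cols[8] is FORMAT; cols[9:] are the sample columns.
        # Unlike Perl's split, str.split keeps trailing empty fields.
        patterns[tuple(len(col.split(":")) for col in cols[8:])] += 1
    return patterns
```

    Calling `.most_common()` on the result lists the patterns by frequency, much like the `sort | uniq -c` output above.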

    I'm not sure if that is significant or not, but it may have something to do with it. If I put this single variant in a file by itself (with the header of the original file), it also reproduces the error message, even when I run PBT by itself (outside of the Scala script), e.g.,

    java -Xmx3G -jar /usr/local/bio_apps/gatk-3.4/GenomeAnalysisTK.jar '-T' 'PhaseByTransmission'  '-L' 'chr21.fail5.interval_list'  '-R' '/path/human_g1k_v37.fasta'  '-ped' '1781NIH.ped'  '-pedValidationType' 'SILENT'  '-V' 'test.vcf'  '-prior' '1.0E-4'  '-o' test.pbt.vcf.gz
    

    Contents of chr21.fail5.interval_list:

    21:20525545-20525556
    

    Contents of 1781NIH.ped:

    #family_id  individual_id   paternal_id maternal_id sex phenotype
    1   1781NIH 1782NIH 1783NIH 1   2
    1   1782NIH 0   0   1   1
    1   1783NIH 0   0   2   1
    
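    The PED file above uses the standard six leading columns: family ID, individual ID, paternal ID, maternal ID, sex (1 = male, 2 = female) and phenotype. A `0` marks an unknown parent. The Python sketch below reads the trio from such a file and checks that the father is coded male and the mother female, a cheap sanity check for a trio. `parse_trios` and `Trio` are hypothetical names, and this is not how GATK's PedReader is implemented.

```python
from typing import Iterable, List, NamedTuple


class Trio(NamedTuple):
    family: str
    child: str
    father: str
    mother: str


def parse_trios(lines: Iterable[str]) -> List[Trio]:
    """Return every individual whose father and mother are both listed in the file."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 PED columns, got {len(fields)}: {line!r}")
        rows.append(fields[:6])

    sex_of = {row[1]: row[4] for row in rows}
    trios = []
    for family, child, father, mother, _sex, _phenotype in rows:
        # Assumes no individual is literally named "0" (the unknown-parent code)
        if father in sex_of and mother in sex_of:
            if sex_of[father] != "1" or sex_of[mother] != "2":
                raise ValueError(f"{child}: father must be sex 1 and mother sex 2")
            trios.append(Trio(family, child, father, mother))
    return trios
```

    For the file above, this yields a single trio: child 1781NIH, father 1782NIH, mother 1783NIH.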

    This variant also has all 0/0 genotypes, but that is not unique to this variant -- there are 17895 variants (of 23710 total) with 0/0 for all individuals because this VCF file is subset from a VCF with many more individuals in it.

    I get the same error if I remove the -prior option from the command, so I think that is not related either.

    Any ideas on what is going on here?

    Thanks,

    Andrew

  • Sheila, Broad Institute (Posts: 1,928; Member, GATK Dev, Broadie, Moderator, DSDE Dev, admin)

    @andrewo
    Hi Andrew,

    Can you confirm the VCF was generated with the latest version of GATK? For the two samples with missing PLs, it looks like the AD field is suggesting there should be a variant, but the site is called hom-ref.
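    To look for other calls like the two described here (genotype 0/0 and no PL, but AD dominated by alternate reads), you could scan each sample column with the Python sketch below. The rule it applies, flagging a hom-ref call whose alt read count exceeds its ref count, is an illustrative assumption rather than a GATK criterion. `flag_suspicious_hom_ref` is a hypothetical name.

```python
from typing import Dict, List


def flag_suspicious_hom_ref(format_field: str, sample_fields: List[str]) -> List[int]:
    """Return indices of samples called 0/0 that lack PL and whose AD favours the alt allele."""
    keys = format_field.split(":")
    flagged = []
    for i, sample in enumerate(sample_fields):
        # zip stops at the shorter list, so dropped trailing fields are simply absent
        data: Dict[str, str] = dict(zip(keys, sample.split(":")))
        if data.get("GT") not in ("0/0", "0|0"):
            continue
        ad = [int(x) for x in data.get("AD", ".").split(",") if x != "."]
        if len(ad) < 2:
            continue  # no usable allele depths
        ref_reads, alt_reads = ad[0], sum(ad[1:])
        missing_pl = data.get("PL", ".") == "."
        if alt_reads > ref_reads and missing_pl:
            flagged.append(i)
    return flagged
```

    Applied to the record above, with samples in the order 1781NIH, 1782NIH, 1783NIH, it returns `[0, 2]`, i.e. 1781NIH and 1783NIH.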

    Thanks,
    Sheila
