
How to build/get populationAlleleFrequencies.vcf for ContEst

wang_yugui2 (Beijing, China; Member)

How do I build or obtain populationAlleleFrequencies.vcf for ContEst?

Which VCF fields are required: AF? MAF?

I found no guide for this. I tried a 1000G VCF file, but it failed with the following error.

java -jar /usr/hpc-bio/gatk/GATK.jar -T ContEst -l WARN -R /usr/bio-ref/GRCh38.83/GRCh38.dna.fa -pf /usr/bio-ref/GRCh38.83/1000G.vcf -isr INTERSECTION -I:eval /biowrk/bam.bqsr.pair/Project_14686/Sample_100T/bqsr.tumor.bam -I:genotype /biowrk/bam.bqsr.pair/Project_14686/Sample_100T/bqsr.normal.bam -L /usr/bio-ref/GRCh38.83/S04380110_Covered.intervals -o output.txt

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.NullPointerException
at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.calcStats(ContEst.java:625)
at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.map(ContEst.java:400)
at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.map(ContEst.java:127)


Issue · Github (by Sheila): Issue Number 468, State: closed, Closed By: chandrans

Answers

  • wisonlee (Member)

    Same here, exactly the same issue.
    Please give some hints about the required fields, or about tools to convert the 1000G or HapMap VCF.

  • jeleedict (South Korea; Member)
    edited January 2016

    In my case, I used hapmap_3.3.b37.vcf downloaded from the GATK ftp server.
    I tried two ways:
    1) java -jar GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T ContEst -R human_g1k_v37_decoy.fasta -I:eval tumor.bam -I:genotype normal.bam --popfile hapmap_3.3.b37.vcf -isr INTERSECTION -o ContEst_out.txt
    2) java -jar GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T ContEst -R human_g1k_v37_decoy.fasta -I tumor.bam --genotypes normal_haplotype_called.vcf --popfile hapmap_3.3.b37.vcf -isr INTERSECTION -o ContEst_out.txt

    I got the following error messages:

    INFO 19:54:54,790 HelpFormatter - Date/Time: 2016/01/13 19:54:54
    INFO 19:54:54,790 HelpFormatter - --------------------------------------------------------------------------------
    INFO 19:54:54,790 HelpFormatter - --------------------------------------------------------------------------------
    INFO 19:54:54,845 GenomeAnalysisEngine - Strictness is SILENT
    INFO 19:54:54,955 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 19:54:54,962 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 19:54:55,078 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.12
    INFO 19:54:55,254 GenomeAnalysisEngine - Preparing for traversal over 2 BAM files
    INFO 19:54:55,872 GenomeAnalysisEngine - Done preparing for traversal
    INFO 19:54:55,873 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 19:54:55,873 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 19:54:55,873 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
    INFO 19:54:55,874 ContEst - Running in sequencing mode
    INFO 19:55:02,391 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.NullPointerException
    at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.calcStats(ContEst.java:625)
    at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.map(ContEst.java:400)
    at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.map(ContEst.java:127)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.5-0-g36282e4):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    I think the NullPointerException error you're getting is due to the absence of population frequency annotation in the files you're using. This is something we should handle with a better error message though -- I'll see what we can do about that.

    More generally, yes we need to share either the appropriate file or the information for generating one, but we need to sort a few things out internally for that to happen. In the meantime you may be able to use the file we provide for genotype refinement in the bundle, which also contains population frequency information. I'm not sure it'll work but it's worth a try while I hunt down the info on our end. ftp://[email protected]/bundle/2.8/b37/1000G_phase3_v4_20130502.sites.vcf.gz
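
One way to test this hypothesis before running ContEst is to look for a population key in the INFO column of the candidate popfile. The sketch below assumes the annotation is stored as an INFO key named after the population (for example `CEU=` or `ALL=`), which matches the "No population frequency annotation for CEU" message reported further down this thread. The function name `count_missing_pop` and the demo file are made up for illustration.

```shell
# count_missing_pop POP FILE
# Prints the number of VCF records whose INFO column (column 8)
# has no "POP=" key. A non-zero count suggests the file is not
# fully annotated with that population's allele frequencies.
count_missing_pop() {
  awk -F'\t' -v pop="$1" '
    /^#/ { next }                       # skip header lines
    {
      found = 0
      n = split($8, kv, ";")            # INFO is a ;-separated list of keys
      for (i = 1; i <= n; i++)
        if (index(kv[i], pop "=") == 1) found = 1
      if (!found) missing++
    }
    END { print missing + 0 }
  ' "$2"
}

# Tiny demo file (hypothetical): one annotated record, one without a population key.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > demo_pop.vcf
printf '1\t100\trs1\tC\tT\t.\tPASS\tAF=0.02;CEU={C*=1.0,T=0.0}\n' >> demo_pop.vcf
printf '1\t200\trs2\tG\tA\t.\tPASS\tAF=0.10\n' >> demo_pop.vcf

count_missing_pop CEU demo_pop.vcf
```

If the count equals the total number of records, the file probably carries no per-population annotation at all and is unlikely to work as a popfile.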

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    We don't have an hg38 version of that file yet, though.

  • samashoori (Germany; Member)

    @Geraldine_VdAuwera said:
    I think the NullPointerException error you're getting is due to the absence of population frequency annotation in the files you're using. This is something we should handle with a better error message though -- I'll see what we can do about that.

    More generally, yes we need to share either the appropriate file or the information for generating one, but we need to sort a few things out internally for that to happen. In the meantime you may be able to use the file we provide for genotype refinement in the bundle, which also contains population frequency information. I'm not sure it'll work but it's worth a try while I hunt down the info on our end. ftp://[email protected]/bundle/2.8/b37/1000G_phase3_v4_20130502.sites.vcf.gz

    I also tried ftp://[email protected]/bundle/2.8/b37/1000G_phase3_v4_20130502.sites.vcf.gz as the population allele frequencies input, but I still get the same runtime error as the others:

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.5-0-g36282e4):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------

    Since I need a contamination estimate to use as contamination_fraction_to_filter, can you suggest an alternative to ContEst?

  • Sheila (Broad Institute; Member, Broadie ✭✭✭✭✭)

    @samashoori
    Hi,

    Can you try using the latest nightly build? I think a fix went in for this recently.

    Thanks,
    Sheila

  • xsole (Boston, MA; Member)

    Hi,

    I was wondering whether a workaround for this was found. I have tried both the stable version and the nightly build 2016-04-07-gf73e637 and always get an error.

    My call:
    java -jar GenomeAnalysisTK.jar -T ContEst \
    -R genome.fa \
    -I:eval HCT116_AKTi.bam \
    -I:genotype HCT116_DMSO.bam \
    --popfile dbsnp_138.hg19.vcf \
    -isr INTERSECTION \
    -o HCT116_output.txt

    The log file:
    INFO 15:22:08,128 HelpFormatter - ---------------------------------------------------------------------------------------------
    INFO 15:22:08,130 HelpFormatter - The Genome Analysis Toolkit (GATK) vnightly-2016-04-07-gf73e637, Compiled 2016/04/07 00:01:13
    INFO 15:22:08,130 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
    INFO 15:22:08,130 HelpFormatter - For support and documentation go to https://www.broadinstitute.org/gatk
    INFO 15:22:08,130 HelpFormatter - [Fri Apr 08 15:22:08 EDT 2016] Executing on Linux 2.6.32-279.22.1.el6.x86_64 amd64
    INFO 15:22:08,131 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_40-b26 JdkDeflater
    INFO 15:22:08,134 HelpFormatter - Program Args: -T ContEst -R genome.fa -I:eval HCT116_AKTi.bam -I:genotype HCT116_DMSO.bam --popfile dbsnp_138.hg19.vcf -isr INTERSECTION -o HCT116_output.txt
    INFO 15:22:08,138 HelpFormatter - Executing as [email protected] on Linux 2.6.32-279.22.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_40-b26.
    INFO 15:22:08,138 HelpFormatter - Date/Time: 2016/04/08 15:22:08
    INFO 15:22:08,138 HelpFormatter - ---------------------------------------------------------------------------------------------
    INFO 15:22:08,139 HelpFormatter - ---------------------------------------------------------------------------------------------
    INFO 15:22:08,582 GenomeAnalysisEngine - Strictness is SILENT
    INFO 15:22:08,658 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 15:22:08,664 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 15:22:08,689 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
    INFO 15:22:08,892 GenomeAnalysisEngine - Preparing for traversal over 2 BAM files
    INFO 15:22:09,312 GenomeAnalysisEngine - Done preparing for traversal
    INFO 15:22:09,313 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 15:22:09,313 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 15:22:09,313 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
    INFO 15:22:09,313 ContEst - Running in sequencing mode
    INFO 15:22:10,913 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR --
    ERROR stack trace

    java.lang.RuntimeException: No population frequency annotation for CEU in [VC popfile @ chrM:10874 Q. of type=SNP alleles=[C*, T] attr={ASP=true, HD=true, OTHERKG=true, REF=true, RS=372777740, RSPOS=10874, SAO=0, SSR=0, SYN=true, VC=SNV, VP=0x050000000305000402000100, WGT=1, dbSNPBuildID=138} GT=[]
    at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.calcStats(ContEst.java:626)
    at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.map(ContEst.java:400)
    at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.map(ContEst.java:127)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version nightly-2016-04-07-gf73e637):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions https://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: No population frequency annotation for CEU in [VC popfile @ chrM:10874 Q. of type=SNP alleles=[C*, T] attr={ASP=true, HD=true, OTHERKG=true, REF=true, RS=372777740, RSPOS=10874, SAO=0, SSR=0, SYN=true, VC=SNV, VP=0x050000000305000402000100, WGT=1, dbSNPBuildID=138} GT=[]
    ERROR ------------------------------------------------------------------------------------------

    I have also tried using the file hapmap_3.3.hg19.sites.vcf as the --popfile but I get the very same error. Is this a bug or am I doing something wrong? How could I solve this issue?

    Thanks a lot!!

    Xavi.

  • Sheila (Broad Institute; Member, Broadie ✭✭✭✭✭)

    @xsole
    Hi Xavi,

    Can you try using the 1000Genomes file for the popfile like Geraldine recommended above?

    Thanks,
    Sheila

  • xsole (Boston, MA; Member)
    edited April 2016

    Hi @Sheila, thanks for your reply. I have done all my analyses with hg19 annotation. I see the file you recommend is in b37 notation. Will it work? Do I have to use exactly the file Geraldine recommends?

    ftp://[email protected]/bundle/2.8/b37/1000G_phase3_v4_20130502.sites.vcf.gz

    Or could I alternatively use:

    ftp://[email protected]/bundle/2.8/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf.gz

    Thanks again!

    Xavi.

    UPDATE 1:

    I used the file ftp://[email protected]/bundle/2.8/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf.gz and still got a similar error message:

    ERROR stack trace

    java.lang.RuntimeException: No population frequency annotation for CEU in [VC popfile @ chr1:17222748-17222749 Q8342.50 of type=INDEL alleles=[GC*, G] attr={set=MillsAlleleMatch1000G-MillsDoubleCenter} GT=[]
    at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.calcStats(ContEst.java:626)
    at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.map(ContEst.java:400)
    at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.map(ContEst.java:127)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version nightly-2016-04-07-gf73e637):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions https://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: No population frequency annotation for CEU in [VC popfile @ chr1:17222748-17222749 Q8342.50 of type=INDEL alleles=[GC*, G] attr={set=MillsAlleleMatch1000G-MillsDoubleCenter} GT=[]
    ERROR ------------------------------------------------------------------------------------------

    UPDATE 2:
    When I use the file ftp://[email protected]/bundle/2.8/b37/1000G_phase3_v4_20130502.sites.vcf.gz, I get an error due to the hg19<->b37 discrepancies:

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version nightly-2016-04-07-gf73e637):
    ERROR
    ERROR This means that one or more arguments or inputs in your command are incorrect.
    ERROR The error message below tells you what is the problem.
    ERROR
    ERROR If the problem is an invalid argument, please check the online documentation guide
    ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ERROR
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions https://www.broadinstitute.org/gatk
    ERROR
    ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ERROR
    ERROR MESSAGE: Input files /data/rama/labProjects/G0.NGS/genomes/GATK_files/1000G_phase3_v4_20130502.sites.vcf and reference have incompatible contigs. Please see http://gatkforums.broadinstitute.org/discussion/63/input-files-have-incompatible-contigsfor more information. Error details: No overlapping contigs found.
    ERROR /data/rama/labProjects/G0.NGS/genomes/GATK_files/1000G_phase3_v4_20130502.sites.vcf contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]
    ERROR reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY]
    ERROR ------------------------------------------------------------------------------------------
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    You'll need to lift the b37 file over to hg19. See Picard tools for liftover.
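
The Picard tool meant here is presumably LiftoverVcf, which needs a chain file and the target reference. Whichever route is taken, it may be worth confirming afterwards that the popfile and the reference share contig names, since that mismatch is what triggered the "incompatible contigs" error above. A rough sketch, assuming a FASTA index (.fai) exists for the reference; `shared_contigs` and the demo file names are hypothetical.

```shell
# shared_contigs VCF FAI
# Prints contig names that appear both in the VCF records (column 1)
# and in the reference index (column 1 of the .fai file).
shared_contigs() {
  awk -F'\t' '
    NR == FNR { ref[$1] = 1; next }          # first file: .fai contigs
    /^#/ { next }                            # skip VCF header lines
    ($1 in ref) && !seen[$1]++ { print $1 }  # report each shared contig once
  ' "$2" "$1"
}

# Demo (made-up files): a b37-style VCF ("1") against an hg19-style index ("chr1").
printf 'chr1\t249250621\t6\t50\t51\n' > demo_ref.fa.fai
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > demo_b37.vcf
printf '1\t566875\trs2185539\tC\tT\t.\tPASS\tAF=0.02369\n' >> demo_b37.vcf

if [ -z "$(shared_contigs demo_b37.vcf demo_ref.fa.fai)" ]; then
  echo "no overlapping contigs"
fi
```

An empty result corresponds to the "No overlapping contigs found" case; after a successful liftover the function should list the chr-prefixed names.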

  • artitandon (Member ✭✭)

    I downloaded the 1000G file you recommended and then did the liftover to hg19, and ran the following command:
    java -jar GenomeAnalysisTK.jar -T ContEst -R hg19.fa -I:eval tumor.recal.bam -I:genotype normal.recal.bam --popfile 1000G_phase3_v4_20130502_liftover.sites.vcf -L AGATK.bed -isr INTERSECTION -o test.contest.output.txt&

    This crashed with the following error:
    INFO 13:07:56,440 ContEst - Running in sequencing mode

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.NullPointerException
    at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.calcStats(ContEst.java:625)
    at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.map(ContEst.java:400)
    at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.map(ContEst.java:127)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.5-0-g36282e4):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)

    Any help will be appreciated, Thanks!

  • wisonlee (Member)
    edited April 2016

    Thanks for all the instructions and suggestions. I got it to work with the following details:

    #contEst command
    java \
      -jar /path/to/GenomeAnalysisTK.jar \
      -T ContEst \
      --precision 0.0001 \
      -R /path/to/human_g1k_v37_decoy.fa \
      -I:eval tumor.bam \
      -I:genotype normal.bam \
      --popfile /path/to/hapmap_3.3_nbs.vcf.gz \
      -L /path/to/hapmap_onTarget.intervals \
      -isr INTERSECTION \
      --population ALL \
      -br SAMPLE_ALL_br.txt \
      -o SAMPLE_ALL_PC0.0001.txt
    

    GenomeAnalysisTK.jar: nightly build April 1 2016

    hapmap_onTarget.intervals:

    tabix  /path/to/hapmap_3.3_nbs.vcf.gz -R panel_target_region.bed | tr -s '\t' ':' > hapmap_onTarget.intervals
    

    Format of hapmap_3.3.vcf after removing the blank spaces from the INFO column:

    ##fileformat=VCFv4.0
    ##AnnotatePopulationAF="analysis_type=AnnotatePopulationAF input_file=[] sample_metadata=[1KG-samples.yaml] read_buffer_size=null phone_home=NO_ET read_filter=[] intervals=null excludeIntervals=null reference_sequence=/seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta rodBind=[hapmap_3.3.b37.vcf] rodToIntervalTrackName=hapmap BTI_merge_rule=UNION nonDeterministicRandomSeed=false DBSNP=null downsampling_type=null downsample_to_fraction=null downsample_to_coverage=null baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_threads=1 interval_merging=ALL read_group_black_list=null processingTracker=null restartProcessingTracker=false processingTrackerStatusFile=null processingTrackerID=-1 allow_intervals_with_unindexed_bam=false disable_experimental_low_memory_sharding=false logging_level=INFO log_to_file=null help=false preserve_genotypes=true out=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub NO_HEADER=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub"
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    
    1       566875  rs2185539       C       T       .       PASS    AC=66;AF=0.02369;ALL={C*=0.97629,T=0.02371};AN=2786;ASW={C*=1.00000,T=0.00000};CEU={C*=1.00000,T=0.00000};CHB={C*=1.00000,T=0.00000};CHD={C*=1.00000,T=0.00000};CHS={C*=0.00000,T=0.00000};CLM={C*=0.00000,T=0.00000};FIN={C*=0.00000,T=0.00000};GBR={C*=0.00000,T=0.00000};GIH={C*=1.00000,T=0.00000};IBS={C*=0.00000,T=0.00000};JPT={C*=1.00000,T=0.00000};LWK={C*=1.00000,T=0.00000};MKK={C*=0.82044,T=0.17956};MXL={C*=1.00000,T=0.00000};PUR={C*=0.00000,T=0.00000};TSI={C*=1.00000,T=0.00000};YRI={C*=0.99752,T=0.00248};set=MKK-YRI
    

    Thanks again,
    Richard
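
To see what a given `--population` value would pick up from a file in this format, the per-population entry can be pulled out of the INFO column. This is only a sketch that assumes the `POP={allele=freq,...}` layout shown in the record above; `pop_allele_freq` and the demo file name are illustrative, not part of GATK.

```shell
# pop_allele_freq POP FILE
# For each record, prints CHROM, POS and the raw per-allele frequency
# map stored under the INFO key POP, e.g. "{C*=0.97629,T=0.02371}".
pop_allele_freq() {
  awk -F'\t' -v pop="$1" '
    /^#/ { next }                       # skip header lines
    {
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++)
        if (index(kv[i], pop "=") == 1)
          print $1 "\t" $2 "\t" substr(kv[i], length(pop) + 2)
    }
  ' "$2"
}

# Demo record shortened from the example above.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > demo_af.vcf
printf '1\t566875\trs2185539\tC\tT\t.\tPASS\tAC=66;AF=0.02369;ALL={C*=0.97629,T=0.02371};AN=2786;CEU={C*=1.00000,T=0.00000}\n' >> demo_af.vcf

pop_allele_freq ALL demo_af.vcf
```

A population that prints nothing for a record has no annotation at that site, which is the situation the error messages earlier in the thread complain about.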

  • MichaelGo (Member)

    @wisonlee said:

    Thanks for all the instructions and suggestions. I got it to work with the following details:

    Richard, where can this VCF file be found? The one from the resource bundle does not have the population allele frequencies.

    Kind regards,
    Michael

  • KostasMavrommatis (San Francisco; Member)

    I managed to get it to work using the file
    hg19_population_stratified_af_hapmap_3.3.vcf.gz
    which is provided with the data bundle of the original ContEst (https://www.broadinstitute.org/cancer/cga/contest_download)

    Before use, the file needs to be processed to remove the spaces from the INFO field and to drop the last (GT) column. It then needs to be compressed with bgzip and indexed with tabix.

    Then in my pipeline I run the following steps:

    tabix -h /path/to/hapmap_3.3_nbs.vcf.gz -R panel_target_region.bed > hapmap_onTarget.vcf

    java -Xmx12g -jar GenomeAnalysisTK.jar \
    -T ContEst \
    --precision 0.001 \
    -R genomeDatabase.fa \
    -I:eval inputTumorBAM.bam \
    -I:genotype inputNormalBAM.bam \
    --popfile hg19_population_stratified_af_hapmap_3.3.vcf.gz \
    -L hapmap_onTarget.vcf \
    -L panel_target_region.bed \
    -isr INTERSECTION \
    --population ALL \
    -o outputDirectory/result.contest.txt

  • bruce01 (Member ✭✭)

    I had a few further issues with using the same hapmap file as @KostasMavrommatis (thanks for the advice though!)

    First, the -R flag in tabix should be -B (for BED file input).

    But using this then gave the error:

    The provided VCF file is malformed at approximately line number 4: The VCF specification does not allow for whitespace in the INFO field.
    

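    So I removed all whitespace and the seemingly useless last column (all "GT"?) using Perl:

     perl -F'\t' -ane '
      # Pass header lines (starting with #) through unchanged.
      if ($F[0] =~ m/^#/) { print $_; next; }
      # Drop the last column (the "GT" column, which also carries the newline).
      pop(@F);
      # Strip all whitespace from the INFO field (column 8).
      $F[7] =~ s/\s+//g;
      print join("\t", @F) . "\n";' hg19_population_stratified_af_hapmap_3.3.vcf > hg19_population_stratified_af_hapmap_3.3.FIX.vcf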
    

    This then runs for me using the same commands as Kostas above (thanks again=)
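
A quick sanity check on the cleaned file can catch leftovers before GATK does. The sketch below assumes the cleaned file is a sites-only VCF with exactly 8 columns per record, as in the example header posted earlier in the thread; `check_clean_vcf` and the demo file are hypothetical names.

```shell
# check_clean_vcf FILE
# Prints "ok" if every data line has exactly 8 tab-separated columns
# and no space characters in the INFO column; otherwise prints the
# number of offending lines.
check_clean_vcf() {
  awk -F'\t' '
    /^#/ { next }                       # skip header lines
    NF != 8 || $8 ~ / / { bad++ }
    END { if (bad) print bad " bad line(s)"; else print "ok" }
  ' "$1"
}

# Demo: the first record is clean, the second still has a space and a 9th column.
printf '1\t100\trs1\tC\tT\t.\tPASS\tAF=0.02;CEU={C*=1.0,T=0.0}\n' > demo_fix.vcf
printf '1\t200\trs2\tG\tA\t.\tPASS\tAF=0.10; CEU={G*=1.0,A=0.0}\tGT\n' >> demo_fix.vcf

check_clean_vcf demo_fix.vcf
```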

  • ym_wang (Liaoning, China; Member)

    @KostasMavrommatis @bruce01 I ran ContEst without error messages, but there are no results in the output file. Can you explain in detail how to compress with bgzip and index with tabix? Also, does the .gz file (hg19_population_stratified_af_hapmap_3.3.vcf.gz) contain both the .vcf and the .vcf.idx?
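
For the compression and indexing step, a minimal sketch using the htslib command-line tools follows. It assumes `bgzip` and `tabix` are installed and on the PATH; the wrapper name `compress_and_index` is made up. The .gz holds only the block-compressed VCF, and tabix writes its index as a separate .tbi file next to it.

```shell
# compress_and_index FILE.vcf
# Writes FILE.vcf.gz (bgzip) and FILE.vcf.gz.tbi (tabix) next to the input.
compress_and_index() {
  if ! command -v bgzip >/dev/null 2>&1 || ! command -v tabix >/dev/null 2>&1; then
    echo "bgzip and tabix (htslib) are required" >&2
    return 1
  fi
  # Block-compress to a new file, keeping the original.
  bgzip -c "$1" > "$1.gz" || return 1
  # Index as VCF; this creates "$1.gz.tbi".
  tabix -p vcf "$1.gz"
}

# Usage, with the cleaned file name from the post above:
# compress_and_index hg19_population_stratified_af_hapmap_3.3.FIX.vcf
```

Plain gzip output will not work here, because tabix can only index bgzip-compressed files.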

  • escaon (Limoges, France; Member)

    For ContEst, I plan to use "hapmap_3.3.hg38.vcf" (found in the GATK hg38 bundle) as the value of the "--popfile" argument.

    But I am struggling to create the "populationSites.interval_list" needed for the "-L" argument.
    I read the answers from wisonlee and KostasMavrommatis. Both use tabix to generate this "populationSites.interval_list" file, but that requires a file called "panel_target_region.bed", which I have no idea how to get or create.

    Can someone enlighten me?

    P.S. I also have a "targets.interval_list" for my 409 genes of interest. I wonder whether there will be a conflict if I also give this file to ContEst, given that (according to the ContEst doc) both "populationSites.interval_list" and "targets.interval_list" are passed with "-L".

    Best regards

    Issue · Github (by Sheila): Issue Number 1701, State: closed, Closed By: sooheelee
  • shlee (Cambridge; Member, Broadie ✭✭✭✭✭)

    Hi @escaon,

    The "populationSites.interval_list" is not required for ContEst but just an option. Using a -L intervals list is a good idea though to focus the tool on genomic intervals that you know are in both the stratified population allele frequencies file and your alignments. Otherwise, any discrepancies in contigs in the headers of your input files will cause an error.
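
One way to build such an intervals list without tabix is to keep only the popfile sites that fall inside the target regions and print them as `chrom:pos` lines. The sketch below assumes that "panel_target_region.bed" in the earlier posts is simply a BED file of the capture or panel targets, that it has no track or browser header lines, and that both files use the same contig names; `sites_on_target` and the demo file names are hypothetical.

```shell
# sites_on_target VCF BED
# Prints "chrom:pos" for every VCF record that falls inside a BED
# interval. BED is 0-based, half-open; VCF positions are 1-based.
sites_on_target() {
  awk -F'\t' '
    NR == FNR {                         # first file: BED intervals
      n[$1]++
      start[$1, n[$1]] = $2 + 0
      stop[$1, n[$1]]  = $3 + 0
      next
    }
    /^#/ { next }                       # skip VCF header lines
    {
      pos = $2 + 0
      for (i = 1; i <= n[$1]; i++)
        if (pos > start[$1, i] && pos <= stop[$1, i]) {
          print $1 ":" pos
          break
        }
    }
  ' "$2" "$1"
}

# Demo (made-up data): one site inside the target, one outside.
printf 'chr1\t100\t200\n' > demo_targets.bed
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > demo_sites.vcf
printf 'chr1\t150\trs1\tC\tT\t.\tPASS\tALL={C*=0.9,T=0.1}\n' >> demo_sites.vcf
printf 'chr1\t250\trs2\tG\tA\t.\tPASS\tALL={G*=0.8,A=0.2}\n' >> demo_sites.vcf

sites_on_target demo_sites.vcf demo_targets.bed > demo_onTarget.intervals
cat demo_onTarget.intervals
```

The linear scan over intervals is fine for a gene panel but would be slow for whole-exome target lists, where tabix or a dedicated interval tool is the better choice. On the question of several "-L" arguments, the working command posted earlier in this thread passes two of them together with "-isr INTERSECTION".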

  • fortuno (Chicago; Member)

    Hi,

    I'm also trying to run ContEst on a dataset and got an error similar to the one previous users reported:

    ERROR --
    ERROR stack trace

    java.lang.NullPointerException
    at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.map(ContEst.java:375)
    at org.broadinstitute.gatk.tools.walkers.cancer.contamination.ContEst.map(ContEst.java:127)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version exported):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Code exception (see stack trace for error itself)
    ERROR ------------------------------------------------------------------------------------------

    I am using a version of the population allele frequency file lifted over from hg19 to GRCh38:

    java -jar /usr/local/bin/GenomeAnalysisTK.jar -T ContEst --lane_level_contamination SAMPLE -I:genotype normal.bam -I:eval tumor.bam -R GRCh38.d1.vd1.fa --out test.txt --popfile hg38_population_allele_frequency_hapmap_3.3.vcf.gz --precision 0.1

    In my case, it works for some BAM files but fails with others. Any ideas about what could be happening? Could it be related to my pop file?

    Thanks,
    Francisco

  • shlee (Cambridge; Member, Broadie ✭✭✭✭✭)

    Welcome back to the forum @fortuno. Perhaps you will find this thread helpful.
