MuTect2 beta --germline_resource for build b37

Jeff_GaitherJeff_Gaither Math Biosciences Institute, Columbus, OHMember

Hi - I'm looking to run MuTect2 beta using the --germline_resource option. However, I've consistently used the b37 genome build throughout my analaysis, while the suggested resource (gnomad) appears to only be available for the hg19 build. So I'm wondering whether I should
1. Go ahead and use the gnomad hg19 files, despite the fact that my whole analysis has used the b37 build?
2. Lift over my existing gnomad vcfs from hg19 to b37? (In this case, I'd need an hg19tob37 liftOver file - I can't find one anywhere).
3. Use another germline resource?

I wonder which option you would recommend? Many thanks for your time.

Best Answer

Answers

  • Jeff_GaitherJeff_Gaither Math Biosciences Institute, Columbus, OHMember
    edited July 2017

    Okay, I found an hg19 -> b37 liftover chain file at
    ftp://[email protected]/Liftover_Chain_Files

    However, when I tried to lift the gnomad vcf files from hg19 to b37 using Picard's LiftoverVcfs, every single site was rejected as unliftable. So this didn't work.

    Post edited by Jeff_Gaither on
  • Jeff_GaitherJeff_Gaither Math Biosciences Institute, Columbus, OHMember
    Accepted Answer

    Hey, I just found the needed file in the Resource bundle...
    ftp://[email protected]/bundle/beta/Mutect2/af-only-gnomad.raw.sites.b37.vcf.gz
    So, problem solved!

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Jeff_Gaither
    Hi,

    Glad you solved it on your own!

    -Sheila

  • Hi, could you give me af-only-gnomd.vcf for the hg19 build,I don't know how to find this vcf.
    Thanks~

  • KHL0702KHL0702 Member
    edited August 2017

    Hi, I would like to know where I can download the file af-only-gnomd.vcf for the hg38 build,the Resource bundle you given is only for b37,
    I hope to get some advice from you,thanks for your help!

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @luoshizhi @KHL0702
    Hi,

    Have a look at this thread. It is the same for hg38 too.

    -Sheila

  • Binfls6321DTUBinfls6321DTU Member
    edited October 2017

    @Sheila
    Hi,
    I'm new in using MuTect2 :p
    I'm a bit confused that the gnomAD file provided on gnomad.broadinstitute.org/downloads is about three times larger than the gnomAD file provided on ftp://[email protected]/bundle/beta/Mutect2/af-only-gnomad.raw.sites.b37.vcf.gz ?
    is there any difference between these two files ? could the difference affect the output of MuTect2??

    Thx in advance :)

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Binfls6321DTU
    Hi,

    Sorry for the confusion. That file is what we use in the Mutect2 tutorial. It only contains variants from one chromosome. You should use the entire gnomAD VCF if you are running on all chromosomes. I will make a note to make it more clear in the documentation for GATK4.

    Also, if you are new, have a look at the Mutect2 tutorials in the GATK presentations section. I would recommend running through the tutorial from July for a basic introduction to the workflow, then trying out the September version for a more in-depth look :smiley:

    -Sheila

  • Binfls6321DTUBinfls6321DTU Member
    edited October 2017

    @Sheila Thank you very much, :D I have tried the gnomAD file and succeeded. Note that the some AF values in gnomAD vcf file are only a dot, e.g. ".", and that will make the program crash.

    Another problem I get now is, it always reports the "segmentation fault" when I use the samtools-1.5/misc/seq_cache_populate.pl

    /var/spool/torque/mom_priv/jobs/18460427.risoe-r04-sn064.cm.cluster.SC: line 20: 13995 Segmentation fault /home/projects/cu_10098/data/ennbin/project/hg37_test/origin/copy/cromwell-executions/WES_normal_tumor_somatic_SNV_wf/aef23f13-da39-4c48-bc7f-1fe70c87d06a/call-ConvertToCram_tumor/inputs/home/projects/cu_10098/apps/src/samtools-1.3.1/samtools view -C -T /home/projects/cu_10098/data/ennbin/project/hg37_test/origin/copy/cromwell-executions/WES_normal_tumor_somatic_SNV_wf/aef23f13-da39-4c48-bc7f-1fe70c87d06a/call-ConvertToCram_tumor/inputs/home/databases/gatk-legacy-bundles/b37/human_g1k_v37_decoy.fasta /home/projects/cu_10098/data/ennbin/project/hg37_test/origin/copy/cromwell-executions/WES_normal_tumor_somatic_SNV_wf/aef23f13-da39-4c48-bc7f-1fe70c87d06a/call-ConvertToCram_tumor/inputs/home/projects/cu_10098/data/ennbin/project/hg37_test/origin/copy/cromwell-executions/WES_normal_tumor_somatic_SNV_wf/aef23f13-da39-4c48-bc7f-1fe70c87d06a/call-GatherBamFiles_tumor/execution/TCRBOA3-T.bam
    13996 Done | tee TCRBOA3-T.cram
    13997 Done | md5sum
    13998 Done | awk '{print $1}' > TCRBOA3-T.cram.md5

    below is the command

    /home/projects/cu_10098/data/ennbin/project/hg37_test/origin/copy/cromwell-executions/WES_normal_tumor_somatic_SNV_wf/aef23f13-da39-4c48-bc7f-1fe70c87d06a/call-ConvertToCram_tumor/inputs/home/projects/cu_10098/apps/src/samtools-1.3.1/samtools view -C -T /home/projects/cu_10098/data/ennbin/project/hg37_test/origin/copy/cromwell-executions/WES_normal_tumor_somatic_SNV_wf/aef23f13-da39-4c48-bc7f-1fe70c87d06a/call-ConvertToCram_tumor/inputs/home/databases/gatk-legacy-bundles/b37/human_g1k_v37_decoy.fasta /home/projects/cu_10098/data/ennbin/project/hg37_test/origin/copy/cromwell-executions/WES_normal_tumor_somatic_SNV_wf/aef23f13-da39-4c48-bc7f-1fe70c87d06a/call-ConvertToCram_tumor/inputs/home/projects/cu_10098/data/ennbin/project/hg37_test/origin/copy/cromwell-executions/WES_normal_tumor_somatic_SNV_wf/aef23f13-da39-4c48-bc7f-1fe70c87d06a/call-GatherBamFiles_tumor/execution/TCRBOA3-T.bam | \
    tee TCRBOA3-T.cram | \
    md5sum | awk '{print $1}' > TCRBOA3-T.cram.md5
    #Create REF_CACHE. Used when indexing a CRAM
    /home/projects/cu_10098/data/ennbin/project/hg37_test/origin/copy/cromwell-executions/WES_normal_tumor_somatic_SNV_wf/aef23f13-da39-4c48-bc7f-1fe70c87d06a/call-ConvertToCram_tumor/inputs/home/projects/cu_10098/apps/src/samtools-1.5/misc/seq_cache_populate.pl -root ./ref/cache /home/projects/cu_10098/data/ennbin/project/hg37_test/origin/copy/cromwell-executions/WES_normal_tumor_somatic_SNV_wf/aef23f13-da39-4c48-bc7f-1fe70c87d06a/call-ConvertToCram_tumor/inputs/home/databases/gatk-legacy-bundles/b37/human_g1k_v37_decoy.fasta
    export REF_PATH=:
    export REF_CACHE=./ref/cache/%2s/%2s/%s
    /home/projects/cu_10098/data/ennbin/project/hg37_test/origin/copy/cromwell-executions/WES_normal_tumor_somatic_SNV_wf/aef23f13-da39-4c48-bc7f-1fe70c87d06a/call-ConvertToCram_tumor/inputs/home/projects/cu_10098/apps/src/samtools-1.3.1/samtools index TCRBOA3-T.cram
    mv TCRBOA3-T.cram.crai TCRBOA3-T.crai

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Binfls6321DTU
    Hi,

    Note that the some AF values in gnomAD vcf file are only a dot, e.g. ".", and that will make the program crash.

    What did you do to get around that? Can you post the exact error message?

    For Samtools, I would recommend asking on their maliling list, as we don't help with their tools. The only thing I can recommend is trying the very latest version of their tools :smile:

    -Sheila

  • Hi,
    I'm having the same problem with dots in the gnomad file when passing it through the germline_resource option. Here is the error message:

    java.lang.NumberFormatException: For input string: "."
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
    at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    at java.lang.Double.parseDouble(Double.java:538)
    at java.lang.Double.valueOf(Double.java:502)
    at htsjdk.variant.variantcontext.CommonInfo.lambda$getAttributeAsDoubleList$2(CommonInfo.java:299)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Collections$2.tryAdvance(Collections.java:4717)
    at java.util.Collections$2.forEachRemaining(Collections.java:4725)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at htsjdk.variant.variantcontext.CommonInfo.getAttributeAsList(CommonInfo.java:273)
    at htsjdk.variant.variantcontext.CommonInfo.getAttributeAsDoubleList(CommonInfo.java:293)
    at htsjdk.variant.variantcontext.VariantContext.getAttributeAsDoubleList(VariantContext.java:740)
    at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.isActive(Mutect2Engine.java:312)
    at org.broadinstitute.hellbender.engine.AssemblyRegionIterator.loadNextAssemblyRegion(AssemblyRegionIterator.java:158)
    at org.broadinstitute.hellbender.engine.AssemblyRegionIterator.next(AssemblyRegionIterator.java:129)
    at org.broadinstitute.hellbender.engine.AssemblyRegionIterator.next(AssemblyRegionIterator.java:32)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:250)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:231)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:838)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:119)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:176)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:195)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
    at org.broadinstitute.hellbender.Main.main(Main.java:233)

    Many thanks in advance for your help,

    Olivier

    Issue · Github
    by shlee

    Issue Number
    2654
    State
    closed
    Last Updated
    Closed By
    chandrans
  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @olisand
    Hi Olivier,

    Can you check if this issue still occurs in GATK4 latest beta? Also, if the VCF validates with ValidateVariants and still errors in GATK4, can you submit a bug report?

    Thanks,
    Sheila

  • Hi Sheila,

    Thank you for your answer.
    It is still occurring with beta.6 and the VCF validates. I'll submit a bug report today.

    Olivier

  • shleeshlee CambridgeMember, Broadie, Moderator admin
    edited November 2017

    Hi Olivier (@olisand),

    I am following up on behalf of the team who are away for a workshop. Let me know when you submit your bug report.

    Just to be clear I understand, the gnomad file that is causing the error:

    java.lang.NumberFormatException: For input string: "."
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)

    is from ftp://[email protected]/bundle/beta/Mutect2/af-only-gnomad.raw.sites.b37.vcf.gz? Or does it come from a different source?

    Thanks.

  • Hi (@shlee),

    I got it from there: https://storage.googleapis.com/gnomad-public/release/2.0.2/vcf/exomes/gnomad.exomes.r2.0.2.sites.vcf.bgz
    and added 'chr' before chromosome numbers in it.
    I just tried with the one from the bundle (after adding 'chr' to each chromosome number) and it worked without error. So, I guess case is solved ?

    Thank you for your help,

    Olivier

  • shleeshlee CambridgeMember, Broadie, Moderator admin

    Hi @olisand,

    The URL you give isn't one of ours so I cannot comment. I'm glad you've solved your issue.

    The resources we guarantee work with our tools come from our Resource Bundle at https://software.broadinstitute.org/gatk/download/bundle and have already been formatted for either GRCh37 (sans chr) or hg19 (with chr) contig naming conventions. I recommend you use the files we provide for the convention that you need instead of converting between the two. The concern is there are some coordinate differences for the mitochondria. You can also use Picard LiftoverVcf with CHAIN files as well.

  • Hi @shlee,

    Ok, thanks again for your help.

  • kkshaxqdkkshaxqd chinaMember

    Hello @olisand, I download the gnomad.exomes.r2.0.2.sites.vcf.bgz ,but can not deal with the bgz file, so I want to know how to add chr in the gnomad.exomes.r2.0.2.sites.vcf.bgz, and let the mutect2 use the file....

    Thanks.

    JOSH ZHANG

  • shleeshlee CambridgeMember, Broadie, Moderator admin

    Hi @kkshaxqd,

    We provide a simplified gnomad resource file (for GRCh37) in the FTP Bundle, in beta>Mutect2.

  • kkshaxqdkkshaxqd chinaMember

    Hello, @shlee , thanks very much , en....That's right, I think it will work and resolve the problem. GRCh37/hg19.....

  • @olisand said:
    Hi (@shlee),

    I got it from there: https://storage.googleapis.com/gnomad-public/release/2.0.2/vcf/exomes/gnomad.exomes.r2.0.2.sites.vcf.bgz
    and added 'chr' before chromosome numbers in it.
    I just tried with the one from the bundle (after adding 'chr' to each chromosome number) and it worked without error. So, I guess case is solved ?

    Thank you for your help,

    Olivier

    Hi,Olisand!
    I have run into the same question as yours, and i wonder how did you make it that "add 'chr' before chromosome numbers in it"? and would you like to share the experience with me? thanks a lot!

  • huseehusee Member

    I can not open the site: ftp://[email protected]/Liftover_Chain_Files,where can I get a chain file to lift small_exac_common_3_b37.vcf.gz to hg19?

  • shleeshlee CambridgeMember, Broadie, Moderator admin
  • huseehusee Member

    Thanks ,@shlee ,I can open the ftp://[email protected]/Liftover_Chain_Files suddenly and download the b37tohg19.chain file.

  • shleeshlee CambridgeMember, Broadie, Moderator admin

    Great to hear you can now access the FTP folder @husee.

  • pwaltmanpwaltman New York, NYMember

    @Jeff_Gaither said:
    Hey, I just found the needed file in the Resource bundle...
    ftp://[email protected]/bundle/beta/Mutect2/af-only-gnomad.raw.sites.b37.vcf.gz
    So, problem solved!

    This link no longer works. I found copies available here: http://bioinfo5pilm46.mit.edu/software/GATK/resources/

    Although I'm not sure if these are the same files that were originally on the broad's ftp site.

  • pwaltmanpwaltman New York, NYMember

    @pwaltman said:

    @Jeff_Gaither said:
    Hey, I just found the needed file in the Resource bundle...
    ftp://[email protected]/bundle/beta/Mutect2/af-only-gnomad.raw.sites.b37.vcf.gz
    So, problem solved!

    This link no longer works. I found copies available here: http://bioinfo5pilm46.mit.edu/software/GATK/resources/

    Although I'm not sure if these are the same files that were originally on the broad's ftp site.

    I also found that you can download the b37 file from: https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-b37/

  • bhanuGandhambhanuGandham Member, Administrator, Broadie, Moderator admin

    Thanks for the update @pwaltman

  • pwaltmanpwaltman New York, NYMember
    edited December 2018

    I'm now getting a really odd error where I get a message that a given line in the af-only-gnomad.raw.sites.vcf for genome b37 only has 1 token. The exact error message is (and I give the full output below that):

    htsjdk.tribble.TribbleException: Line 16805:there aren't enough columns for line (we expected 9 tokens, and saw 1 ), for input source: file:///ifs/scratch/c2b2/ac_lab/shares/resources/TOOL_SPECIFIC/Mutect2/b37/af-only-gnomad.raw.sites.vcf

    Note, I aligned my bams with bwa-aln. Yes, I'm aware that bwa-mem is faster, but upon advice that aln is more accurate for somatic tissues, that is what was used - although, admittedly, that advice is based on the results from earlier GATK versions. Anyways, if bwa-aln is the issue, I can re-align with BWA-mem, if necessary.

    Also, this seems to be sample specific. I get a similar error for another one of my paired tumor-normal samples (same error, but different line). However, I don't get it for a 3rd paired sample that I have. Not sure why that would be the case. Perhaps, it stems from different mutations being called in the samples, i.e. for the problem samples, a putative variant is called that in in the af-only-gnomad.raw.sites.vcf , but incorrectly formatted (?). However, I checked the reported line number, and it appeared to be formatted correctly.

    Finally, I should add that I also aligned this paired sample to hg38, and am running gatk4 Mutect2 on that, and it seems to be working without issue. Weird.

    I'm calling Mutect2 using the following script:

    $ cat run_gatk4_b37_mutect2.sh
    #!/bin/bash -l
    if [ -z $BINDIR ]; then
      BINDIR=$HOME/bin
    fi
    
    gatk_resource_dir=/ifs/scratch/c2b2/ac_lab/shares/resources/Homo_sapiens/GATK/b37
    mutect2_resource_dir=/ifs/scratch/c2b2/ac_lab/shares/resources/TOOL_SPECIFIC/Mutect2/b37
    outdir=/data/PROJECTS/ruping_RNET_genomics/private_data/analysis/mutect2/ac580_v_ac581
    
    $BINDIR/GATK/gatk-4.0.11.0/gatk --java-options "-Xmx30g" Mutect2 \
      -R $gatk_resource_dir/human_g1k_v37_decoy.fasta \
      -I $outdir/ac580/AC580_merged.b37.aln.sorted.rmDup.md.recal_1st_pass.recal.bam \
      -tumor AC580 \
      -I $outdir/ac581/AC581_merged.b37.aln.sorted.rmDup.md.recal_1st_pass.recal.bam \
      -normal AC581 \
      --germline-resource $mutect2_resource_dir/af-only-gnomad.raw.sites.vcf \
      -O $outdir/ac580_v_ac581_b37_gatk4_m2_snvs_indels.vcf.gz
    

    Anyway, the full ouput is:
    $ ./run_gatk4_b37_mutect2.sh Using GATK jar /home/local/ARCS/pw2470/bin/GATK/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx30g -jar /home/local/ARCS/pw2470/bin/GATK/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar Mutect 2 -R /ifs/scratch/c2b2/ac_lab/shares/resources/Homo_sapiens/GATK/b37/human_g1k_v37_decoy.fasta -I /data/PROJECTS/ruping_RNET_genomics/private_data/analysis/mutect2/ac580_v_ac581/ac580/AC580_merged.b37.aln.sorted.rmDup.md.recal_1st_pass.recal.bam -tumor AC580 -I /data/PROJECTS/ruping_RNET_genomics/private_data/analysis/mutect2/ac580_v_ac581/ac581/AC581_merged.b37.aln.sorted.rmDup.md.recal_1st_pass.recal.bam -normal AC581 --germline-resource /ifs/scratch/c2b2/ac_lab/shares/resources/TOOL_SPECIFIC/Mutect2/b37/af-only-gnomad.raw.sites.vcf -O /data/PROJECTS/ruping_RNET_genomics/private_data/analysis/mutect2/ac580_v_ac581/ac580_v_ac581_b37_gatk4_m2_snvs_indels.vcf.gz 16:18:12.959 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/local/ARCS/pw2470/bin/GATK/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar!/com/intel/gkl/native/libgkl_compression.so 16:18:14.579 INFO Mutect2 - ------------------------------------------------------------ 16:18:14.579 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.0.11.0 16:18:14.579 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/ 16:18:14.579 INFO Mutect2 - Executing as [email protected] on Linux v4.15.0-42-generic amd64 16:18:14.580 INFO Mutect2 - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_191-b12 16:18:14.580 INFO Mutect2 - Start Date/Time: December 14, 2018 4:18:12 PM EST 16:18:14.580 INFO Mutect2 - ------------------------------------------------------------ 16:18:14.580 INFO Mutect2 - ------------------------------------------------------------ 16:18:14.581 INFO Mutect2 - HTSJDK Version: 2.16.1 16:18:14.581 INFO Mutect2 - Picard Version: 2.18.13 16:18:14.581 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2 16:18:14.581 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false 16:18:14.582 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true 16:18:14.582 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false 16:18:14.582 INFO Mutect2 - Deflater: IntelDeflater 16:18:14.582 INFO Mutect2 - Inflater: IntelInflater 16:18:14.582 INFO Mutect2 - GCS max retries/reopens: 20 16:18:14.582 INFO Mutect2 - Requester pays: disabled 16:18:14.582 INFO Mutect2 - Initializing engine 16:18:15.076 INFO FeatureManager - Using codec VCFCodec to read file file:///ifs/scratch/c2b2/ac_lab/shares/resources/TOOL_SPECIFIC/Mutect2/b37/af-only-gnomad.raw.sites.vcf 16:18:16.768 WARN IndexUtils - Feature file "/ifs/scratch/c2b2/ac_lab/shares/resources/TOOL_SPECIFIC/Mutect2/b37/af-only-gnomad.raw.sites.vcf" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file 16:18:17.877 INFO Mutect2 - Done initializing engine 16:18:17.901 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/local/ARCS/pw2470/bin/GATK/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar!/com/intel/gkl/native/libgkl_utils.so 16:18:17.908 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/local/ARCS/pw2470/bin/GATK/gatk-4.0.11.0/gatk-package-4.0.11.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so 16:18:17.939 INFO IntelPairHmm - Using CPU-supported AVX-512 instructions 16:18:17.939 WARN IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM 16:18:17.939 INFO IntelPairHmm - Available threads: 16 16:18:17.939 INFO IntelPairHmm - Requested threads: 4 16:18:17.939 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation 16:18:17.966 INFO ProgressMeter - Starting traversal 16:18:17.967 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute 16:18:28.011 INFO ProgressMeter - 1:798450 0.2 3720 22222.2 16:18:38.016 INFO ProgressMeter - 1:1174972 0.3 6610 19781.5 16:18:48.040 INFO ProgressMeter - 1:1600486 0.5 9880 19712.0 16:18:58.049 INFO ProgressMeter - 1:2052325 0.7 13460 20148.7 16:19:08.108 INFO ProgressMeter - 1:2483943 0.8 16890 20211.0 16:19:18.114 INFO ProgressMeter - 1:2926606 1.0 20040 19991.0 16:19:28.119 INFO ProgressMeter - 1:3272218 1.2 22880 19568.9 16:19:38.146 INFO ProgressMeter - 1:3700227 1.3 26290 19673.5 16:19:48.202 INFO ProgressMeter - 1:4189183 1.5 29560 19655.6 16:19:58.252 INFO ProgressMeter - 1:4508121 1.7 32260 19301.0 16:20:08.278 INFO ProgressMeter - 1:4858629 1.8 35180 19135.0 16:20:18.282 INFO ProgressMeter - 1:5192122 2.0 38000 18950.3 16:20:28.288 INFO ProgressMeter - 1:5497137 2.2 40600 18692.3 16:20:38.298 INFO ProgressMeter - 1:5798956 2.3 43210 18474.9 16:20:48.325 INFO ProgressMeter - 1:6179628 2.5 46370 18503.8 16:20:58.333 INFO ProgressMeter - 1:6588278 2.7 49590 18553.8 16:21:08.155 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.087292907 16:21:08.156 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 3.8381889750000004 16:21:08.156 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 29.00 sec 16:21:08.157 INFO Mutect2 - Shutting down engine [December 14, 2018 4:21:08 PM EST] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 2.92 minutes. Runtime.totalMemory()=1015545856 htsjdk.tribble.TribbleException: Line 16805:there aren't enough columns for line (we expected 9 tokens, and saw 1 ), for input source: file:///ifs/scratch/c2b2/ac_lab/shares/resources/TOOL_SPECIFIC/Mutect2/b37/af-only-gnomad.raw.sites.vcf at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:296) at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:277) at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:64) at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:70) at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:37) at htsjdk.tribble.TribbleIndexedFeatureReader$QueryIterator.readNextRecord(TribbleIndexedFeatureReader.java:486) at htsjdk.tribble.TribbleIndexedFeatureReader$QueryIterator.<init>(TribbleIndexedFeatureReader.java:426) at htsjdk.tribble.TribbleIndexedFeatureReader.query(TribbleIndexedFeatureReader.java:297) at org.broadinstitute.hellbender.engine.FeatureDataSource.refillQueryCache(FeatureDataSource.java:528) at org.broadinstitute.hellbender.engine.FeatureDataSource.queryAndPrefetch(FeatureDataSource.java:497) at org.broadinstitute.hellbender.engine.FeatureManager.getFeatures(FeatureManager.java:334) at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:163) at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:115) at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:230) at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.callMutations(SomaticGenotypingEngine.java:155) at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.callRegion(Mutect2Engine.java:221) at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:230) at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:291) at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:267) at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211) at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160) at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203) at org.broadinstitute.hellbender.Main.main(Main.java:289)

  • pwaltmanpwaltman New York, NYMember
    edited December 2018

    apologies for the duplicate post. my original comment didn't show up the first time I posted it, and the forum doesn't allow members to delete their posts/comments.

  • manbamanba Member ✭✭

    @pwaltman said:

    @pwaltman said:

    @Jeff_Gaither said:
    Hey, I just found the needed file in the Resource bundle...
    ftp://[email protected]/bundle/beta/Mutect2/af-only-gnomad.raw.sites.b37.vcf.gz
    So, problem solved!

    This link no longer works. I found copies available here: http://bioinfo5pilm46.mit.edu/software/GATK/resources/

    Although I'm not sure if these are the same files that were originally on the broad's ftp site.

    I also found that you can download the b37 file from: https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-b37/

    thanks a lot

  • pwaltmanpwaltman New York, NYMember

    Not sure what the problem is/was withe the af-only-gnomad.raw.sites.vcf file that I downloaded, but I ended up creating my own version by lifting over the hg38 version to b37, which is a 2-part process because you have to lift the hg38 version to hg19, and then again to b37. You can find the chain files on the UCSC genome browser site (you want the hg38Tohg19 chain file, as well as the hg19toGRCh37 chain file. Also, after trying Crossmap.py, I gave up on it, and used the LiftoverVcf tool from Picard (which is now part of GATK4).

    While the version I generated is somewhat smaller (13.1G from the resource; 12.9G for the one I generated), I no longer get the errors that I was getting previously.

  • manbamanba Member ✭✭

    @pwaltman said:

    @pwaltman said:

    @Jeff_Gaither said:
    Hey, I just found the needed file in the Resource bundle...
    ftp://[email protected]/bundle/beta/Mutect2/af-only-gnomad.raw.sites.b37.vcf.gz
    So, problem solved!

    This link no longer works. I found copies available here: http://bioinfo5pilm46.mit.edu/software/GATK/resources/

    Although I'm not sure if these are the same files that were originally on the broad's ftp site.

    I also found that you can download the b37 file from: https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-b37/

    thanks a lot

  • davidbendavidben BostonMember, Broadie, Dev ✭✭✭

    @pwaltman When you use the b37 gnomad resource file from the google bucket do you still get the error?

    Also, the advice to use BWA aln for somatic samples only applies for reads that are relatively short by today's standards. If your reads are longer than 70bp you should use BWA mem.

Sign In or Register to comment.