M2 and GDBI for PON: [E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at chr1:16949

manolis · Member ✭✭

GATK 4.1.1.0, local linux server

Hi,

I ran some WES normal samples:

${gatk} Mutect2 \
-R ${hg38} \
-I "${sample}.bam" \
-O "${sample}.vcf.gz" \
-L ${interval} \
-ip 5 \
--max-mnp-distance 0

and then GenomicsDBImport:

${gatk} GenomicsDBImport \
-R ${hg38} \
-V "${sample1}.vcf.gz" \
-V "${sample2}.vcf.gz" \
--batch-size 1 --reader-threads 1 \
--genomicsdb-workspace-path "GDBI_pon" \
-L chr1
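
To scale the import past a handful of -V arguments, GenomicsDBImport can instead take a tab-delimited file that maps sample names to VCF paths via --sample-name-map. Below is a minimal POSIX sh sketch for building that file. It assumes one sample column per VCF (as in these single-normal Mutect2 runs) and GNU gzip, and the function name make_sample_map is hypothetical, not part of GATK.

```shell
# Hypothetical helper (not part of GATK): print one "<sample name><TAB><vcf path>"
# line per VCF argument, taking the sample name from column 10 of the #CHROM line.
# Assumes exactly one sample column per VCF.
make_sample_map() {
    for vcf in "$@"; do
        # GNU gzip -cdf decompresses bgzipped files and passes plain text through
        name=$(gzip -cdf "$vcf" | awk -F '\t' '/^#CHROM/ { print $10; exit }')
        if [ -z "$name" ]; then
            echo "make_sample_map: no sample column in $vcf" >&2
            return 1
        fi
        printf '%s\t%s\n' "$name" "$vcf"
    done
}
```

Usage would look like make_sample_map "${sample1}.vcf.gz" "${sample2}.vcf.gz" > pon.sample_map, followed by GenomicsDBImport with --sample-name-map pon.sample_map in place of the -V arguments. Reading the name from the #CHROM line keeps the map consistent with what Mutect2 actually wrote.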

Here is the error:

13:18:45.329 INFO  GenomicsDBImport - Done initializing engine
13:18:45.517 INFO  GenomicsDBImport - Vid Map JSON file will be written to /home/manolis/prove/GDBI_pon/GDBI_pon/vidmap.json
13:18:45.517 INFO  GenomicsDBImport - Callset Map JSON file will be written to /home/manolis/prove/GDBI_pon/GDBI_pon/callset.json
13:18:45.517 INFO  GenomicsDBImport - Complete VCF Header will be written to /home/manolis/prove/GDBI_pon/GDBI_pon/vcfheader.vcf
13:18:45.517 INFO  GenomicsDBImport - Importing to array - /home/manolis/prove/GDBI_pon/GDBI_pon/genomicsdb_array
13:18:45.517 INFO  ProgressMeter - Starting traversal
13:18:45.517 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Batches Processed   Batches/Minute
13:18:45.820 INFO  GenomicsDBImport - Importing batch 1 with 1 samples
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at chr1:14653
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fe721be816b, pid=12942, tid=0x00007fe7801f7700
#
# JRE version: OpenJDK Runtime Environment (8.0_152-b12) (build 1.8.0_152-release-1056-b12)
# Java VM: OpenJDK 64-Bit Server VM (25.152-b12 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libtiledbgenomicsdb8166440819035845683.so+0x35416b]  bcf_unpack+0x36b
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/manolis/prove/GDBI_pon/hs_err_pid12942.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.

Here are the AF header line from the vcf.gz and the variant:

##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">

chr1    14653   .   C   T   .   .   DP=13;ECNT=2;MBQ=20,30;MFRL=212,211;MMQ=43,33;MPOS=40;POPAF=7.30;TLOD=10.18 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:9,4:0.333:13:6,2:3,1:0|1:14653_C_T:14653:5,4,3,1
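
Since the htslib message names FORMAT/AF, one quick cross-check that does not rely on ValidateVariants is to scan every record and confirm that each sample's AF subfield is either the missing value "." or a comma-separated list of plain decimal numbers. A minimal awk-based sketch, assuming uncompressed VCF text on stdin; the function name check_format_af is hypothetical.

```shell
# Hypothetical helper: read VCF text on stdin and print "CHROM:POS<TAB>value" for
# every sample whose FORMAT/AF is not "." or a comma-separated list of decimals.
# Exits 1 if anything was printed, 0 otherwise.
check_format_af() {
    awk -F '\t' '
        BEGIN {
            num = "(-?[0-9]+(\\.[0-9]*)?([eE][-+]?[0-9]+)?|\\.)"   # one value or a dot
            re  = "^" num "(," num ")*$"                           # Number=A list
        }
        /^#/ { next }                                   # skip header lines
        {
            n = split($9, keys, ":")
            af = 0
            for (i = 1; i <= n; i++) if (keys[i] == "AF") af = i
            if (af == 0) next                           # no AF in this FORMAT
            for (s = 10; s <= NF; s++) {
                split($s, vals, ":")
                v = vals[af]
                if (v == "") continue                   # trailing fields dropped = missing
                if (v !~ re) { printf "%s:%s\t%s\n", $1, $2, v; bad = 1 }
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

For example, gzip -dc "${sample1}.vcf.gz" | check_format_af should print nothing and exit 0 when every AF value is well formed; the record above (AF 0.333) passes. The check finds AF by name in each record's FORMAT column rather than assuming a fixed position.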

Here is the VCF validation:

${gatk} ValidateVariants \
-R ${hg38} \
-V "${sample1}.vcf.gz" \
-L ${interval} \
-ip 5

No warnings at all.

When I process the "${sample1}.vcf.gz" with:

bcftools annotate -x FORMAT/AF "${sample1}.vcf.gz" -O z -o "${sample1}_noAF.vcf.gz"

and then run GenomicsDBImport on the result, I do not get any error.

Any suggestions, please?
Many thanks

Answers

  • manolis · Member ✭✭

    Hi, fixed. It seems it was a problem with one of the hosts in the cluster. Sorry for the trouble.

    Best

  • Hi, manolis!
    Could you please tell me how exactly you solved this problem?
    Many thanks

  • jpflorido · Seville · Member

    Hi manolis,

    I'm having exactly the same issue with my PoN creation. The AF field appears to be correct, and all my VCFs (only 3 for test purposes) passed ValidateVariants. I also use the --max-mnp-distance=0 option in Mutect2 to avoid the known bug in the GenomicsDBImport tool. But I still get the same "Invalid character '.' in 'AF' FORMAT field at ..." and "A fatal error has been detected by the Java Runtime Environment" errors.

    Would you mind letting me know what the problem with your host was and how you fixed it? The same may be happening here.

    Thanks in advance!

  • fmortuno · Clinical Bioinformatics Area, FPS, Seville (Spain) · Member

    Any suggestions about this ^ @manolis?

    Thanks!

  • manolis · Member ✭✭
    edited July 13

    Our "solution" is totally crazy, and we still cannot explain why this happens! We have a Linux cluster with 6 hosts.
    When I run GDBI for PON creation (GATK v4.1.1.0) during the day, it does not work, even when there are no jobs on any of the hosts!
    When I run it late at night, it works.

    For now we cannot explain this behavior :o:/ We are waiting for an answer from our server support.

    Best

  • fmortuno · Clinical Bioinformatics Area, FPS, Seville (Spain) · Member

    Thank you for your answer, manolis!

    Is there someone else from the GATK team who can advise on this? I am quite sure my Mutect2 outputs were generated correctly and the AF field looks right to me, but maybe I am wrong.

    Just to recap, we (@jpflorido and I) are trying to create a panel from 3 exome samples, but it fails when combining the VCFs with the GenomicsDBImport tool.

    I have also tried building the latest version of GATK directly from the repository, in case this has been fixed recently, but the same error occurs. I can share whatever you may need.

    Thanks in advance,
    Francisco

  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator admin
    edited July 16

    Hi @fmortuno

    Can you please try running GenomicsDBImport with --max-mnp-distance 0, as shown in this tutorial (https://software.broadinstitute.org/gatk/documentation/article?id=24057), and see if that resolves the error?

  • fmortuno · Clinical Bioinformatics Area, FPS, Seville (Spain) · Member

    Thank you very much for your answer @bhanuGandham !!!

    However, maybe I am missing something, but I cannot find a max-mnp-distance option for GenomicsDBImport in GATK v4.1.2.0, so I get the error:

    max-mnp-distance is not a recognized option

    I checked the documentation for that tool and version, and I still cannot see the option. I already used that option in Mutect2, where it is available, but got the same error at the GenomicsDBImport step. Any other suggestions?

    Thanks in advance

  • fmortuno · Clinical Bioinformatics Area, FPS, Seville (Spain) · Member

    Any thoughts here @bhanuGandham or anyone else? Thanks!!!!

  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator admin

    Hi @fmortuno

    I am looking into this and will get back to you shortly.

  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator admin
    edited July 24

    Hi @fmortuno

    There was an error in the documentation: --max-mnp-distance 0 should only be used in the Mutect2 command, not in the GenomicsDBImport step. You were right about that.

    It's a long thread and I do not see your error log in it. Would you please post the exact command you are using and the error you are seeing? This will help the dev team debug the issue.

    Sorry for the delay in getting back to you, we have been facing large volumes of questions recently.

  • fmortuno · Clinical Bioinformatics Area, FPS, Seville (Spain) · Member
    edited July 25

    Thanks @bhanuGandham !!!

    The error is essentially the same as the one initially posted in this thread, which is why I asked here. Here are the specific logs I got:

    Command (gatk v4.1.2.0):

    gatk GenomicsDBImport \
              --genomicsdb-workspace-path pon_db \
              --R hs37d5.fa \
              -V <sample1>.vcf.gz \
              -V <sample2>.vcf.gz \
              -V <sample3>.vcf.gz \
              -L 0000-scattered.interval_list
    

    Error:

    13:07:27.084 INFO  GenomicsDBImport - Done initializing engine
    13:07:27.378 INFO  GenomicsDBImport - Vid Map JSON file will be written to /mnt/lustre/scratch/CBRA/projects/lung_cancer_sas/PoN/tmp/PoN/f42346dd-4a81-424e-984e-73e5b43d4eab/call-CreatePanel/shard-0/execution/pon_db/vidmap.json
    13:07:27.379 INFO  GenomicsDBImport - Callset Map JSON file will be written to /mnt/lustre/scratch/CBRA/projects/lung_cancer_sas/PoN/tmp/PoN/f42346dd-4a81-424e-984e-73e5b43d4eab/call-CreatePanel/shard-0/execution/pon_db/callset.json
    13:07:27.379 INFO  GenomicsDBImport - Complete VCF Header will be written to /mnt/lustre/scratch/CBRA/projects/lung_cancer_sas/PoN/tmp/PoN/f42346dd-4a81-424e-984e-73e5b43d4eab/call-CreatePanel/shard-0/execution/pon_db/vcfheader.vcf
    13:07:27.379 INFO  GenomicsDBImport - Importing to array - /mnt/lustre/scratch/CBRA/projects/lung_cancer_sas/PoN/tmp/PoN/f42346dd-4a81-424e-984e-73e5b43d4eab/call-CreatePanel/shard-0/execution/pon_db/genomicsdb_array
    13:07:27.379 INFO  ProgressMeter - Starting traversal
    13:07:27.379 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Batches Processed   Batches/Minute
    13:07:27.928 INFO  GenomicsDBImport - Importing batch 1 with 3 samples
    [E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:13079
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00002b1be876416b, pid=29197, tid=0x00002b1bb7fa0700
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode linux-amd64 compressed oops)
    # Problematic frame:
    # C  [libtiledbgenomicsdb434897115576972739.so+0x35416b]  bcf_unpack+0x36b
    #
    # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    #
    # An error report file with more information is saved as:
    # /mnt/lustre/scratch/CBRA/projects/lung_cancer_sas/PoN/tmp/PoN/f42346dd-4a81-424e-984e-73e5b43d4eab/call-CreatePanel/shard-0/execution/hs_err_pid29197.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
    

    The interval file was generated with SplitIntervals. As we mentioned before, the three VCFs were generated with Mutect2 using the --max-mnp-distance 0 option, and ValidateVariants reported no errors or warnings for them. I am quite sure the AF field format is correct, but if I filter out the AF field from my VCFs, the GenomicsDBImport command runs without errors. Here is the AF header line and an example variant from one VCF:

    ##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
    ...
    1   13116   .   T   G   .   haplotype;map_qual  CONTQ=93;DP=23;ECNT=2;GERMQ=36;MBQ=20,37;MFRL=267,254;MMQ=27,24;MPOS=59;POPAF=7.30;SEQQ=93;STRANDQ=93;TLOD=37.57    GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:13,10:0.440:23:6,6:4,4:0|1:13116_T_G:13116:6,7,5,5
    

    Any suggestion about what could be going on? I can share any other logs or detail you may need.

    Thanks again,
    Francisco

  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator admin

    Hi @fmortuno and @manolis

    Our dev team is looking into this right now. Would you please share your input files with us so we can recreate the error and debug it?
    Please find the details on how to share your data here: https://software.broadinstitute.org/gatk/guide/article?id=1894

  • fmortuno · Clinical Bioinformatics Area, FPS, Seville (Spain) · Member

    Thank you @bhanuGandham !

    I just uploaded my input files, logs and command line to the FTP, as suggested in the article. The compressed file is named AF_error_GDBI_for_PoN.tar.gz. I restricted the three VCFs to the MT chromosome to make the error easier to reproduce.
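
    Restricting the inputs to one contig, as done here with MT, is a handy way to build a small reproducer. A dependency-free sketch that keeps every header line plus the records on a single contig; the function name subset_contig is hypothetical, and for bgzipped, indexed inputs bcftools view -r MT would do the same job.

```shell
# Hypothetical helper: write to stdout every header line of a VCF (plain or
# bgzipped, read with GNU gzip -cdf) plus only the records whose CHROM equals
# the requested contig. Output is uncompressed; bgzip and index it before reuse.
subset_contig() {
    contig=$1
    vcf=$2
    gzip -cdf "$vcf" | awk -F '\t' -v c="$contig" '/^#/ || $1 == c'
}
```

    For example, subset_contig MT sample1.vcf.gz > sample1.MT.vcf; the output is plain text, so bgzip and index it before passing it to GATK.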

    If possible, please confirm that you received the shared file on your FTP, and let me know when you have more information about the error.

    Thanks again,
    Francisco.

  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator admin

    Hi @fmortuno

    We have shared your data with the developers who are trying to recreate the error. We will get back to you shortly.

  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator admin
    edited August 12

    Hi @fmortuno

    We are unable to recreate this error on our end using the command and files you provided. Those commands worked just fine for us.
    I am not sure why you are seeing this error. Your logs indicate you were using 4.0.9.0; could you please try the latest version, GATK v4.1.3.0, and see if the error persists?

  • fmortuno · Clinical Bioinformatics Area, FPS, Seville (Spain) · Member
    edited August 23

    Hi @bhanuGandham

    Sorry for the late response. I have tried the newest version, v4.1.3.0, but the error persists:

    gatk GenomicsDBImport -R hs37d5.fa \
                          --genomicsdb-workspace-path pon_db \
                          -V sample1.MT.nn.vcf.gz \
                          -V sample2.MT.nn.vcf.gz \
                          -V sample3.MT.nn.vcf.gz -L MT
    

    Error Log (for v4.1.3.0):

    Using GATK jar /home/fmortuno/tools/gatk/gatk-4.1.3.0/gatk-package-4.1.3.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/fmortuno/tools/gatk/gatk-4.1.3.0/gatk-package-4.1.3.0-local.jar GenomicsDBImport -R /data/lustre/scratch/CBRA/data/indexed_genomes/bwa/hs37d5/hs37d5.fa --genomicsdb-workspace-path pon_db -V sample1.MT.nn.vcf.gz -V sample2.MT.nn.vcf.gz -V sample3.MT.nn.vcf.gz -L MT
    09:28:48.196 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/fmortuno/tools/gatk/gatk-4.1.3.0/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Aug 23, 2019 9:28:49 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    09:28:49.834 INFO  GenomicsDBImport - ------------------------------------------------------------
    09:28:49.834 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.3.0
    09:28:49.834 INFO  GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
    09:28:49.835 INFO  GenomicsDBImport - Initializing engine
    09:28:50.101 INFO  IntervalArgumentCollection - Processing 16569 bp from intervals
    09:28:50.134 INFO  GenomicsDBImport - Done initializing engine
    09:28:50.333 INFO  GenomicsDBImport - Vid Map JSON file will be written to /data/lustre/scratch/CBRA/projects/lung_cancer_sas/PoN/AF_error_GDBI_for_PoN/pon_db/vidmap.json
    09:28:50.333 INFO  GenomicsDBImport - Callset Map JSON file will be written to /data/lustre/scratch/CBRA/projects/lung_cancer_sas/PoN/AF_error_GDBI_for_PoN/pon_db/callset.json
    09:28:50.333 INFO  GenomicsDBImport - Complete VCF Header will be written to /data/lustre/scratch/CBRA/projects/lung_cancer_sas/PoN/AF_error_GDBI_for_PoN/pon_db/vcfheader.vcf
    09:28:50.333 INFO  GenomicsDBImport - Importing to array - /data/lustre/scratch/CBRA/projects/lung_cancer_sas/PoN/AF_error_GDBI_for_PoN/pon_db/genomicsdb_array
    09:28:50.333 INFO  ProgressMeter - Starting traversal
    09:28:50.333 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Batches Processed   Batches/Minute
    09:28:50.443 INFO  GenomicsDBImport - Importing batch 1 with 3 samples
    [E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at MT:73
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007f565c8b7dfb, pid=25817, tid=0x00007f564fdff700
    #
    # JRE version: OpenJDK Runtime Environment (8.0_191-b12) (build 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12)
    # Java VM: OpenJDK 64-Bit Server VM (25.191-b12 mixed mode linux-amd64 compressed oops)
    # Problematic frame:
    # C  [libtiledbgenomicsdb6724707253584796459.so+0x3cbdfb]  bcf_unpack+0x36b
    #
    # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    #
    # An error report file with more information is saved as:
    # /data/lustre/scratch/CBRA/projects/lung_cancer_sas/PoN/AF_error_GDBI_for_PoN/hs_err_pid25817.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
    

    Any other suggestions? If you cannot reproduce the error, I understand something must be wrong on my end, but I have tried running on different machines and always get the same error. I need to fix this sooner rather than later, but I have no idea what could be going on.

    Thanks!

  • manolis · Member ✭✭
    edited August 23

    Hi, I still have the same problem (gatk v4.1.1.0), and I believe that in my case it is also related to our server/host.

    We do not know why we can run GDBI only at night and not during the day (I know, it seems crazy)...

    @fmortuno, were you the only one logged in to the server during your tests? Did you try late at night with no other users logged in?

    Thanks

  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator admin
    edited August 27

    Hi @manolis and @fmortuno

    This is a strange situation: we are unable to recreate the error, yet both of you have reported the same one. To figure out what your two setups have in common, would you please answer the following questions:

    1) Are you using a shared file system?
    2) Did you use Docker?
    3) The native error report is usually saved as an hs_err_pid*.log file. Would it be possible to provide us with that file? It is usually found in the directory from which gatk was invoked, but the location can be configured by the system and/or the user, so the best way is to grab the file name from standard output. It would also be useful if you could set "ulimit -c unlimited" before running gatk.
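
    The suggestion to take the report name from standard output can be scripted: raise the core-file limit in the same shell, keep a copy of the run's output, then pull the path out of the JVM crash banner. A minimal sketch; the function name hs_err_path and the log name run.log are hypothetical.

```shell
# Hypothetical helper: print the crash-report path(s) named in a captured log.
# In the logs in this thread, the JVM prints the path on its own line as
# "# /some/dir/hs_err_pid12345.log", right after
# "# An error report file with more information is saved as:".
hs_err_path() {
    sed -n 's/^#[[:space:]]*\(\/.*hs_err_pid[0-9]*\.log\)[[:space:]]*$/\1/p' "$1"
}

# Typical use (not executed here):
#   ulimit -c unlimited                        # allow core dumps from this shell
#   ${gatk} GenomicsDBImport ... 2>&1 | tee run.log
#   hs_err_path run.log
```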

  • manolis · Member ✭✭

    Hi @bhanuGandham

    1) Yes
    2) No, I converted the WDL pipelines to bash scripts
    3) I will have to check

    Thanks

  • mlathara · USA · Member

    Hi @manolis @fmortuno

    I'm a developer working on GenomicsDB. Here are a couple more things that might help us figure this out:

    1) Can you try giving the import more memory? That is, something like:

    gatk --java-options "-Xmx4g -Xms4g" GenomicsDBImport <rest of your options>


    A lack of memory can sometimes cause strange errors, so I'm hoping that explicitly giving it 4g will be enough for the example VCFs you provided (this, of course, assumes you have more than 4g available).

    2) Can you convert your compressed VCFs to uncompressed ones and try importing those? You can use bgzip or bcftools, for instance, to uncompress them, and GATK's IndexFeatureFile tool to index the resulting VCF files. Then import those and let us know if you still see these errors.
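
    Step 2 can be applied to a whole set of inputs at once. A minimal sketch that writes an uncompressed copy next to each .vcf.gz using plain gzip -dc, which can read bgzip (BGZF) files because BGZF is a series of standard gzip members. The function name decompress_vcfs is hypothetical, and each resulting .vcf still needs indexing with IndexFeatureFile before the import.

```shell
# Hypothetical helper: for each X.vcf.gz argument, write the uncompressed X.vcf
# next to it and print the new path. bgzip (BGZF) output is a series of
# ordinary gzip members, so gzip -dc can read it.
decompress_vcfs() {
    for gz in "$@"; do
        case $gz in
            *.vcf.gz) out=${gz%.gz} ;;
            *) echo "decompress_vcfs: skipping $gz (not .vcf.gz)" >&2; continue ;;
        esac
        gzip -dc "$gz" > "$out" || return 1
        printf '%s\n' "$out"
    done
}
```

    For example, decompress_vcfs sample1.vcf.gz sample2.vcf.gz prints the two new .vcf paths, which can then be indexed and passed to GenomicsDBImport with -V.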

    Thanks.

  • manolis · Member ✭✭

    Hi @mlathara

    1) Same problem with "-Xmx4g -Xms4g" (see the hs_err_pid10807.log file)

    2) I still have the same problem (see the hs_err_pid11327.log file)

    Thanks

  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator admin

    Hi @manolis and @fmortuno

    Because we are unable to recreate this issue on our end, we are not quite sure what more we can do. We will keep an eye out for other users reporting a similar issue, which might give us more clues.

    However, if this is a big blocker for you, we will try to investigate a little further. Please provide the following information and we will see what we can find:
    1) Your runtime environment details
    2) A core dump (please enable core dumps before running)
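
    For item 1, a sketch that collects a few environment details in one go, so that a failing host can be compared with a working one (or with a Docker container). The function name env_report and the choice of checks (kernel, Java version, core-file limit, locale settings, and the filesystem holding the working directory) are assumptions about what is worth collecting, not a list the GATK team specified.

```shell
# Hypothetical helper: print a short, labelled description of the runtime
# environment. Each check is guarded so a missing tool prints a note
# instead of stopping the report.
env_report() {
    echo "== kernel =="
    uname -a
    echo "== java =="
    java -version 2>&1 || echo "java not found on PATH"
    echo "== core file limit =="
    ulimit -c
    echo "== locale =="
    locale 2>&1 || echo "locale command not available"
    echo "== filesystem of working directory =="
    df -P . 2>&1 || true
    # Filesystem type name (for example nfs or lustre); GNU stat only
    if stat --version >/dev/null 2>&1; then
        stat -f -c 'type: %T' . || true
    fi
}
```

    Running env_report > "env_$(hostname).txt" on each host gives files that can be compared side by side with diff.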

  • fmortuno · Clinical Bioinformatics Area, FPS, Seville (Spain) · Member

    Hi @bhanuGandham, thank you for the support. I totally understand; it seems to be some strange incompatibility with the system. It didn't work even when I tried in two different environments.

    However, I finally tried the GATK v4.1.3.0 Docker image and it worked that way. I think I can go ahead and create my PoN using Docker.

    Thanks again!

  • JoanGibert · Member

    Hi!

    I faced the same issue and found that removing reader_threads solved the problem. I'm working with GATK 4.1.2.0, without WDL.

    Cheers,

  • manolis · Member ✭✭

    Thanks @JoanGibert! I will try it (I'm using bash) and let you know how it goes.

    Best

  • isaience · Paris · Member

    @manolis I had the same problem. Were you able to solve it?
  • isaience · Paris · Member

    For future readers: I was also able to solve the problem by running through Docker. Here is the command in case it is useful:

    sudo docker run -v "$(pwd)":"$(pwd)" -w "$(pwd)" -i -t broadinstitute/gatk \
        gatk GenomicsDBImport -R GATK/Reference/GRCh38.d1.vd1.fa \
        -L Cleaned_bqsr.bams/SRR5273612_SRR5273621_realign_target.intervals \
        --genomicsdb-workspace-path PON/pon_db --merge-input-intervals true \
        -V SRR5273610.vcf.gz -V SRR5273611.vcf.gz
  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator admin

    @fmortuno @JoanGibert @manolis @isaience

    Thank you for the updates. This will be very useful for other community members. The GATK team is grateful for your help.

  • hena · Finland · Member

    Hi,

    I'm getting the same error as above:

    [E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at

    I do have enough memory (the maximum is 8g at the moment, though I tried 35 as well), and I am not using the reader_threads option. The VCF files were generated with Mutect2 using the … I also tried the suggestion of uncompressing the VCF files, indexing the uncompressed files and using those, but it is not working. I'm using GATK v4.1.2.0 and testing with two VCF files (the original data set would have ~150).

    As a side note, the GATK download page is offering me v3.8-0 rather than v4. I assume v4 would offer a later version than the one I'm using, which might help with the issue.

    Regards,

  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator admin

    Hi @hena

    Please provide the exact command you are using and the entire error log.

  • hena · Finland · Member

    The command is (I'm using the jar directly because the cluster environment has its own Java installation directory):

    /apps/java/jdk1.8.0_77/bin/java -jar /fs/vault/pipelines/common/external/gatk/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar GenomicsDBImport -R /fs/vault/pipelines/vcp/data_files/ensembl/73/Homo_sapiens.GRCh37.73.dna.chr.fa -L all.bed  --genomicsdb-workspace-path test_db -V 00005.vcf -V 00039.vcf 2> err.log
    