I'm trying to test GenomicsDBImport with a small set of gVCF samples (gatk-18.104.22.168). Here is the command I ran:
gatk GenomicsDBImport \
-V 1.gvcf.gz \
-V 2.gvcf.gz \
-V 3.gvcf.gz \
-V 4.gvcf.gz \
-V 5.gvcf.gz \
-V 6.gvcf.gz \
--genomicsdb-workspace-path test_database \
I got a bunch of error messages and eventually tracked the problem down to these lines:
A fatal error has been detected by the Java Runtime Environment:
SIGSEGV (0xb) at pc=0x00002aab94159809, pid=3156, tid=0x00002aab5bae2700
JRE version: OpenJDK Runtime Environment (8.0_181-b13) (build 1.8.0_181-b13)
Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 compressed oops)
C [libtiledbgenomicsdb6069813449664720959.so+0x159809] BufferVariantCell::set_cell(void const*)+0x99
Core dump written. Default location: /my_directory/core or core.3156
This looks similar to the issue reported here:
GenomicsDBImport: A fatal error has been detected by the Java Runtime Environment #5045
Following the discussion in that thread, I ran "vcf_validator" on my gVCF files and got this (for 1.gvcf.gz):
Error: ALT metadata ID does not begin with DEL/INS/DUP/INV/CNV. This occurs 1 time(s), first time in line 2.
Error: Format is not a colon-separated list of alphanumeric strings. This occurs 93311797 time(s), first time in line 631.
Error: Alternate ID is not prefixed by DEL/INS/DUP/INV/CNV and suffixed by ':' and a text sequence. This occurs 20022413 time(s), first time in line 632.
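To see what the validator is objecting to, the lines it reports (2, 631 and 632 for 1.gvcf.gz) can be printed directly. Below is a minimal Python sketch, assuming the validator counts lines from 1 and includes header lines in the count. `show_lines` is a hypothetical helper written for this post, not part of GATK or vcf_validator.

```python
import gzip
from typing import Dict, Iterable


def show_lines(path: str, line_numbers: Iterable[int]) -> Dict[int, str]:
    """Return the requested 1-based lines from a gzip- or bgzip-compressed text file."""
    wanted = set(line_numbers)
    found: Dict[int, str] = {}
    if not wanted:
        return found
    last = max(wanted)
    # gzip.open also reads bgzip output, since BGZF is a series of gzip members.
    with gzip.open(path, "rt") as handle:
        for number, line in enumerate(handle, start=1):
            if number in wanted:
                found[number] = line.rstrip("\n")
            if number >= last:
                # Stop early instead of decompressing the whole file.
                break
    return found
```

Calling `show_lines("1.gvcf.gz", [2, 631, 632])` and printing the result should show the ALT header line and the first records whose FORMAT and ALT columns the validator rejects.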
At this point, I'm not sure whether these gVCF errors are causing the crash. Some of my gVCF files were not produced by GATK HaplotypeCaller, so that might be a factor, too.
If anyone could offer some advice on this problem, it would be great!