Java and GenotypeGVCFs errors

Dear GATK team members,

I have a few (rather long) questions and would be really grateful for your answers. I used GATK for the first time three days ago, and some steps are not working well, so I am writing to get help solving them. I apologize if these problems have already been addressed.

My goal is to get linkage data from the genotypes of recombinants. I am using C. elegans data for practice; many recombinant lines of this nematode have been made from two divergent strains, N2 and CB4856. Each recombinant carries either N2 or CB4856 sequence at each genomic position, so I can (possibly) see how strongly the positions are linked. The data I have are sequencing reads from 40 recombinants at ~6x depth.

To do this, I followed the Germline SNPs + Indels section of the Best Practices (and the 2015 PDF document with script examples) and the Tool Documentation Index. In the process, I ran into some errors and questions.

I'm using Ubuntu 18.04 and GATK-4.0.5.1.

  1. Java error
    I have partly solved this problem, but some issues remain. I followed this article:
    https://software.broadinstitute.org/gatk/documentation/article?id=11135
    It suggests the following command:
    /usr/libexec/java_home -v 1.7.0_79 --exec java -jar GenomeAnalysisTK.jar -T ...
    so I imitated it:
    ../jre1.8.0_171/ -v 1.7.0_79 --exec java -jar gatk-package-4.0.5.1-local.jar
    However, it produced an error:
    -bash: ../jre1.8.0_171/: Is a directory
    When I instead ran GATK with the following command, it worked fine:
    ../jre1.8.0_171/bin/java -jar gatk-package-4.0.5.1-local.jar

However, the same approach does not work for the Spark jar, only for the local one.

../jre1.8.0_171/bin/java -jar gatk-package-4.0.5.1-spark.jar

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Partitioner
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
    at java.lang.Class.getConstructors(Class.java:1651)
    at org.broadinstitute.hellbender.utils.ClassUtils.canMakeInstances(ClassUtils.java:30)
    at org.broadinstitute.hellbender.Main.extractCommandLineProgram(Main.java:318)
    at org.broadinstitute.hellbender.Main.setupConfigAndExtractProgram(Main.java:180)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:202)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Partitioner
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 8 more

../jdk1.8.0_171/bin/java -jar gatk-package-4.0.5.1-spark.jar

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Partitioner
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
    at java.lang.Class.getConstructors(Class.java:1651)
    at org.broadinstitute.hellbender.utils.ClassUtils.canMakeInstances(ClassUtils.java:30)
    at org.broadinstitute.hellbender.Main.extractCommandLineProgram(Main.java:318)
    at org.broadinstitute.hellbender.Main.setupConfigAndExtractProgram(Main.java:180)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:202)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Partitioner
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 8 more

Could you suggest any help? If it is not a GATK problem, please just let me know. I found some articles suggesting that some Java errors come from the server rather than from GATK.
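
The "Is a directory" error in item 1 arises because the shell was asked to execute a directory rather than the java binary inside it (/usr/libexec/java_home is a macOS helper, so there is nothing equivalent inside a bare JRE folder on Ubuntu). A minimal sketch, assuming a hypothetical helper named resolve_java that is not part of GATK or Java, which accepts either a JRE/JDK directory or the binary itself and prints something runnable:

```shell
#!/bin/sh
# resolve_java is a hypothetical helper, not part of GATK or Java.
# Given a JRE/JDK directory or a java binary, print the path of the binary.
resolve_java() {
    candidate=$1
    # A directory cannot be executed; look for bin/java inside it instead.
    if [ -d "$candidate" ]; then
        candidate=${candidate%/}/bin/java
    fi
    if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        printf 'no executable java at %s\n' "$candidate" >&2
        return 1
    fi
}

# Example, using the paths from the post:
#   JAVA=$(resolve_java ../jre1.8.0_171/) && "$JAVA" -jar gatk-package-4.0.5.1-local.jar
```

This only addresses how the JVM is located; it does not change which classes a given jar contains.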

  2. GenotypeGVCFs error
    I tried to follow the Best Practices document for Germline SNPs + Indels, but I ran into trouble when I used the GenotypeGVCFs tool.

What I did over the last three days:
(1) extract reads and align them to another genome version. (44 samples = N2, CB4856 and 42 recombinants)
../picard SamToFastq I=$i.bam F=fastq/$i.1.fastq F2=fastq/$i.2.fastq
($i = 44 strain names sequentially)
bwa mem -t 40 reference.fasta $i.1.fastq $i.2.fastq | samtools sort -@ 40 -O BAM -o $i.sort.bam
(2) remove duplicates
/media/elegans/main/tools/picard MarkDuplicatesWithMateCigar \
    I=$i.sort.bam O=$i.sort.dup.bam M=$i.dup.txt REMOVE_DUPLICATES=true
(3) add read groups and make index files
/media/elegans/main/tools/picard AddOrReplaceReadGroups \
    I=$i.sort.dup.bam O=$i.sort.dup.rg.bam RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=$i
samtools index $i.sort.dup.rg.bam
(4) variant calling with HaplotypeCaller
../jre1.8.0_171/bin/java -Xmx32G -jar \
    ../gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar \
    HaplotypeCaller --reference ..reference.fasta \
    --input $i.sort.dup.rg.bam -ERC GVCF --output $i.g.vcf --use-new-qual-calculator
(5) import database using GenomicsDBImport
../jre1.8.0_171/bin/java -Xmx32G -jar \
    ../gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar \
    GenomicsDBImport -V N2.g.vcf -V CB4856.g.vcf -V strain1.g.vcf ... -V strain42 \
    --genomicsdb-workspace-path recombinant_DB/$j \
    --intervals $j
($j = chromosome names: I, II, III, IV, V, MtDNA, X)
(6) joint genotyping using GenotypeGVCFs
../jre1.8.0_171/bin/java -Xmx32G -jar ../gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar \
    GenotypeGVCFs -R ../reference.fasta -V gendb://recombinant_DB/I -G StandardAnnotation -O joint.genotyping.chrI.vcf --founder-id N2 --use-new-qual-calculator
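
Step (5) repeats one -V argument per sample and one import per chromosome, which is easy to mistype by hand. A minimal dry-run sketch of generating those commands is shown here; the helper names gvcf_args and import_cmd are hypothetical, the short sample list is for illustration only, and it assumes every GVCF is named <sample>.g.vcf and that the gatk wrapper script is on the PATH. It only prints the commands and runs nothing.

```shell
#!/bin/sh
# Hypothetical helper: print " -V <name>.g.vcf" for each sample name given.
gvcf_args() {
    for sample in "$@"; do
        printf ' -V %s.g.vcf' "$sample"
    done
}

# Hypothetical helper: print the GenomicsDBImport command for one interval.
# Usage: import_cmd INTERVAL SAMPLE [SAMPLE ...]
import_cmd() {
    interval=$1
    shift
    printf 'gatk GenomicsDBImport%s --genomicsdb-workspace-path recombinant_DB/%s --intervals %s\n' \
        "$(gvcf_args "$@")" "$interval" "$interval"
}

# Dry run: print one command per chromosome (illustrative subset of samples).
for chrom in I II III IV V MtDNA X; do
    import_cmd "$chrom" N2 CB4856 strain1
done
```

Printing first makes it possible to inspect the generated commands before piping them to sh.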

Then I got the following messages (I picked out only the WARN lines).
10:36:54.710 WARN GATKAnnotationPluginDescriptor - Redundant enabled annotation group (StandardAnnotation) is enabled for this tool by default
and
10:36:54.838 INFO GenotypeGVCFs - Initializing engine
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
10:36:55.750 INFO GenotypeGVCFs - Done initializing engine
10:36:55.786 INFO ProgressMeter - Starting traversal
10:36:55.786 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
10:37:00.587 INFO GenotypeGVCFs - Shutting down engine
GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),0.573704676999999,Cpu time(s),0.5681586370000016

Then it stopped. When I opened joint.genotyping.chrI.vcf, it contained only 394 lines (positions 3040 to 55642), and the file was no longer being updated. Chromosome I is 15 Mb long.
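
One quick way to see how far an output VCF reaches is to count its records and report the first and last positions per contig. A minimal sketch follows, assuming an uncompressed, position-sorted VCF; vcf_span is a hypothetical helper name, not a GATK tool.

```shell
#!/bin/sh
# vcf_span is a hypothetical helper: for each contig in a VCF, print
# contig, record count, first position, last position (tab-separated).
vcf_span() {
    awk -F '\t' '
        /^#/ { next }                     # skip header lines
        {
            if (!($1 in n)) {             # first record seen for this contig
                first[$1] = $2
                order[++k] = $1
            }
            n[$1]++
            last[$1] = $2                 # overwritten until the final record
        }
        END {
            for (i = 1; i <= k; i++) {
                c = order[i]
                printf "%s\t%d\t%d\t%d\n", c, n[c], first[c], last[c]
            }
        }
    ' "$1"
}

# Example: vcf_span joint.genotyping.chrI.vcf
```

Comparing the last reported position with the chromosome length shows at a glance whether the traversal stopped early.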

I did not know whether this was caused by the large size of the region, so I tried a smaller one.
../jre1.8.0_171/bin/java -Xmx32G -jar ../gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar \
    GenomicsDBImport -V N2.g.vcf -V CB4856.g.vcf -V strain1.g.vcf ... -V strain42 \
    --genomicsdb-workspace-path recombinant_DB/VR1.4Mb --intervals V:20000000-21389866
(1.4 Mb region)
../jre1.8.0_171/bin/java -Xmx32G -jar ../gatk-4.0.5.1/gatk-package-4.0.5.1-local.jar \
    GenotypeGVCFs -R reference.fasta -V gendb://recombinant_DB/VR1.4Mb \
    -G StandardAnnotation -O joint.genotyping.chrVR.vcf --founder-id N2 --use-new-qual-calculator

It finished, but it showed similar warnings plus an additional error:
java: tpp.c:84: __pthread_tpp_change_priority: Assertion `new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)' failed.
Aborted (core dumped)

Could you suggest any help? I really appreciate your efforts.

Small question: what are founder samples? I'm wondering whether both N2 and CB4856 count as founder samples.
Should --founder-id N2 be replaced by --founder-id 'N2 CB4856'?

If I made mistakes during variant calling, please let me know.

Thank you for maintaining this wonderful software and this community!

Answers

  • JunKim Member

    One more question: does a docker image for GATK4 solve all these problems? If so, I will try to use it. Please let me know :)
    https://gatkforums.broadinstitute.org/gatk/discussion/10870/howto-run-gatk4-in-a-docker-container#latest

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @JunKim
    Hi,

    It is possible the docker image may solve these issues. Can you please try that and let me know if that eliminates some (hopefully all) of the issues in your original thread?

    Thanks,
    Sheila

  • jrandall Member

    We are also experiencing a problem wherein GATK 4.0.5.1 GenotypeGVCFs processes hang for many hours.

    The last thing our processes logged was the same as reported here:

    08:48:23.075 INFO  GenotypeGVCFs - Shutting down engine
    GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),0.044110324,Cpu time(s),0.026897973000000006
    

    This particular job ran for about 3m before outputting this line, then stayed running (but apparently doing nothing) for 8 hours before we killed it.

    It is one of 1996 jobs that all did pretty much exactly the same thing in a similar time frame: in all cases these were the last two lines logged, but GATK failed to terminate afterwards. At the same time, about 8k jobs finished successfully and exited 0, so the rate at which this happens appears (at least for our workload) to be around 20%. We don't know yet whether or not this behaviour is deterministic. More on that later.

    Issue · GitHub, by Sheila: Issue Number 4973, State: open

  • jrandall Member

    This issue appears to be deterministic. I note that GenotypeGVCFs appears to have written a complete output file (including tabix index) before hanging.

    With --verbosity DEBUG, the last few lines look something like:

    23:09:54.476 INFO  GenotypeGVCFs - Shutting down engine
    23:09:54.477 DEBUG FeatureDataSource - Cache statistics for FeatureInput drivingVariantFile:gendb:///tmp/hgi-mercury-15x/out/15x-exome-joint-calling-2018jun11-eglyx.joint_calling_inputs.diploid.local_keep.gvcfs_transposed.json_chr2:169280300-169280569.gdb:
    23:09:54.478 DEBUG FeatureCache - Cache hit rate was 0.00% (0 out of 0 total queries)
    GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),11.363570887,Cpu time(s),11.340037935999998
    
  • jrandall Member

    4.0.5.0 has the same problem, but it looks like 4.0.4.0 does not. Here are the last few lines of --verbosity DEBUG output from 4.0.4.0 on the same data as above:

    23:22:34.039 INFO  GenotypeGVCFs - Shutting down engine
    23:22:34.041 DEBUG FeatureDataSource - Cache statistics for FeatureInput drivingVariantFile:gendb:///tmp/hgi-mercury-15x/out/15x-exome-joint-calling-2018jun11-eglyx.joint_calling_inputs.diploid.local_keep.gvcfs_transposed.json_chr2:169280300-169280569.gdb:
    23:22:34.041 DEBUG FeatureCache - Cache hit rate was 0.00% (0 out of 0 total queries)
    23:22:34.042 DEBUG FeatureDataSource - Cache statistics for FeatureInput drivingVariantFile:gendb:///tmp/hgi-mercury-15x/out/15x-exome-joint-calling-2018jun11-eglyx.joint_calling_inputs.diploid.local_keep.gvcfs_transposed.json_chr2:169280300-169280569.gdb:
    23:22:34.042 DEBUG FeatureCache - Cache hit rate was 0.00% (0 out of 0 total queries)
    [June 27, 2018 11:22:34 PM UTC] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 1.11 minutes.
    Runtime.totalMemory()=5330436096
    

    So it looks like, where 4.0.4.0 goes through another set of FeatureDataSource / FeatureCache calls and then finishes, 4.0.5.0 and 4.0.5.1 instead get stuck at the GENOMICSDB_TIMER line.
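
    With many scattered jobs, the difference described above can be used to triage logs: a finished run ends with the "done. Elapsed time" line, while an affected run stops after "Shutting down engine". A minimal sketch, assuming each job writes its log to its own file; classify_log is a hypothetical helper, and a job caught in the brief moment between the two lines would also be reported as "hung?".

```shell
#!/bin/sh
# classify_log is a hypothetical helper. It prints:
#   completed  if the log contains the final "done. Elapsed time" line
#   hung?      if the engine shut down but that final line never appeared
#   running    otherwise
classify_log() {
    if grep -q 'done\. Elapsed time' "$1"; then
        echo completed
    elif grep -q 'Shutting down engine' "$1"; then
        echo 'hung?'
    else
        echo running
    fi
}

# Example: report the state of every log in a (hypothetical) logs/ directory.
#   for f in logs/*.log; do printf '%s\t%s\n' "$f" "$(classify_log "$f")"; done
```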

  • jrandall Member

    The GenotypeGVCFs hanging problem with 4.0.5.1 and 4.0.5.0 occurs with both Oracle Java (1.8.0_74-b02) and OpenJDK (v1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11). It happens when run outside docker, when run in the official broadinstitute/gatk docker image, and when run in one of our docker images built from source. It happens both with our own wrapper script and with the official gatk python script wrapper.

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @jrandall
    Hi,

    Thanks for all the information. I will pass this along to the team.

    -Sheila

  • JunKim Member

    @jrandall

    Hi,
    You're so kind and awesome! Thank you so much for your wonderful comments. I will change my GATK version and try my analysis once again.

    Best,
    Jun.

  • jvc Member
    edited July 23

    @jrandall I was having a similar issue. It is critical that you specify a --TMP-DIR for both GenomicsDBImport and GenotypeGVCFs. Reading from and writing to GenomicsDB seems to make greedy use of /tmp unless you specify a different directory: as /tmp fills up, reads and writes slow massively and then stop, and the run fails with the error below. If I specify a different temporary directory using --TMP-DIR for both GenomicsDBImport and GenotypeGVCFs, I do not have this problem. I am scattering across 2400 intervals, and this problem was stopping me once about 300 had finished, because /tmp had filled.

    Though your symptoms were different, your issue could be related.

    Exception in thread "main" java.lang.ExceptionInInitializerError
        at com.intel.genomicsdb.GenomicsDBFeatureReader.initialize(GenomicsDBFeatureReader.java:227)
        at com.intel.genomicsdb.GenomicsDBFeatureReader.<init>(GenomicsDBFeatureReader.java:179)
        at com.intel.genomicsdb.GenomicsDBFeatureReader.<init>(GenomicsDBFeatureReader.java:130)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.getGenomicsDBFeatureReader(FeatureDataSource.java:383)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:288)
        at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:244)
        at org.broadinstitute.hellbender.engine.VariantWalker.initializeDrivingVariants(VariantWalker.java:55)
        at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:47)
        at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:558)
        at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:43)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
        at org.broadinstitute.hellbender.Main.main(Main.java:277)
    Caused by: com.intel.genomicsdb.GenomicsDBException: Could not load genomicsdb native library
        at com.intel.genomicsdb.GenomicsDBQueryStream.<clinit>(GenomicsDBQueryStream.java:47)
        ... 16 more
    Caused by: java.lang.UnsatisfiedLinkError: no tiledbgenomicsdb in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at com.intel.genomicsdb.GenomicsDBUtils.loadLibrary(GenomicsDBUtils.java:48)
        at com.intel.genomicsdb.GenomicsDBQueryStream.<clinit>(GenomicsDBQueryStream.java:41)
        ... 16 more
    13:53:06.608 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/n/apps/CentOS7/install/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar!/com/intel/gkl/native/libgkl_compression.so
    13:53:06.619 WARN  NativeLibraryLoader - Unable to load libgkl_compression.so from native/libgkl_compression.so (No space left on device)
    13:53:06.621 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/n/apps/CentOS7/install/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar!/com/intel/gkl/native/libgkl_compression.so
    **13:53:06.621 WARN  NativeLibraryLoader - Unable to load libgkl_compression.so from native/libgkl_compression.so (No space left on device)**
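
    Since the failure above is triggered by a full temporary directory, a scattered workflow could check free space before launching each job. A minimal sketch follows, using only POSIX df and awk; the helper names free_kb and require_free_kb and the example path and threshold are hypothetical and would need to be chosen for the actual workload.

```shell
#!/bin/sh
# free_kb is a hypothetical helper: print the free space, in 1024-byte
# blocks, on the filesystem that holds the given directory.
free_kb() {
    # -P forces the POSIX one-line-per-filesystem layout; -k reports KiB.
    # Column 4 of that layout is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# require_free_kb DIR MIN_KB: succeed only if DIR has at least MIN_KB free.
require_free_kb() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Example with a hypothetical scratch directory and a 10 GiB threshold:
#   require_free_kb /scratch/gatk_tmp 10485760 || { echo 'tmp dir nearly full' >&2; exit 1; }
```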
    