Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

MarkDuplicatesSpark java.io.IOException invalid null input: name

Hi everyone,
I have recently tried to use MarkDuplicateSpark as part of my WES pipeline and replace the old picard's. I am using it on AWS from the docker image. I am new using Spark, so I cannot really decode the issue here, please bear with me :)
Here is the error i find:

 docker \
    run \
    -i \
    --volume=/tmp/tmpyydcjzjh:/syfaVz:rw \
    --volume=/tmp/tmpq1eu_e4l:/tmp:rw \
    --volume=/tmp/tmpob1s4s1k/short_9:/var/lib/cwl/stg2cd7cc29-bfc9-45e3-b7f5-971760620e01/short_9:ro \
    --workdir=/syfaVz \
    --read-only=true \
    --user=500:500 \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/syfaVz \
    broadinstitute/gatk \
    gatk \
    MarkDuplicatesSpark \
    -I \
    /var/lib/cwl/stg2cd7cc29-bfc9-45e3-b7f5-971760620e01/short_9 \
    -M \
    metrics.txt \
    -O \
    short_9.dedup.bam \
    -OBI \
    true \
    -VS \
    LENIENT
19:55:10.251 WARN  SparkContextFactory - Environment variables HELLBENDER_TEST_PROJECT and HELLBENDER_JSON_SERVICE_ACCOUNT_KEY must be set or the GCS hadoop connector will not be configured properly
19:55:10.405 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
19:55:10.580 INFO  MarkDuplicatesSpark - ------------------------------------------------------------
19:55:10.580 INFO  MarkDuplicatesSpark - The Genome Analysis Toolkit (GATK) v4.1.0.0
19:55:10.580 INFO  MarkDuplicatesSpark - For support and documentation go to https://software.broadinstitute.org/gatk/
19:55:10.580 INFO  MarkDuplicatesSpark - Executing as [email protected] on Linux v4.14.114-83.126.amzn1.x86_64 amd64
19:55:10.581 INFO  MarkDuplicatesSpark - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
19:55:10.581 INFO  MarkDuplicatesSpark - Start Date/Time: May 30, 2019 7:55:10 PM UTC
19:55:10.581 INFO  MarkDuplicatesSpark - ------------------------------------------------------------
19:55:10.581 INFO  MarkDuplicatesSpark - ------------------------------------------------------------
19:55:10.581 INFO  MarkDuplicatesSpark - HTSJDK Version: 2.18.2
19:55:10.582 INFO  MarkDuplicatesSpark - Picard Version: 2.18.25
19:55:10.582 INFO  MarkDuplicatesSpark - HTSJDK Defaults.COMPRESSION_LEVEL : 2
19:55:10.582 INFO  MarkDuplicatesSpark - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
19:55:10.582 INFO  MarkDuplicatesSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
19:55:10.582 INFO  MarkDuplicatesSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
19:55:10.582 INFO  MarkDuplicatesSpark - Deflater: IntelDeflater
19:55:10.582 INFO  MarkDuplicatesSpark - Inflater: IntelInflater
19:55:10.582 INFO  MarkDuplicatesSpark - GCS max retries/reopens: 20
19:55:10.582 INFO  MarkDuplicatesSpark - Requester pays: disabled
19:55:10.582 INFO  MarkDuplicatesSpark - Initializing engine
19:55:10.582 INFO  MarkDuplicatesSpark - Done initializing engine
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/05/30 19:55:10 INFO SparkContext: Running Spark version 2.2.0
19:55:10.908 WARN  NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19:55:10.961 INFO  MarkDuplicatesSpark - Shutting down engine
[May 30, 2019 7:55:10 PM UTC] org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1202192384
java.io.IOException: failure to login: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
    at com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:71)
    at com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:133)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
    at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:841)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:803)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:676)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2430)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at org.broadinstitute.hellbender.engine.spark.SparkContextFactory.createSparkContext(SparkContextFactory.java:178)
    at org.broadinstitute.hellbender.engine.spark.SparkContextFactory.getSparkContext(SparkContextFactory.java:110)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:28)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)

    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:878)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:803)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:676)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2430)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at org.broadinstitute.hellbender.engine.spark.SparkContextFactory.createSparkContext(SparkContextFactory.java:178)
    at org.broadinstitute.hellbender.engine.spark.SparkContextFactory.getSparkContext(SparkContextFactory.java:110)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:28)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
    at com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:71)
    at com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:133)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
    at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:841)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:803)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:676)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2430)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at org.broadinstitute.hellbender.engine.spark.SparkContextFactory.createSparkContext(SparkContextFactory.java:178)
    at org.broadinstitute.hellbender.engine.spark.SparkContextFactory.getSparkContext(SparkContextFactory.java:110)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:28)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)

    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:856)
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
    at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:841)
    ... 17 more
Using GATK jar /gatk/gatk-package-4.1.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.0.0-local.jar MarkDuplicatesSpark -I /var/lib/cwl/stg2cd7cc29-bfc9-45e3-b7f5-971760620e01/short_9 -M metrics.txt -O short_9.dedup.bam -OBI true -VS LENIENT

cpu info:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             1
NUMA node(s):          1
Vendor ID:             AuthenticAMD
CPU family:            23
Model:                 1
Model name:            AMD EPYC 7571
Stepping:              2
CPU MHz:               2291.282
BogoMIPS:              4399.77
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             64K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0-15

Thanks in advance for your help!!!
Best,
Daiana

Answers

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @daianagan

    This error is due to the docker needing root permissions. On google cloud docker grants privileges equivalent to the root user but I am not sure how it works for AWS.
    Try running it without the --user=500:500 argument.

Sign In or Register to comment.