Problem with CrosscheckFingerprints when using NIO

ruslanafrazer Member, Broadie
edited May 2018 in Ask the GATK team

I'm trying to run the CrosscheckFingerprints tool from the gatk-4.0.4.0 package without copying my BAM files onto the VM, instead accessing the files directly on GCS.
The task starts running and I even get this initial output (on stderr):

/usr/local/jre1.8.0_73/bin/java -Xmx14g -jar /gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar CrosscheckFingerprints \
-I gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-TTP1-A-1-1-D-A553-36/REBC-AC8T-TTP1-A-1-1-D-A553-36.bam \
-I gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-NB1-A-1-0-D-A525-36/REBC-AC8T-NB1-A-1-0-D-A525-36.bam \
-H /cromwell_root/firecloud-tcga-open-access/tutorial/reference/Homo_sapiens_assembly19.haplotype_database.txt \
--QUIET false --EXIT_CODE_WHEN_MISMATCH 0 \
--OUTPUT crosscheck.stats.txt \
--VALIDATION_STRINGENCY LENIENT
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/fc-4aacc2d6-4017-4fc2-b95a-fb892a3562b9/70dc8591-32c2-4de1-b2f8-2af887f2a8e3/Clinical_Workflow/6794a873-5bbb-46ef-ae2c-a10e77dab94e/call-CrossCheckLaneFingerprints_Task/attempt-2/tmp.197aeb50
17:12:16.387 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu May 17 17:12:16 UTC 2018] CrosscheckFingerprints  --INPUT gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-TTP1-A-1-1-D-A553-36/REBC-AC8T-TTP1-A-1-1-D-A553-36.bam --INPUT gs://fc-66b68450-9d98-490f-934f-a9d824aac4be/REBC-AC8T-NB1-A-1-0-D-A525-36/REBC-AC8T-NB1-A-1-0-D-A525-36.bam --OUTPUT crosscheck.stats.txt --HAPLOTYPE_MAP /cromwell_root/firecloud-tcga-open-access/tutorial/reference/Homo_sapiens_assembly19.haplotype_database.txt --EXIT_CODE_WHEN_MISMATCH 0 --QUIET false --VALIDATION_STRINGENCY LENIENT  --CROSSCHECK_MODE CHECK_SAME_SAMPLE --LOD_THRESHOLD 0.0 --CROSSCHECK_BY READGROUP --NUM_THREADS 1 --CALCULATE_TUMOR_AWARE_RESULTS true --ALLOW_DUPLICATE_READS false --GENOTYPING_ERROR_RATE 0.01 --OUTPUT_ERRORS_ONLY false --LOSS_OF_HET_RATE 0.5 --EXPECT_ALL_GROUPS_TO_MATCH false --VERBOSITY INFO --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Thu May 17 17:12:16 UTC 2018] Executing as [email protected] on Linux 4.9.0-0.bpo.6-amd64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.0.4.0
INFO    2018-05-17 17:12:18 CrosscheckFingerprints  Fingerprinting 2 INPUT files.

But then the task just gets stuck without printing out anything else.
Yesterday I ran the task for 20 hours and had to abort it because it never finished. If I copy the files onto the VM, the whole process takes a couple of hours, including the time it takes to copy the files.

What could be the reason for this?

Thanks!
Ruslana

Issue · GitHub (filed by Sheila)
Issue Number: 3096 · State: closed · Closed By: sooheelee

Answers

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @ruslanafrazer
    Hi Ruslana,

    I have a feeling this does not yet work, but I need to check with the team and get back to you.

    -Sheila

  • ruslanafrazer Member, Broadie

    Thank you, Sheila!
    While we're on this matter, do any of the Picard tools that were incorporated into GATK4 work with NIO yet?

  • shlee Cambridge Member, Broadie ✭✭✭✭✭

    Hi @ruslanafrazer, I've let the developers know and you can track the progress of the issue at https://github.com/broadinstitute/picard/issues/1175. It may be that this tool isn't meant to use NIO. In this case, I agree it would be useful to have an immediate error message that informs you of such.

  • birger Member, Broadie, CGA-mod ✭✭✭

    The GATK4.0 launch page (https://software.broadinstitute.org/gatk/gatk4) states the following:

    "Google Cloud engineers gave GATK4 the ability to stream data directly from Google Cloud Storage (GCS) through the NIO protocol, enabling considerable savings of time and money in cloud executions."

    Which tools in GATK4.0 support data streaming? Repeating Ruslana's question: do any of the Picard tools incorporated into GATK4.0 work with NIO?

    Thanks!

  • shlee Cambridge Member, Broadie ✭✭✭✭✭
    edited May 2018

    @ruslanafrazer, the developer confirms in the issue ticket referenced above that this tool can use NIO. There is a separate Picard cloud jar that production uses for this. @birger, whether this Picard tool supports NIO when it is called through GATK with two cloud BAMs as input is a separate question.

  • yfarjoun Broad Institute Dev ✭✭✭

    The master version of Picard should now have a fix for this issue. Sorry it took so long to resolve.

    The problem was that the index was being read with no buffering... but that might be too much detail.

    The next Picard release will have this working, and the next time we rev Picard in GATK (after that) it will be working in GATK.
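
Editor's illustration of the "no buffering" point in the reply above: when a reader pulls many small records straight from a remote object, each small read can become its own round trip, while a buffer turns them into a few large reads. The sketch below is a generic Python demonstration, not Picard's or GATK's code. `CountingRaw` and `count_raw_reads` are hypothetical names, and an in-memory byte string stands in for a GCS object; the record and buffer sizes are arbitrary assumptions.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory stand-in for a remote file that counts low-level reads.

    Hypothetical helper for illustration only; each call to readinto()
    plays the role of one request to remote storage.
    """

    def __init__(self, data: bytes):
        super().__init__()
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        self.calls += 1
        n = min(len(b), len(self._data) - self._pos)
        if n:
            b[:n] = self._data[self._pos:self._pos + n]
            self._pos += n
        return n  # 0 signals end of file


def count_raw_reads(buffered: bool, record_size: int = 8,
                    total_bytes: int = 8192, buffer_size: int = 4096):
    """Read fixed-size records until EOF.

    Returns (low_level_read_calls, records_read).
    """
    raw = CountingRaw(bytes(total_bytes))
    stream = io.BufferedReader(raw, buffer_size=buffer_size) if buffered else raw
    records = 0
    while True:
        chunk = stream.read(record_size)
        if not chunk:
            break
        records += 1
    return raw.calls, records


if __name__ == "__main__":
    for mode in (False, True):
        calls, records = count_raw_reads(buffered=mode)
        print(f"buffered={mode}: {records} records, {calls} low-level reads")
```

Both modes read the same number of records, but the unbuffered path issues at least one low-level read per record, while the buffered path needs only a handful. Against cloud storage, where each low-level read may carry network latency, that difference is one plausible way a job that takes hours on local files could appear to hang.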
