
Running the five-dollar-genome-analysis-pipeline locally

System Administrator admin
This discussion was created from comments split from: New to the forum? Ask your questions here!.

Comments

  • trannguyen Member
    Hi, I am trying to run the five-dollar-genome-analysis-pipeline locally. I thought it was pretty straightforward; however, I got several warnings and an error. I am not sure whether it is because my computer does not have enough memory. Can you please help? Thank you so much.

    My JSON:

    {
      "WholeGenomeGermlineSingleSample.sample_and_unmapped_bams": {
        "sample_name": "NA12878 PLUMBING",
        "base_file_name": "NA12878_PLUMBING",
        "flowcell_unmapped_bams": [
          "/media/D_Drive/5_dollar_pipeline/NA12878_downsampled_for_testing_unmapped_H06HDADXX130110.1.ATCACGAT.20k_reads.bam",
          "/media/D_Drive/5_dollar_pipeline/NA12878_downsampled_for_testing_unmapped_H06HDADXX130110.2.ATCACGAT.20k_reads.bam",
          "/media/D_Drive/5_dollar_pipeline/NA12878_downsampled_for_testing_unmapped_H06JUADXX130110.1.ATCACGAT.20k_reads.bam"
        ],
        "final_gvcf_base_name": "NA12878_PLUMBING",
        "unmapped_bam_suffix": ".bam"
      },

      "WholeGenomeGermlineSingleSample.references": {
        "fingerprint_genotypes_file": "/media/D_Drive/5_dollar_pipeline/NA12878_NA12878.hg38.reference.fingerprint.vcf",
        "fingerprint_genotypes_index": "/media/D_Drive/5_dollar_pipeline/NA12878_NA12878.hg38.reference.fingerprint.vcf.idx",
        "contamination_sites_ud": "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.contam.UD",
        "contamination_sites_bed": "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.contam.bed",
        "contamination_sites_mu": "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.contam.mu",
        "calling_interval_list": "/media/D_Drive/5_dollar_pipeline/hg38_v0_wgs_calling_regions.hg38.interval_list",
        "haplotype_scatter_count": 10,
        "break_bands_at_multiples_of": 100000,
        "reference_fasta": {
          "ref_dict": "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.dict",
          "ref_fasta": "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.fasta",
          "ref_fasta_index": "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.fasta.fai",
          "ref_alt": "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.fasta.64.alt",
          "ref_sa": "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.fasta.64.sa",
          "ref_amb": "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.fasta.64.amb",
          "ref_bwt": "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.fasta.64.bwt",
          "ref_ann": "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.fasta.64.ann",
          "ref_pac": "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.fasta.64.pac"
        },
        "known_indels_sites_vcfs": [
          "/media/D_Drive/5_dollar_pipeline/hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
          "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.known_indels.vcf.gz"
        ],
        "known_indels_sites_indices": [
          "/media/D_Drive/5_dollar_pipeline/hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi",
          "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.known_indels.vcf.gz.tbi"
        ],
        "dbsnp_vcf": "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf",
        "dbsnp_vcf_index": "/media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.idx",
        "evaluation_interval_list": "/media/D_Drive/5_dollar_pipeline/hg38_v0_wgs_evaluation_regions.hg38.interval_list"
      },

      "WholeGenomeGermlineSingleSample.wgs_coverage_interval_list": "/media/D_Drive/5_dollar_pipeline/hg38_v0_wgs_coverage_regions.hg38.interval_list",

      "WholeGenomeGermlineSingleSample.papi_settings": {
        "preemptible_tries": 3,
        "agg_preemptible_tries": 3
      }
    }

    # Some of the warnings:
    [2019-05-15 17:16:10,64] [warn] Local [419a749c]: Key/s [preemptible, memory, disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
    [2019-05-15 17:16:10,64] [warn] Local [419a749c]: Key/s [preemptible, memory, disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
    [2019-05-15 17:16:10,64] [warn] Local [419a749c]: Key/s [preemptible, disks, cpu, memory] is/are not supported by backend. Unsupported attributes will not be part of job executions.
    [2019-05-15 17:16:10,64] [warn] Local [419a749c]: Key/s [preemptible, memory, disks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
    [2019-05-15 17:16:10,64] [warn] Local [419a749c]: Key/s [memory, disks, preemptible] is/are not supported by backend. Unsupported attributes will not be part of job executions.
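    These warnings are expected on Cromwell's default Local backend: the runtime attributes preemptible, memory, disks, and cpu are oriented toward Google Cloud, so the Local backend does not act on them. If the noise bothers you, one option (a sketch only; verify the exact keys against the reference.conf that ships with your Cromwell release) is to declare the attributes in a custom backend configuration:

```
# Hypothetical Cromwell configuration fragment (HOCON). Declaring the
# attributes marks them as known to the backend, which silences the
# warnings; the Local backend still will not enforce memory or disks.
backend {
  providers {
    Local {
      config {
        runtime-attributes = """
          String? docker
          Int? preemptible
          String? memory
          String? disks
          Int? cpu
        """
      }
    }
  }
}
```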



    # return exit code
    exit $rc
    [2019-05-15 17:16:21,14] [info] f1e4ea0f-0afb-4eca-8828-b13d954b47b2-SubWorkflowActor-SubWorkflow-UnmappedBamToAlignedBam:-1:1 [f1e4ea0f]: Starting UnmappedBamToAlignedBam.SamToFastqAndBwaMemAndMba (3 shards)
    [2019-05-15 17:16:21,76] [info] Assigned new job execution tokens to the following groups: 419a749c: 3
    [2019-05-15 17:16:22,71] [info] BackgroundConfigAsyncJobExecutionActor [f1e4ea0fUnmappedBamToAlignedBam.CreateSequenceGroupingTSV:NA:1]: job id: 4151
    [2019-05-15 17:16:22,71] [info] BackgroundConfigAsyncJobExecutionActor [f1e4ea0fUnmappedBamToAlignedBam.CollectQualityYieldMetrics:0:1]: job id: 4170
    [2019-05-15 17:16:22,71] [info] BackgroundConfigAsyncJobExecutionActor [f1e4ea0fUnmappedBamToAlignedBam.CollectQualityYieldMetrics:2:1]: job id: 4181
    [2019-05-15 17:16:22,71] [info] BackgroundConfigAsyncJobExecutionActor [f1e4ea0fUnmappedBamToAlignedBam.CollectQualityYieldMetrics:0:1]: Status change from - to Done
    [2019-05-15 17:16:22,71] [info] BackgroundConfigAsyncJobExecutionActor [f1e4ea0fUnmappedBamToAlignedBam.CollectQualityYieldMetrics:2:1]: Status change from - to Done
    [2019-05-15 17:16:22,71] [info] BackgroundConfigAsyncJobExecutionActor [f1e4ea0fUnmappedBamToAlignedBam.CollectQualityYieldMetrics:1:1]: job id: 4176
    [2019-05-15 17:16:22,71] [info] BackgroundConfigAsyncJobExecutionActor [f1e4ea0fUnmappedBamToAlignedBam.CreateSequenceGroupingTSV:NA:1]: Status change from - to Done
    [2019-05-15 17:16:22,71] [info] BackgroundConfigAsyncJobExecutionActor [f1e4ea0fUnmappedBamToAlignedBam.CollectQualityYieldMetrics:1:1]: Status change from - to Done
    [2019-05-15 17:16:24,09] [error] WorkflowManagerActor Workflow 419a749c-ea3d-47e5-b990-5798041bf319 failed (during ExecutingWorkflowState): cromwell.engine.workflow.lifecycle.execution.job.preparation.JobPreparationActor$$anonfun$1$$anon$1: Call input and runtime attributes evaluation failed for SamToFastqAndBwaMemAndMba:
    Failed to evaluate input 'disk_size' (reason 1 of 1): ValueEvaluator[IdentifierLookup]: No suitable input for 'bwa_ref_size' amongst {input_bam, bwa_version, bwa_commandline, output_bam_basename, reference_fasta, compression_level, disk_multiplier, unmapped_bam_size, preemptible_tries, ref_size}
    Failed to evaluate input 'bwa_ref_size' (reason 1 of 1): [Attempted 1 time(s)] - NoSuchFileException: /media/D_Drive/5_dollar_pipeline/hg38_v0_Homo_sapiens_assembly38.fasta.64.bwt
    at cromwell.engine.workflow.lifecycle.execution.job.preparation.JobPreparationActor$$anonfun$1.applyOrElse(JobPreparationActor.scala:70)
    at cromwell.engine.workflow.lifecycle.execution.job.preparation.JobPreparationActor$$anonfun$1.applyOrElse(JobPreparationActor.scala:66)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
    at akka.actor.FSM.processEvent(FSM.scala:684)
    at akka.actor.FSM.processEvent$(FSM.scala:681)
    at cromwell.engine.workflow.lifecycle.execution.job.preparation.JobPreparationActor.processEvent(JobPreparationActor.scala:42)
    at akka.actor.FSM.akka$actor$FSM$$processMsg(FSM.scala:678)
    at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:672)
    at akka.actor.Actor.aroundReceive(Actor.scala:517)
    at akka.actor.Actor.aroundReceive$(Actor.scala:515)
    at cromwell.engine.workflow.lifecycle.execution.job.preparation.JobPreparationActor.aroundReceive(JobPreparationActor.scala:42)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
    at akka.actor.ActorCell.invoke(ActorCell.scala:557)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
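    The NoSuchFileException above means Cromwell could not find the .64.bwt BWA index file on disk. Before launching a local run, it can save time to walk the inputs JSON and verify that every local path it references actually exists. A minimal sketch in Python (the inputs file name and the leading-slash heuristic for "looks like a local path" are assumptions, not part of the pipeline):

```python
import json
import os

def missing_paths(node, found=None):
    """Recursively collect string values in a parsed inputs JSON that
    look like absolute local paths but do not exist on disk."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for value in node.values():
            missing_paths(value, found)
    elif isinstance(node, list):
        for value in node:
            missing_paths(value, found)
    elif isinstance(node, str) and node.startswith("/") and not os.path.exists(node):
        found.append(node)
    return found

# Example usage (inputs file name is hypothetical):
#   with open("WholeGenomeGermlineSingleSample.inputs.json") as fh:
#       for path in missing_paths(json.load(fh)):
#           print("MISSING:", path)
```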
  • trannguyen Member
    Fixed the above error, so never mind. I now have this new error:
    Exception in thread "main" htsjdk.samtools.SAMException: Error in writing fastq file /dev/stdout
    This happens at the call-SamToFastqAndBwaMemAndMba step.
    Can you please help me troubleshoot? Thanks in advance.
  • trannguyen Member
    This is my input JSON (identical to the one posted above).
  • bshifaw Member, Broadie, Moderator admin

    Hi @trannguyen,

    The workflow was designed to run on the Google Cloud Platform and hasn't been tested locally, so you may need to make some edits before you can run it on your machine.

    The error states that it's not able to write to /dev/stdout.
    Try editing the command in the task so that, instead of a pipe, the first command writes a file that is then read by the next command. Not as elegant as a pipe, but this may fix the error.

    .
    .
    .
          java -Xms1000m -Xmx1000m -jar /usr/gitc/picard.jar \
            SamToFastq \
            INPUT=~{input_bam} \
            FASTQ=sample.fastq \
            INTERLEAVE=true \
            NON_PF=true  
    
          /usr/gitc/~{bwa_commandline} sample.fastq - 2> >(tee ~{output_bam_basename}.bwa.stderr.log >&2) | \
          java -Dsamjdk.compression_level=~{compression_level} -Xms1000m -Xmx1000m -jar /usr/gitc/picard.jar \
            MergeBamAlignment \
            VALIDATION_STRINGENCY=SILENT \
            EXPECTED_ORIENTATIONS=FR \
            ATTRIBUTES_TO_RETAIN=X0 \
            ATTRIBUTES_TO_REMOVE=NM \
            ATTRIBUTES_TO_REMOVE=MD \
            ALIGNED_BAM=/dev/stdin \
    .
    .
    .
    
  • bshifaw Member, Broadie, Moderator admin

    Also, as Adelaide mentioned, you should confirm that your local machine has enough disk space and memory to run the workflow on the sample data.
    Since this is a local machine, you can run "watch df -h" and "watch free -h" to check your machine's resources while the workflow is running.
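    To complement those commands, a quick pre-flight check can be scripted from Python; the 200 GiB threshold below is only an illustrative guess (a full WGS run can need far more, while the downsampled plumbing data needs far less), and the path should be the drive holding your data, e.g. /media/D_Drive:

```python
import shutil

def free_gib(path="/"):
    """Free space on the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30

def enough_disk(path, required_gib):
    """True if the filesystem containing `path` has at least
    `required_gib` GiB free."""
    return free_gib(path) >= required_gib

# Example: warn before launching Cromwell (threshold is a guess).
if not enough_disk("/", 200):
    print("Warning: less than 200 GiB free; the workflow may fail.")
```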
