How does GATK4 Accept Sharded Data and SNP&Indels calling Pipeline

joe297 · America · Member
edited June 2017 in Ask the GATK team

We found that many GATK4 commands accept an option to write "sharded" output files, but we couldn't find how those commands accept "sharded" data generated by the previous step. For example:

gatk ReadsPipelineSpark \
  -I hdfs://ip-/user/tianj/${sample}_sort.bam \
  -R hdfs://ip-/genome/ref/human_g1k_v37.2bit \
  -O hdfs://ip-/user/tianj/${sample}_ReadsPipelineSpark.bam \
  --knownSites hdfs://ip-/genome/ref/Mills_and_1000G_gold_standard.indels.b37.vcf \
  --shardedOutput true \
  -- \
  --sparkRunner SPARK \
  --sparkMaster spark://ip- \
  --num-executors 2 \
  --executor-cores 16 \
  --executor-memory 60g \
  --driver-memory 60g \
  --conf "spark.eventLog.dir=hdfs://ip-/user/tianj/tmp" \
  --conf "spark.local.dir=hdfs://ip-/user/tianj/tmp"
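For reference, with `--shardedOutput true` the output path above becomes a directory of part files rather than a single BAM. A quick way to see this (paths abbreviated exactly as in the command above; the `part-*` naming is the usual Hadoop convention and may differ):

```shell
# List the sharded output "BAM", which is actually a directory on HDFS:
hdfs dfs -ls hdfs://ip-/user/tianj/${sample}_ReadsPipelineSpark.bam
# Typically shows multiple part files, e.g. part-r-00000, part-r-00001, ...
```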

gatk HaplotypeCallerSpark \
  -I hdfs://ip-/user/tianj/${sample}_ReadsPipelineSpark.bam \
  -O hdfs://ip-/user/tianj/$sample.vcf \
  -R hdfs://ip-/genome/ref/human_g1k_v37.2bit \
  -- \
  --sparkRunner SPARK \
  --sparkMaster spark://ip-172-31-2-45:7077 \
  --num-executors 2 \
  --executor-cores 16 \
  --executor-memory 60g \
  --driver-memory 60g \
  --conf "spark.eventLog.dir=hdfs://ip-/user/tianj/tmp" \
  --conf "spark.local.dir=hdfs://ip-/user/tianj/tmp"

This pipeline makes ${sample}_ReadsPipelineSpark.bam a directory, and the second step then fails to read it. We didn't find any option to tell a tool that its input is "sharded". How should we use the --shardedOutput option?
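One workaround we are considering, in case there is no direct sharded-input support: keep the default single-file output, or gather the shards into one BAM before the next step. This is only a sketch; it assumes each part file is a complete BAM, and the tool choice (samtools) is ours, not from the GATK documentation:

```shell
# Option 1: drop "--shardedOutput true" so ReadsPipelineSpark writes one BAM
# that HaplotypeCallerSpark can read directly.

# Option 2 (assumption: each shard is a valid standalone BAM): copy the
# shards out of HDFS, concatenate them, and put the merged BAM back.
hdfs dfs -get hdfs://ip-/user/tianj/${sample}_ReadsPipelineSpark.bam shards/
samtools cat -o ${sample}_merged.bam shards/part-*
hdfs dfs -put ${sample}_merged.bam hdfs://ip-/user/tianj/
```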

Also, we noticed that for SV calling there is a single whole-pipeline command, but for SNP and indel calling we only found partial pipelines such as ReadsPipelineSpark. Is there a complete SNP/indel calling pipeline script we can use? Chaining partial tools seems to waste time on repeatedly writing and reading intermediate files.


Issue · GitHub (by Sheila)

Issue Number: 2194
State: closed
Closed By: vdauwera
