Using SortSamSpark on a Spark cluster, the output file cannot be used

jackyhuang (Taiwan, R.O.C.) · Member
Hello team,

When I use the output file produced on the Spark cluster as input, I get this error:
htsjdk.samtools.util.RuntimeIOException: java.io.IOException: Invalid BAM file header

The first command runs without errors:
gatk SortSamSpark \
--input "hdfs://" \
--output /mnt/jacky/before_hdfs_tmp/ERR194147.sorted.bam \
--tmp-dir . \
-- --spark-runner SPARK \
--spark-master spark:// --executor-memory 120G

The second command fails with the error mentioned above:
gatk MarkDuplicatesSpark \
--input hdfs:// \
--output /mnt/jacky/before_hdfs_tmp/ERR194147.sorted.dedup.bam \
--metrics-file /mnt/jacky/before_hdfs_tmp/ERR194147.markDuplicates.metrics \
--read-index hdfs:/// \
-- --spark-runner SPARK \
--spark-master spark:// --executor-memory 120G

When I run Picard ValidateSamFile on the file, I get this:
Exception in thread "main" htsjdk.samtools.SAMException: SAMFormatException on record 01
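One quick check that could narrow this down is whether the file even starts like a BAM. As far as I understand, htsjdk reports `Invalid BAM file header` when the decompressed data does not begin with the magic bytes `BAM\1`. The function below is only a rough sketch (the name `check_bam_magic` is my own, not part of GATK or Picard): it looks at the first bytes only and does not validate records the way ValidateSamFile does. It assumes GNU `head`, `od` and `gzip` (as on Ubuntu 18.04) and a local file, so a file on HDFS would need to be copied locally first, for example with `hdfs dfs -get`.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): check whether a file looks like a BAM,
# i.e. it is gzip/BGZF compressed and its decompressed stream starts
# with the BAM magic bytes "BAM\1". Does NOT validate the records.
check_bam_magic() {
  local f="$1"
  local gz magic

  # Every gzip/BGZF file starts with the raw bytes 1f 8b.
  gz=$(head -c 2 -- "$f" | od -An -tx1 | tr -d ' \n')
  if [ "$gz" != "1f8b" ]; then
    echo "$f: not gzip/BGZF compressed (first bytes: $gz)"
    return 1
  fi

  # The first four decompressed bytes of a BAM are 42 41 4d 01 ("BAM\1").
  magic=$(gzip -dc -- "$f" 2>/dev/null | head -c 4 | od -An -tx1 | tr -d ' \n')
  if [ "$magic" != "42414d01" ]; then
    echo "$f: missing BAM magic (first decompressed bytes: $magic)"
    return 1
  fi

  echo "$f: BAM magic OK"
}
```

For example: `check_bam_magic /mnt/jacky/before_hdfs_tmp/ERR194147.sorted.bam`. If this already fails, the problem is in how the file was written or copied, not in MarkDuplicatesSpark itself.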

Below are my tool versions:
Hadoop: 2.7
Spark: 2.4.4
Java: 1.8.0_222
picard-tools 1.138
Ubuntu: 18.04