Dear all, I am using Queue and DataProcessingPipeline.scala (from https://github.com/broadgsa/gatk/blob/master/public/scala/qscript/org/broadinstitute/sting/queue/qscripts/DataProcessingPipeline.scala) to process my BAM file. The input is a sample-level BAM file that has been processed with BWA alignment, Samtools sampe, and Picard merging and duplicate removal. The output BAM file (~200 GB/sample) is much larger than the input BAM file (~80 GB/sample). What information was added to the BAM file? Thanks a lot. My Queue version is Queue-2.1-10. My script:
java \
-Xmx4g \
-Djava.io.tmpdir=../tmp/ \
-jar ./Queue-2.1-10/Queue.jar \
-S ./DataProcessingPipeline.scala \
-i input.bam \
-R /db/human_g1k_v37.fasta \
-D /db//dbSnp_b137.vcf \
-run
Geraldine_VdAuwera
Most of these tools modify some property of the records in the file, but when they write a record to the new file, they also write the original values for that record. For example, after base recalibration, the new file contains both the new quality string and the original quality string. So the net effect is an increase in the amount of information in the file, and thus an increase in file size.
All this is explained in the documentation. In addition, I would say that the best way for you to know exactly what is changed is to run the processes on a file, then examine a small portion of the files before and after processing.
Geraldine Van der Auwera, PhD
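One way to do the before/after comparison suggested above is to dump a small slice of each BAM file as SAM text and count the optional tags on the records. The following is a minimal sketch in Python, not part of the original answer; the function names and the two sample records are hypothetical. It assumes the original values are kept in optional SAM tags such as OQ (original base quality), OC (original CIGAR) and OP (original mapping position), which is what the SAM optional-field conventions define; whether a given GATK version writes each of them should be checked against its documentation.

```python
from collections import Counter


def count_optional_tags(sam_lines):
    """Count optional-field tags (e.g. NM, OQ) across SAM alignment lines."""
    counts = Counter()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):  # skip blank lines and header lines
            continue
        fields = line.split("\t")
        # The first 11 columns are mandatory; optional fields follow as TAG:TYPE:VALUE.
        for field in fields[11:]:
            counts[field.split(":", 1)[0]] += 1
    return counts


def added_tags(before_lines, after_lines):
    """Return tags that occur more often after processing than before."""
    before = count_optional_tags(before_lines)
    after = count_optional_tags(after_lines)
    return {tag: after[tag] - before[tag] for tag in after if after[tag] > before[tag]}


# Hypothetical records, for illustration only (not taken from the poster's data).
before = ["r1\t0\t1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0"]
after = ["r1\t0\t1\t100\t60\t5M\t*\t0\t0\tACGTA\t@@@@@\tNM:i:0\tOQ:Z:IIIII"]

print(added_tags(before, after))
```

With real data, the input lines could come from something like `samtools view input.bam | head -n 1000` run on the input and output files. A tag such as OQ holds a string as long as the read itself, so one extra copy per record adds noticeably to the file size.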
Answers
First, to clarify, this is not related to whether you use Queue or not; it has to do with the GATK tools that Queue is running on your behalf.
As to why your BAM file is larger, I suggest you read the documentation to find out what the tools do. This will tell you what information is added to the file. If you then require additional clarifications we will be happy to provide them for you.
Geraldine Van der Auwera, PhD
Thanks for your kind help. I checked the log files and found out which tools I used: RealignerTargetCreator, IndelRealigner, BaseRecalibrator, and PrintReads. When IndelRealigner runs, does it add some "realign information" to the BAM file?