realigned.bam file vs realigned.recal.bam file

Hi, when I run the base recalibration step, my realigned.recal.bam file comes out 3 times bigger than my realigned.bam file.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi Monica,

    Depending on the version you are using, you may be keeping the original quals in the recalibrated file. This will typically make the file bigger (although 3x seems like a lot). You can disable this behavior in older versions; in newer versions the original quals are discarded by default unless you specify otherwise in your command line. Also, it is possible that your original file was not maximally compressed.
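One way to check whether the original qualities were kept is to look for the OQ tag, which the SAM format reserves for original base qualities, among the optional fields of the recalibrated reads. The snippet below is only a minimal sketch: `count_tag` is a hypothetical helper name, not part of GATK or samtools, and it assumes SAM text arrives on stdin (for example from `samtools view`).

```shell
# Hypothetical helper: count SAM records on stdin that carry a given
# optional tag (default: OQ, the original base qualities).
count_tag() {
  tag="${1:-OQ}"
  awk -v tag="$tag" -F '\t' '
    /^@/ { next }                       # skip SAM header lines
    {
      total++
      # optional fields start at column 12 and look like TAG:TYPE:VALUE
      for (i = 12; i <= NF; i++) {
        if (substr($i, 1, 3) == tag ":") { tagged++; break }
      }
    }
    END { printf "%d of %d reads carry %s\n", tagged, total, tag }
  '
}
```

Assuming samtools is installed, a quick look at the first reads of each file could be `samtools view realigned.recal.bam | head -n 10000 | count_tag OQ`, repeated on realigned.bam for comparison. If most reads in the recalibrated file carry the tag and none in the input do, the extra quality strings probably account for part of the size difference.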

  • monicafabbro (Argentina; Member)

    Hi Geraldine.
    I'm using the latest version of GATK: 2.7-2. Before the recalibration step, the aligned.bam file is 9 GB; afterwards it is 25 GB.
    Here is my command line:


    java -Xmx${MEMORY_LIMIT} -jar ${GATK}/GenomeAnalysisTK.jar -T IndelRealigner -R ${GENOME_REF} -known ${VAR_REF1} -known ${VAR_REF2} -I ${SAMPLE}.lane_${LANE}.aligned.sorted.dedupped.RG.bam -targetIntervals ${SAMPLE}.lane_${LANE}.aligned.sorted.dedupped.RG.intervals -o ${SAMPLE}.lane_${LANE}.realigned.bam > ${SAMPLE}.lane_${LANE}.realigned.log 2>&1


    java -Xmx${MEMORY_LIMIT} -jar ${GATK}/GenomeAnalysisTK.jar -T BaseRecalibrator -I ${SAMPLE}.lane_${LANE}.realigned.bam -R ${GENOME_REF} -knownSites ${VAR_REF1} -knownSites ${VAR_REF2} -knownSites ${VAR_REF3} -o ${SAMPLE}.lane_${LANE}.recal.before.table > ${SAMPLE}.lane_${LANE}.recal.before.table.log 2>&1

    PrintReads with -BQSR:

    java -Xmx${MEMORY_LIMIT} -jar ${GATK}/GenomeAnalysisTK.jar -nct ${NUM_THREADS} -T PrintReads -R ${GENOME_REF} -I ${SAMPLE}.lane_${LANE}.realigned.bam -BQSR ${SAMPLE}.lane_${LANE}.recal.before.table -o ${SAMPLE}.lane_${LANE}.realigned.recal.bam > ${SAMPLE}.lane_${LANE}.realigned.recal.log 2>&1

    I hope you can help me.

