After BQSR, Why don't create a header ?

HyunminHyunmin Posts: 11Member

When I run the BQSR with GATK 2.8.1, I can't check the PG information on output bam file.

BWA->Picard dedup->GATK LocalRealign->GATK BQSR

Below is the output BAM file after BQSR step.

@PG ID:GATK IndelRealigner VN:2.3-9-gdcdccbb CL:knownAlleles=[(RodBinding name=knownAlleles source=/nG/Reference/hgdownload.cse.ucsc.edu/goldenPath/hg19/KNOWNINDEL/Mills_and_1000G_gold_standard.indels.hg19.vcf)] targetIntervals=/nG/Data/2077/vcf1/node1/chrY/Databind/chrY.bam.intervals LODThresholdForCleaning=5.0 consensusDeterminationModel=USE_READS entropyThreshold=0.15 maxReadsInMemory=150000 maxIsizeForMovement=3000 maxPositionalMoveAllowed=200 maxConsensuses=30 maxReadsForConsensuses=120 maxReadsForRealignment=20000 noOriginalAlignmentTags=false nWayOut=null generate_nWayOut_md5s=false check_early=false noPGTag=false keepPGTags=false indelsFileForDebugging=null statisticsFileForDebugging=null SNPsFileForDebugging=null @PG ID:MarkDuplicates PN:MarkDuplicates VN:1.92(1464) CL:net.sf.picard.sam.MarkDuplicates INPUT=[/nG/Data/2077/step2_makebam/node1-1.bam, /nG/Data/2077/step2_makebam/node1-2.bam, /nG/Data/2077/step2_makebam/node1-3.bam, /nG/Data/2077/step2_makebam/node1-4.bam, /nG/Data/2077/step2_makebam/node1-5.bam, /nG/Data/2077/step2_makebam/node1-6.bam] OUTPUT=/dev/stdout METRICS_FILE=/nG/Data/2077/temp/picard_info.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000000 TMP_DIR=[/nG/Data/2077/temp] QUIET=true VALIDATION_STRINGENCY=LENIENT COMPRESSION_LEVEL=0 MAX_RECORDS_IN_RAM=2000000 PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO CREATE_INDEX=false CREATE_MD5_FILE=false

Related thread (guess) http://gatkforums.broadinstitute.org/discussion/2118/baserecalibration-prinread-don-t-create-a-header-and-don-t-obtain-oq-orignal-base-quality-in-bam

Tagged:

Best Answer

Answers

  • KurtKurt Posts: 161Member ✭✭✭

    Yeah, version 2.3-9 didn't create a @PG tag for the BQSR process. Since BQSR doesn't explicitly create a bam file, you wouldn't get a header from that, but you should get it from PrintReads (but not in version 2.3-9).

    I had a bam file from 2.7-4 lying around and saw the @PG tag for PrintReads that used BQSR, but it isn't verbose. I don't know if newer versions are more verbose or not.

    `@PG ID:GATK IndelRealigner VN:2.7-4-g6f46d11 CL:knownAlleles=[(RodBinding name=knownAlleles source=/usr/local/sandbox/ddl_pipeline_files//Mills_and_1000G_gold_standard.indels.b37.vcf), (RodBinding name=knownAlleles2 source=/usr/local/sandbox/ddl_pipe line_files//1000G_phase1.indels.b37.vcf)] targetIntervals=/isilon/ddl/SS_0500181//DDL_021214_A74KD_44872/CFMP1949-1/Reports/Local_Realignment_Intervals/CFMP1949-1.intervals LODThresholdForCleaning=5.0 consensusDeterminationModel=USE_READS entropyThreshold=0.15 maxReadsInMemory=150000 maxIsizeForMovement=3000 maxPositionalMoveAllowed=200 maxConsensuses=30 maxReadsForConsensuses=120 maxReadsForRealignment=20000 noOriginalAlignmentTags=false nWayOut=null generate_nWayOut_md5s=false check_early=false noPGTag=false keepPG Tags=false indelsFileForDebugging=null statisticsFileForDebugging=null SNPsFileForDebugging=null

    @PG ID:MarkDuplicates PN:MarkDuplicates VN:1.103(1598) CL:net.sf.picard.sam.MarkDuplicates INPUT=[/isilon/ddl/SS_0500181/DDL_021214_A74KD_44872/CFMP1949-1/temp/CFMP1949-1.original.bam] OUTPUT=/isilon/ddl/SS_0500181/DDL_021214_A74KD_44872/CFMP19 49-1/temp/CFMP1949-1.dup.bam METRICS_FILE=/isilon/ddl/SS_0500181/DDL_021214_A74KD_44872/CFMP1949-1/Reports/Picard_Duplicates/CFMP1949-1.picard.duplicates.txt VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=M arkDuplicates REMOVE_DUPLICATES=false ASSUME_SORTED=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIX EL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false

    @PG ID:GATK PrintReads VN:2.7-4-g6f46d11 CL:readGroup=null platform=null number=-1 sample_file=[] sample_name=[] simplify=false no_pg_tag=false `

  • HyunminHyunmin Posts: 11Member

    Yes, Both version 2.3.9 and 2.8.1 didn't creat a @PG tag for the BQSR process. I think this is the bug.

  • ebanksebanks Posts: 683GATK Developer mod

    Hi Hyunmin, you should have a PG tag in there for PrintReads if you are using the latest version. It definitely does that now (as @Kurt has posted).

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • HyunminHyunmin Posts: 11Member

    Hi Hyunmin, you should have a PG tag in there for PrintReads if you are using the latest version. It definitely does that now (as @Kurt has posted).

    How can I handle the PG tag in PrintReads command?

    My Command..

    java -jar /BiOfs/hmkim87/BioTools/GATK/2.8.1/GenomeAnalysisTK.jar -T PrintReads -R ref.fa -I T1304D2110.dedup.realigned.bam -o T1304D2110.dedup.realigned.recal.bam

  • ebanksebanks Posts: 683GATK Developer mod

    You don't need to handle it - it should be put there automatically.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • HyunminHyunmin Posts: 11Member

    @Kurt said: Looking at your command, it makes it look like you want to write a bam file with recalibrated base call quality scores, but you are not supplying it with table created in the -T BaseRecalibrator step that you should have done previously.

    Your command should've of been;

    java -jar GenomeAnalysisTK.jar -T PrintReads -R ref.fa -I dedup.realigned.bam -BQSR bqsr.table -o dedup.realigned.recal.bam

    with the bqsr.table being created in the -T BaseRecalibrator step

    Thanks!! I forgot it with behave naturally.

Sign In or Register to comment.