The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Powered by Vanilla. Made with Bootstrap.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.
Register now for the upcoming GATK Best Practices workshop, Feb 20-22 in Leuven, Belgium. Open to all comers! More info and signup at http://bit.ly/2i4mGxz

After BQSR, Why don't create a header ?

HyunminHyunmin Seoul, KoreaMember Posts: 14

When I run the BQSR with GATK 2.8.1, I can't check the PG information on output bam file.

BWA->Picard dedup->GATK LocalRealign->GATK BQSR

Below is the output BAM file after BQSR step.

@PG ID:GATK IndelRealigner VN:2.3-9-gdcdccbb CL:knownAlleles=[(RodBinding name=knownAlleles source=/nG/Reference/hgdownload.cse.ucsc.edu/goldenPath/hg19/KNOWNINDEL/Mills_and_1000G_gold_standard.indels.hg19.vcf)] targetIntervals=/nG/Data/2077/vcf1/node1/chrY/Databind/chrY.bam.intervals LODThresholdForCleaning=5.0 consensusDeterminationModel=USE_READS entropyThreshold=0.15 maxReadsInMemory=150000 maxIsizeForMovement=3000 maxPositionalMoveAllowed=200 maxConsensuses=30 maxReadsForConsensuses=120 maxReadsForRealignment=20000 noOriginalAlignmentTags=false nWayOut=null generate_nWayOut_md5s=false check_early=false noPGTag=false keepPGTags=false indelsFileForDebugging=null statisticsFileForDebugging=null SNPsFileForDebugging=null
@PG ID:MarkDuplicates PN:MarkDuplicates VN:1.92(1464) CL:net.sf.picard.sam.MarkDuplicates INPUT=[/nG/Data/2077/step2_makebam/node1-1.bam, /nG/Data/2077/step2_makebam/node1-2.bam, /nG/Data/2077/step2_makebam/node1-3.bam, /nG/Data/2077/step2_makebam/node1-4.bam, /nG/Data/2077/step2_makebam/node1-5.bam, /nG/Data/2077/step2_makebam/node1-6.bam] OUTPUT=/dev/stdout METRICS_FILE=/nG/Data/2077/temp/picard_info.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000000 TMP_DIR=[/nG/Data/2077/temp] QUIET=true VALIDATION_STRINGENCY=LENIENT COMPRESSION_LEVEL=0 MAX_RECORDS_IN_RAM=2000000 PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO CREATE_INDEX=false CREATE_MD5_FILE=false

Related thread (guess)
http://gatkforums.broadinstitute.org/discussion/2118/baserecalibration-prinread-don-t-create-a-header-and-don-t-obtain-oq-orignal-base-quality-in-bam

Tagged:

Best Answer

Answers

  • KurtKurt Member Posts: 255 ✭✭✭

    Yeah, version 2.3-9 didn't create a @PG tag for the BQSR process. Since BQSR doesn't explicitly create a bam file, you wouldn't get a header from that, but you should get it from PrintReads (but not in version 2.3-9).

    I had a bam file from 2.7-4 lying around and saw the @PG tag for PrintReads that used BQSR, but it isn't verbose. I don't know if newer versions are more verbose or not.

    `@PG ID:GATK IndelRealigner VN:2.7-4-g6f46d11 CL:knownAlleles=[(RodBinding name=knownAlleles source=/usr/local/sandbox/ddl_pipeline_files//Mills_and_1000G_gold_standard.indels.b37.vcf), (RodBinding name=knownAlleles2 source=/usr/local/sandbox/ddl_pipe
    line_files//1000G_phase1.indels.b37.vcf)] targetIntervals=/isilon/ddl/SS_0500181//DDL_021214_A74KD_44872/CFMP1949-1/Reports/Local_Realignment_Intervals/CFMP1949-1.intervals LODThresholdForCleaning=5.0 consensusDeterminationModel=USE_READS entropyThreshold=0.15
    maxReadsInMemory=150000 maxIsizeForMovement=3000 maxPositionalMoveAllowed=200 maxConsensuses=30 maxReadsForConsensuses=120 maxReadsForRealignment=20000 noOriginalAlignmentTags=false nWayOut=null generate_nWayOut_md5s=false check_early=false noPGTag=false keepPG
    Tags=false indelsFileForDebugging=null statisticsFileForDebugging=null SNPsFileForDebugging=null

    @PG ID:MarkDuplicates PN:MarkDuplicates VN:1.103(1598) CL:net.sf.picard.sam.MarkDuplicates INPUT=[/isilon/ddl/SS_0500181/DDL_021214_A74KD_44872/CFMP1949-1/temp/CFMP1949-1.original.bam] OUTPUT=/isilon/ddl/SS_0500181/DDL_021214_A74KD_44872/CFMP19
    49-1/temp/CFMP1949-1.dup.bam METRICS_FILE=/isilon/ddl/SS_0500181/DDL_021214_A74KD_44872/CFMP1949-1/Reports/Picard_Duplicates/CFMP1949-1.picard.duplicates.txt VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=M
    arkDuplicates REMOVE_DUPLICATES=false ASSUME_SORTED=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIX
    EL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false

    @PG ID:GATK PrintReads VN:2.7-4-g6f46d11 CL:readGroup=null platform=null number=-1 sample_file=[] sample_name=[] simplify=false no_pg_tag=false
    `

  • HyunminHyunmin Seoul, KoreaMember Posts: 14

    Yes, Both version 2.3.9 and 2.8.1 didn't creat a @PG tag for the BQSR process.
    I think this is the bug.

  • ebanksebanks Broad InstituteMember, Administrator, Broadie, Moderator, Dev Posts: 698 admin

    Hi Hyunmin, you should have a PG tag in there for PrintReads if you are using the latest version. It definitely does that now (as @Kurt has posted).

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • HyunminHyunmin Seoul, KoreaMember Posts: 14

    Hi Hyunmin, you should have a PG tag in there for PrintReads if you are using the latest version. It definitely does that now (as @Kurt has posted).

    How can I handle the PG tag in PrintReads command?

    My Command..

    java -jar /BiOfs/hmkim87/BioTools/GATK/2.8.1/GenomeAnalysisTK.jar -T PrintReads -R ref.fa -I T1304D2110.dedup.realigned.bam -o T1304D2110.dedup.realigned.recal.bam

  • ebanksebanks Broad InstituteMember, Administrator, Broadie, Moderator, Dev Posts: 698 admin

    You don't need to handle it - it should be put there automatically.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • HyunminHyunmin Seoul, KoreaMember Posts: 14

    @Kurt said:
    Looking at your command, it makes it look like you want to write a bam file with recalibrated base call quality scores, but you are not supplying it with table created in the -T BaseRecalibrator step that you should have done previously.

    Your command should've of been;

    java -jar GenomeAnalysisTK.jar -T PrintReads -R ref.fa -I dedup.realigned.bam -BQSR bqsr.table -o dedup.realigned.recal.bam

    with the bqsr.table being created in the -T BaseRecalibrator step

    Thanks!! I forgot it with behave naturally.

Sign In or Register to comment.