Error with UnifiedGenotyper after using BaseRecalibrator twice


I am calling SNPs on an organism without a reference genome or database of known polymorphisms, so I'm trying to follow the advice posted here (and in the BaseRecalibrator documentation).

I've successfully called SNPs on the un-recalibrated .bam file, then used those SNPs to recalibrate, then called SNPs on the recalibrated .bam file. As expected, I got significantly fewer (and presumably more accurate) results.

I then used the new, reduced set of SNPs to recalibrate again. When I attempted to call SNPs on this "Round Two" recalibrated .bam file, I got the following error:

SAM/BAM file recalibrated.2.bam is malformed: Program record with group id GATK PrintReads already exists in SAMFileHeader!

I attempted to use PicardTools ValidateSamFile and CleanSam but received the same message (as an IllegalArgumentException). I would definitely consider myself a novice in the field. Any advice you can give will be greatly appreciated.

Best Answers


  • Hi Dr. Van der Auwera,

    Oops, I was running v2.4-7. (I'm not sure how that happened; I've only been doing this for a couple of weeks.) I will upgrade and retry, then post results. Thanks!

  • Update
    I received the same error message using v2.4-9. I'll try using --no_pg_tag. Thanks for your help!

  • Carneiro (Charlestown, MA; Member)

    When you say "When I attempted to call SNPs on this 'Round Two' recalibrated .bam", do you mean that you saw this error when running BaseRecalibrator, PrintReads, or UnifiedGenotyper?

  • Hi Dr. Carneiro,

    I saw the error running UnifiedGenotyper.

  • Carneiro (Charlestown, MA; Member)

    Can you take a look at your BAM file header to see if it has two @PG entries for PrintReads?

    You can do so with the following command (provided you have samtools):

    samtools view -H recalibrated.2.bam | grep @PG
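If the header is long, a small awk filter can list only the @PG IDs that occur more than once. This is a minimal sketch assuming a POSIX shell and awk; `dup_pg_ids` is a hypothetical helper name, and it relies on header fields being tab-separated, as the SAM spec requires.

```shell
# Print each @PG ID that appears more than once in a SAM header read from stdin,
# followed by a tab and the number of times it appears.
# Example (assumes samtools is on PATH):
#   samtools view -H recalibrated.2.bam | dup_pg_ids
dup_pg_ids() {
  awk -F'\t' '
    $1 == "@PG" {
      # Find the ID: field; its value may contain spaces, e.g. "GATK PrintReads".
      for (i = 2; i <= NF; i++)
        if (substr($i, 1, 3) == "ID:")
          count[substr($i, 4)]++
    }
    END {
      for (id in count)
        if (count[id] > 1)
          printf "%s\t%d\n", id, count[id]
    }'
}
```

An empty result means every @PG ID in the header is unique.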

  • bhall7 (Member)
    edited March 2013

    It does have two @PG PrintReads entries. They're identical and each reads as follows:

    @PG ID:GATK PrintReads VN:2.4-7-g5e89f01 CL:readGroup=null platform=null number=-1 downsample_coverage=1.0 sample_file=[] sample_name=[] simplify=false no_pg_tag=false

    I'm using samtools to remove these lines from the header and I'll try to run UnifiedGenotyper again, then report back.

  • Carneiro (Charlestown, MA; Member)

    The fix should be up in the nightly builds, if you want to try it.

  • I just tried the latest nightly, nightly-2013-04-12-g3fc5478, and got the same error with UnifiedGenotyper. It's a merged BAM of a family: the samples were individually recalibrated, then merged and recalibrated. Am I really going to have to spend the time re-headering this? Or is it because recalibration was done with GATK 2.4-7-g5e89f01?

    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version nightly-2013-04-12-g3fc5478):
    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ##### ERROR Please do not post this error to the GATK forum
    ##### ERROR
    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ##### ERROR Visit our website and forum for extensive documentation and answers to
    ##### ERROR commonly asked questions
    ##### ERROR
    ##### ERROR MESSAGE: SAM/BAM file XXX.MERGED.bam is malformed: Program record with group id GATK PrintReads already exists in SAMFileHeader!
    ##### ERROR ------------------------------------------------------------------------------------------
    samtools view -H XXX.MERGED.bam | grep PG | grep PrintReads
    @PG ID:GATK PrintReads  VN:2.4-7-g5e89f01   CL:readGroup=null platform=null number=-1 downsample_coverage=1.0 sample_file=[] sample_name=[] simplify=false no_pg_tag=false
    @PG ID:GATK PrintReads  VN:2.4-7-g5e89f01   CL:readGroup=null platform=null number=-1 downsample_coverage=1.0 sample_file=[] sample_name=[] simplify=false no_pg_tag=false
  • It also happens if I just run BaseRecalibrator again. BaseRecalibrator won't accept --no_pg_tag as an option, so it dies there.

  • FWIW no_pg_tag didn't help me either; changing the header was the only thing that worked. A bit of a pain, but I only had to do it a few times before I got convergence on my call-snps-recalibrate-call-snps-recalibrate loop.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hmm, we'll take another look at this. Stay tuned, folks.

  • Carneiro (Charlestown, MA; Member)

    The problem here is that you are running the tools on a BAM that already has multiple @PG tags (it was generated with the buggy version we have since fixed).

    For this to work you'll have to regenerate the BAM, or manually remove the erroneous duplicated PG tags. In the new version (as far as I can test), running PrintReads multiple times no longer adds multiple PG tags, which fixes the problem; you can then run any tool (UG, BQSR, ...) on that BAM.
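Removing the duplicated @PG lines by hand can be scripted. The sketch below, assuming samtools is installed, keeps the first @PG line for each ID and drops later copies (in this thread the copies were identical, so nothing is lost), then writes the cleaned header back with `samtools reheader`. `dedup_pg` and the file names are placeholders, not part of GATK or samtools. Reindexing afterwards is a precaution, since the old .bai may no longer match a BAM whose header changed size.

```shell
# Keep only the first @PG line for each ID; pass every other header line through unchanged.
dedup_pg() {
  awk -F'\t' '
    $1 == "@PG" {
      id = ""
      for (i = 2; i <= NF; i++)
        if (substr($i, 1, 3) == "ID:") id = substr($i, 4)
      # Skip a @PG line whose ID was already printed.
      if (id != "" && (id in seen)) next
      seen[id] = 1
    }
    { print }'
}

# Example workflow (placeholder file names; requires samtools):
#   samtools view -H recalibrated.2.bam | dedup_pg > fixed_header.sam
#   samtools reheader fixed_header.sam recalibrated.2.bam > recalibrated.2.fixed.bam
#   samtools index recalibrated.2.fixed.bam
```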

  • ymw (Member)

    I am encountering the same problem.
    After reading some related discussions of the "call-snps-recalibrate-call-snps-recalibrate loop", I am wondering whether this problem comes down to which BAM file is used for each round of recalibration. That is, should we (a) use the unrecalibrated (original) BAM file for the second and further rounds of recalibration, or (b) use the newly recalibrated BAM for the next round of recalibration?
    If we choose the latter, one @PG will be added after each round. If we choose the former, there should only ever be one @PG when we run UnifiedGenotyper.

    Am I right? And which option, (a) or (b), is logically correct?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi @ymw,

    We haven't compared the two possible options so we can't say definitively, but our default recommendation is indeed to do the successive rounds of recalibration on the original, unrecalibrated file each time. In principle the process should work on the recalibrated file too, but I think you could make the case that the recalibration works best to correct systematic (as opposed to random) errors, and the systematic error patterns are "cleanest" in the original file, while in the successively recalibrated files the patterns may get obscured by the recalibration attempts. So in terms of logic you are correct to say that (a) is the better option. Apologies to anyone who may have misunderstood our recommendations if they were not clear on this point.

    That being said, we have now changed the behavior of the @PG tagging so that if there is already a @PG tag for that program in your header, it will be taken out and replaced by the new one, to remain in compliance with the BAM spec.
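To make the new behavior concrete, here is a shell sketch of the same rule applied to header text: any existing @PG line with the same ID as the incoming one is dropped, and the new line is appended. This only illustrates the semantics described above; it is not GATK's implementation, and `add_pg` is a hypothetical helper.

```shell
# add_pg NEW_PG_LINE < header.sam
# Prints the header from stdin minus any @PG line whose ID matches NEW_PG_LINE,
# then prints NEW_PG_LINE at the end.
add_pg() {
  awk -F'\t' -v new="$1" '
    # Return the value of the ID: field of a tab-separated @PG line, or "" if absent.
    function pg_id(line,    n, f, i) {
      n = split(line, f, "\t")
      for (i = 2; i <= n; i++)
        if (substr(f[i], 1, 3) == "ID:") return substr(f[i], 4)
      return ""
    }
    BEGIN { new_id = pg_id(new) }
    # Drop any existing @PG record that shares the incoming ID.
    $1 == "@PG" && new_id != "" && pg_id($0) == new_id { next }
    { print }
    # Append the replacement record after the remaining header lines.
    END { print new }'
}
```

Running it repeatedly with the same program ID always leaves exactly one @PG line for that ID, which is the property that avoids the "already exists in SAMFileHeader" error.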

  • Carneiro (Charlestown, MA; Member)

    Hi ymw.

    Always use the original BAM file on your iterations of recalibration. You always want the priors to be the original quality scores and the adjustments to be calculated on that, not on a biased observation.

    In terms of the @PG tag, either way should only add one @PG tag in the latest version. We fixed this when it got reported.
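The advice above can be sketched as a loop in which SNP calling moves forward on the latest recalibrated BAM, while BaseRecalibrator and PrintReads always read the original BAM. This is a minimal sketch assuming GATK 3-style arguments (`-T`, `-knownSites`, `-BQSR`, `-o`); the function name, the `GATK_JAR` variable, the round file names and the `RUN=echo` dry-run switch are all hypothetical, and it leaves out the other options (confidence thresholds, memory settings) a real run would need.

```shell
# bootstrap_bqsr ORIGINAL_BAM REFERENCE ROUNDS
# Set RUN=echo to print the commands instead of executing them.
bootstrap_bqsr() (
  orig_bam=$1
  ref=$2
  rounds=$3
  jar=${GATK_JAR:-GenomeAnalysisTK.jar}
  calls_bam=$orig_bam
  r=1
  while [ "$r" -le "$rounds" ]; do
    # Call SNPs on the newest BAM (the original BAM in round 1).
    ${RUN-} java -jar "$jar" -T UnifiedGenotyper -R "$ref" -I "$calls_bam" \
      -o "round$r.vcf" || exit 1
    # Build the recalibration table from the ORIGINAL BAM, using this round's calls as known sites.
    ${RUN-} java -jar "$jar" -T BaseRecalibrator -R "$ref" -I "$orig_bam" \
      -knownSites "round$r.vcf" -o "round$r.table" || exit 1
    # Apply that table to the ORIGINAL BAM, not to the previous round's output.
    ${RUN-} java -jar "$jar" -T PrintReads -R "$ref" -I "$orig_bam" \
      -BQSR "round$r.table" -o "round$r.recal.bam" || exit 1
    calls_bam="round$r.recal.bam"
    r=$((r + 1))
  done
)
```

Running `RUN=echo bootstrap_bqsr original.bam ref.fa 3` prints the commands without executing them, which is a cheap way to check that every recalibration step points at the original file.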

  • Error with UnifiedGenotyper option -glm BOTH during variant calling with GATK


    I am using the command below to call raw variants with GATK (GenomeAnalysisTK-2.3-4-g57ea19f) on the realigned, recalibrated BAM file after the BQSR and PrintReads steps, but I am getting an error. The command I am using is:

    java -Xmx14g -jar /data/PGP/gmelloni/GenomeAnalysisTK-2.3-4-g57ea19f/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /scratch/GT/vdas/test_exome/exome/hg19.fa -I /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.realigned.recal.bam -L /scratch/GT/vdas/referenceBed/hg19/ss_v4/SureSelect_XT_Human_All_Exon_V4.bed -D /scratch/GT/vdas/test_exome/exome/databases/dbsnp_137.hg19.vcf –glm BOTH -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 200 -l INFO -A AlleleBalance -A DepthOfCoverage -A FisherStrand -log /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.GATKvariants.log -o /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.GATKvariants.raw.vcf


    ERROR MESSAGE: Invalid argument value '???glm' at position 10.
    ERROR Invalid argument value 'BOTH' at position 11.
    I used this same command earlier, with this version of GATK, while testing my pipeline on a single sample from the 1000 Genomes Project, and did not get any error at that time; I am only encountering it with my tumor samples. Any suggestions? I have checked the posts, and the suggestion I see is version compatibility, but I used this version 5 days ago with another sample and the same command worked. Any idea how to get rid of this error? It would be of great help.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi @vivekdas_1987,

    This looks like some funky characters were introduced in your command line when you copied it over. Might be an issue with the encoding of whatever file format you store your command lines in. Or if it's some kind of word-processor document (e.g. MS Word) the program may have transformed the basic dash character into a special long-dash character that's not recognized by the shell. Just copy your command to a pure text document, fix the dash, and then you can copy-paste it and it should work.
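Before pasting a command that has passed through a word processor, a quick byte-level check can catch characters like the en dash in `–glm`. This is a minimal sketch assuming grep and sed that match raw bytes under `LC_ALL=C` (the GNU versions do); `find_non_ascii` and `fix_dashes` are hypothetical helper names. Only the en dash is rewritten, because an em dash may originally have been `--` and is safer to fix by hand.

```shell
# Print (with line numbers) every line of FILE that contains a byte outside 7-bit ASCII.
find_non_ascii() {
  LC_ALL=C grep -n "$(printf '[\200-\377]')" "$1"
}

# Print FILE with each UTF-8 en dash (U+2013, bytes E2 80 93) replaced by a plain hyphen.
fix_dashes() {
  LC_ALL=C sed "s/$(printf '\342\200\223')/-/g" "$1"
}
```

If `find_non_ascii` prints nothing for your saved command, the shell is seeing plain ASCII and the problem lies elsewhere.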

  • vivekdas_1987 (Milan; Member)
    edited December 2013

    Yes, this was solved long ago. Thanks for the input. I had this problem when I first used it.
