Error with UnifiedGenotyper after using BaseRecalibrator twice

Posts: 8 | Member

Aloha,

I am calling SNPs on an organism without a reference genome or database of known polymorphisms, so I'm trying to follow the advice posted here (and in the BaseRecalibrator documentation).

I've successfully called SNPs on the un-recalibrated .bam file, then used those SNPs to recalibrate, then called SNPs on the recalibrated .bam file. As expected, I got significantly fewer (and presumably more accurate) results.

I then used the new, reduced set of SNPs to recalibrate again. When I attempted to call SNPs on this "Round Two" recalibrated .bam file, I got the following error:

I attempted to use PicardTools ValidateSamFile and CleanSam but received the same message (as an IllegalArgumentException). I would definitely consider myself a novice in the field. Any advice you can give will be greatly appreciated.

• Posts: 8 | Member

Hi Dr. Van der Auwera,

Oops, I was running v2.4-7. (I'm not sure how that happened; I've only been doing this for a couple of weeks.) I will upgrade and retry, then post results. Thanks!

• Posts: 8 | Member

Update: I received the same error message using v2.4-9. I'll try using --no_pg_tag. Thanks for your help!

When you say "When I attempted to call SNPs on this 'Round Two' recalibrated .bam file", do you mean that you saw this error when running BaseRecalibrator, PrintReads, or UnifiedGenotyper?

• Posts: 8 | Member

Hi Dr. Carneiro,

I saw the error running UnifiedGenotyper.

Can you take a look at your BAM file header to see whether it has two @PG entries for PrintReads?

You can do so with the following command (provided you have samtools installed):

samtools view -H recalibrated2.bam | grep @PG
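A small filter can make this check explicit by printing only the @PG IDs that occur more than once. This is a minimal sketch assuming a POSIX shell with awk; `dup_pg_ids` is a hypothetical helper name, and it assumes the header fields are tab-separated as the SAM specification requires, so an ID that contains a space, such as `GATK PrintReads`, is kept whole.

```sh
# dup_pg_ids (hypothetical helper): read SAM header text on stdin and print
# each @PG ID value that appears on more than one @PG line.
dup_pg_ids() {
  awk -F '\t' '
    $1 == "@PG" {
      for (i = 2; i <= NF; i++) {
        if (substr($i, 1, 3) == "ID:") {
          id = substr($i, 4)
          count[id]++
          # Report each repeated ID once, on its second occurrence.
          if (count[id] == 2) print id
        }
      }
    }'
}

# Usage: samtools view -H recalibrated2.bam | dup_pg_ids
# No output means no @PG ID is repeated.
```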

• Posts: 8 | Member
edited March 2013

It does have two @PG PrintReads entries. They're identical and each reads as follows:

@PG ID:GATK PrintReads VN:2.4-7-g5e89f01 CL:readGroup=null platform=null number=-1 downsample_coverage=1.0 sample_file=[] sample_name=[] simplify=false no_pg_tag=false

I'm using samtools to remove these lines from the header and I'll try to run UnifiedGenotyper again, then report back.
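Rather than deleting the lines by hand, the duplicate records can be filtered out and the cleaned header written back with `samtools reheader`. A minimal sketch, assuming a POSIX shell with awk and samtools on the PATH; `dedup_pg_header` and the output file names are hypothetical. It keeps the first @PG line for each ID and passes every other header line through unchanged.

```sh
# dedup_pg_header (hypothetical helper): read SAM header text on stdin and
# write it back out, keeping only the first @PG line for each ID. All other
# header lines (@HD, @SQ, @RG, @CO) pass through unchanged.
dedup_pg_header() {
  awk -F '\t' '
    $1 == "@PG" {
      id = ""
      for (i = 2; i <= NF; i++)
        if (substr($i, 1, 3) == "ID:") id = substr($i, 4)
      # Drop this line if an earlier @PG line already used the same ID.
      if (id in seen) next
      seen[id] = 1
    }
    { print }'
}

# Usage (file names are hypothetical):
#   samtools view -H recalibrated2.bam | dedup_pg_header > fixed_header.sam
#   samtools reheader fixed_header.sam recalibrated2.bam > recalibrated2.fixed.bam
```

Keeping the first occurrence is safe in this case because, as shown above, the duplicated PrintReads records are identical.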

Post edited by bhall7 on

The fix should be up in the nightly builds, if you want to try it.

• Posts: 10 | Member

I just tried the latest nightly, nightly-2013-04-12-g3fc5478, and got the same error with UnifiedGenotyper. It's a merged BAM of a family: the samples were individually recalibrated, then merged and recalibrated again. Am I really going to have to spend the time re-headering this? Or is it because recalibration was done with GATK 2.4-7-g5e89f01?

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version nightly-2013-04-12-g3fc5478):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR
##### ERROR MESSAGE: SAM/BAM file XXX.MERGED.bam is malformed: Program record with group id GATK PrintReads already exists in SAMFileHeader!
##### ERROR ------------------------------------------------------------------------------------------

samtools view -H XXX.MERGED.bam | grep PG | grep PrintReads
@PG ID:GATK PrintReads  VN:2.4-7-g5e89f01   CL:readGroup=null platform=null number=-1 downsample_coverage=1.0 sample_file=[] sample_name=[] simplify=false no_pg_tag=false
@PG ID:GATK PrintReads  VN:2.4-7-g5e89f01   CL:readGroup=null platform=null number=-1 downsample_coverage=1.0 sample_file=[] sample_name=[] simplify=false no_pg_tag=false

• Posts: 10 | Member

It also happens if I just try BaseRecalibrator again. BaseRecalibrator won't accept --no_pg_tag as an option, so it dies there.

• Posts: 8 | Member

FWIW no_pg_tag didn't help me either; changing the header was the only thing that worked. A bit of a pain, but I only had to do it a few times before I got convergence on my call-snps-recalibrate-call-snps-recalibrate loop.

Hmm, we'll take another look at this. Stay tuned, folks.

Geraldine Van der Auwera, PhD

The problem here is that you guys are trying to run the tools on the BAM that has the multiple @PG tags (which was generated with the buggy version that we have since fixed).

For this to work you'll have to regenerate the BAM or manually remove the erroneous duplicated @PG tags. In the new version (as far as I can test), running PrintReads multiple times no longer adds multiple @PG tags, which fixes the problem, and you can then run any tool (UG, BQSR, ...) on that BAM.

• Posts: 9 | Member

I am encountering the same problem.
After checking some relevant discussions of the "call-snps-recalibrate-call-snps-recalibrate loop", I am wondering whether this problem stems from which BAM file is used for each round of recalibration. That is, should we (a) use the uncalibrated (original) BAM file for the second and later rounds of recalibration, or (b) use the newly recalibrated BAM for the next round of recalibration?
If we choose the latter, one @PG will be added after each round. If we choose the former, there should be only one @PG every time we run UnifiedGenotyper.

Am I right? And which option, (a) or (b), is logically correct?

Hi @ymw,

We haven't compared the two possible options so we can't say definitively, but our default recommendation is indeed to do the successive rounds of recalibration on the original, unrecalibrated file each time. In principle the process should work on the recalibrated file too, but I think you could make the case that the recalibration works best to correct systematic (as opposed to random) errors, and the systematic error patterns are "cleanest" in the original file, while in the successively recalibrated files the patterns may get obscured by the recalibration attempts. So in terms of logic you are correct to say that (a) is the better option. Apologies to anyone who may have misunderstood our recommendations if they were not clear on this point.

That being said, we have now changed the behavior of the @PG tagging so that if there is already a @PG tag for that program in your header, it will be taken out and replaced by the new one, to remain in compliance with the BAM spec.

Geraldine Van der Auwera, PhD
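The new tagging behavior described above (drop any existing @PG record for the same program, then add the new one) can be illustrated on plain header text. This is only a sketch of the idea, not GATK's implementation: `add_or_replace_pg` is a hypothetical name, it matches records by their `ID:` field, and appending the new record at the end of the header is an assumption about placement.

```sh
# add_or_replace_pg (hypothetical helper): read SAM header text on stdin,
# drop every @PG line whose ID matches the ID of the new record passed as $1,
# then append the new record. Illustrates the idea only; not GATK's code.
add_or_replace_pg() {
  awk -F '\t' -v rec="$1" '
    BEGIN {
      # Pull the ID: value out of the new, tab-separated @PG record.
      n = split(rec, f, "\t")
      for (i = 2; i <= n; i++)
        if (substr(f[i], 1, 3) == "ID:") newid = substr(f[i], 4)
    }
    $1 == "@PG" {
      for (i = 2; i <= NF; i++)
        if ($i == ("ID:" newid)) next
    }
    { print }
    END { print rec }'
}
```

However many times the same program is run, its ID then appears on exactly one @PG line.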

Hi ymw,

Always use the original BAM file on your iterations of recalibration. You always want the priors to be the original quality scores and the adjustments to be calculated on that, not on a biased observation.

In terms of the @PG tag, either way should only add one @PG tag in the latest version. We fixed this when it got reported.
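The advice to always recalibrate from the original BAM can be made concrete as a loop. The sketch below only prints the commands rather than running them. It assumes GATK 3.x-style `-T` tool names with the `-R`, `-I`, `-o`, `-knownSites` and `-BQSR` arguments; the jar path (`GATK_JAR`), the round file names and the helper name `plan_bqsr_rounds` are hypothetical, and it skips the filtering usually applied to raw calls before they are used as known sites. What it shows is that BaseRecalibrator and PrintReads read the original BAM in every round, and only the known-sites VCF changes.

```sh
# plan_bqsr_rounds (hypothetical helper): print, without running, the
# commands for an iterative bootstrap recalibration in which every round
# recalibrates the ORIGINAL BAM, using the previous round's calls as
# known sites. Arguments: reference FASTA, original BAM, number of rounds.
plan_bqsr_rounds() {
  ref=$1 orig=$2 rounds=$3
  # GATK_JAR is an assumed environment variable pointing at the jar.
  gatk="java -jar ${GATK_JAR:-GenomeAnalysisTK.jar}"

  # Round 0: call variants on the unrecalibrated BAM.
  echo "$gatk -T UnifiedGenotyper -R $ref -I $orig -o round0.vcf"

  i=1
  while [ "$i" -le "$rounds" ]; do
    prev=$((i - 1))
    # Build the recalibration table from the original quality scores...
    echo "$gatk -T BaseRecalibrator -R $ref -I $orig -knownSites round$prev.vcf -o round$i.table"
    # ...and apply it to the original BAM, not to last round's output.
    echo "$gatk -T PrintReads -R $ref -I $orig -BQSR round$i.table -o round$i.bam"
    # Call again on this round's recalibrated BAM.
    echo "$gatk -T UnifiedGenotyper -R $ref -I round$i.bam -o round$i.vcf"
    i=$((i + 1))
  done
}
```

For example, `plan_bqsr_rounds ref.fa original.bam 2` prints seven commands; because every PrintReads run starts from the original file, each recalibrated BAM gets a single PrintReads @PG record.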

• Milan | Posts: 33 | Member

Error with UnifiedGenotyper with option -glm BOTH with GATK during Variant calling

Hi,

I am using the command below to call raw variants with GATK (GenomeAnalysisTK-2.3-4-g57ea19f) on the realigned, recalibrated BAM file produced by the BQSR and PrintReads steps, but I am getting an error. The command I am using is:

java -Xmx14g -jar /data/PGP/gmelloni/GenomeAnalysisTK-2.3-4-g57ea19f/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /scratch/GT/vdas/test_exome/exome/hg19.fa -I /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.realigned.recal.bam -L /scratch/GT/vdas/referenceBed/hg19/ss_v4/SureSelect_XT_Human_All_Exon_V4.bed -D /scratch/GT/vdas/test_exome/exome/databases/dbsnp_137.hg19.vcf –glm BOTH -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 200 -l INFO -A AlleleBalance -A DepthOfCoverage -A FisherStrand -log /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.GATKvariants.log -o /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.GATKvariants.raw.vcf

Error:

ERROR MESSAGE: Invalid argument value '???glm' at position 10.
ERROR Invalid argument value 'BOTH' at position 11.

I used this same command earlier, while testing my pipeline on a single sample from the 1000 Genomes Project with this version of GATK, and did not get any error at that time, but I am getting these errors with my tumor samples. Any suggestions? I have checked the posts, and the suggestion I see is version compatibility, but I used this same version five days ago with another sample and the same command worked. Any idea how to get rid of this error? It would be of great help.
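In the command as posted, the option is written `–glm`: its first character is an en dash (U+2013), not an ASCII hyphen, and the `???glm` in the error message is consistent with GATK receiving bytes it could not display. Characters like this typically arrive when a command is copied from a word processor or a web page. A minimal sketch of a check, assuming a POSIX shell; `has_non_ascii` is a hypothetical helper name.

```sh
# has_non_ascii (hypothetical helper): succeed (exit status 0) if the
# argument contains any byte outside 7-bit ASCII, e.g. an en dash (U+2013)
# pasted in place of "-".
has_non_ascii() {
  # With LC_ALL=C, tr works byte by byte; deleting octal 000-177 removes
  # every ASCII byte, so whatever is left over is non-ASCII.
  leftover=$(printf '%s' "$1" | LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' ')
  [ "$leftover" -gt 0 ]
}

# Usage, with the full command line stored in a variable named cmd:
#   has_non_ascii "$cmd" && echo "command contains non-ASCII characters"
```

If it reports a hit, retyping the affected option with a plain hyphen (`-glm BOTH`) is the first thing to try.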