The current GATK version is 3.8-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

#### ☞ Got a problem?

1. Search using the upper-right search box, e.g. using the error message.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

#### ☞ Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks (  ) each to make a code block as demonstrated here.

GATK version 4.beta.3 (i.e. the third beta release) is out. See the GATK4 beta page for download and details.

# ReassignOneMappingQualityFilter

NYMember
edited August 2013

Hi
I am using this version of GATK which is
java -jar /apps1/gatk/GenomeAnalysisTK-2.6-5-gba531bd/GenomeAnalysisTK.jar

I have obtained my bam files using bowtie as aligner. So the quality score is 255 in the 5th column. I want to use UnifiedGenotyper and when i use it it gives me an error saying MappingQualityUnavailable.

So I see read filter ReassignOneMappingQualityFilter which can convert 255 to 60 and then it would be easily taken by UnifiedGenotyper to do the snp calling.

So what command line should i be using

This page is not telling how to provide input file and then output file

I have just started to explore GATK.
Hope to hear from you soon

Regards
Varun

Tagged:

Hi Varun,

To use a read filter, you don't call the filter directly, you add it to the command of another tool. In your case, the simplest way to proceed would be to use PrintReads with the read filter, like this:

java -jar GenomeAnalysisTK.jar -T PrintReads -R reference.fa -I input.bam -o output.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60


This will tell the PrintReads tool to make a copy of your input file, and apply the read filter to adjust the MAPQ values. This will create a new file called output.bam with the adjusted MAPQ values.

• NYMember
edited August 2013

Hi Geraldine
So I used your command to generate the new output.bam file.

I used UnifiedGenotyper on this new output.bam and got raw vcf files with around 9460 snps.

I also used this command

java -jar GenomeAnalysisTK.jar -nt 6 -T UnifiedGenotyper -R /GENOMES/sacCer3.fa -I sacCer3.alignment.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -dcov 200 -o snp.vcf

The bam file sacCer3.alignment.bam is basically the one which needs to be converted to score of 60 and we converted it and named output.bam

When I run this command, my vcf output is created BUT when I compare number of snps obtained in this case with previous one where I ran output.bam with UnifiedGenotyper I get around 9448 snps.

Why is there a difference in number of snps??

Regards
Varun

That is also an acceptable way to use the read filter.

The small difference in number of called snps is because you are using multithreading (the nt argument). This causes some marginal calls to be gained or lost because of differences in the downsampling seeds. It is expected.

However what is not so good is that you are skipping indel realignment and base quality score recalibration. You should perform those steps before calling variants, according to our Best Practices.

• NYMember

Since I am still getting hold of this let me describe you the way I understand it and should be done and then probably you can give me your help(with inputs)

1.Since I have my bam file with 255 i should first convert it using ReassignOneMappingQuality read filter.

Command

java -jar GenomeAnalysisTK.jar -T PrintReads -R reference.fa -I input.bam -o output.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 - 

2.After this it should be indel realignment i guess

I don't have the known indel snp.vcf file for cerevisiae so have to skip that..

Lets say after indel realignment we get bam file named as test.bam

3.After this i will have to do BQSR

Can you give me the command line for it . I am looking it though.

4.Then i can just call UnifiedGenotyper.

Let me know if i miss something

Regards

Varun

Hi Varun,

Hopefully they will help you understand what needs to be done and how. Note that you don't need to skip indel realignment; it will work even if you don't have a list of known indels.

• NYMember
edited August 2013

Hi Geraldine
So i started following the indel realignment and base quality score recalibration steps.

I did indel realignment on my bam file but it gave me

INFO 12:14:47,507 SAMDataSource$SAMReaders - Initializing SAMRecords in serial INFO 12:14:47,510 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.00 INFO 12:15:13,492 ProgressMeter - done 1.22e+07 26.0 s 2.0 s 100.0% 26.0 s 0.0 s INFO 12:15:13,492 ProgressMeter - Total runtime 26.01 secs, 0.43 min, 0.01 hours INFO 12:15:13,492 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 7591116 total reads (0.00%) INFO 12:15:13,493 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter INFO 12:15:13,493 MicroScheduler - -> 0 reads (0.00% of total) failing BadMateFilter INFO 12:15:13,493 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter INFO 12:15:13,493 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter INFO 12:15:13,493 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter INFO 12:15:13,494 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter INFO 12:15:13,494 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityZeroFilter INFO 12:15:13,494 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter INFO 12:15:13,494 MicroScheduler - -> 0 reads (0.00% of total) failing Platform454Filter INFO 12:15:13,494 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter INFO 12:15:14,277 GATKRunReport - Uploaded run statistics report to AWS S3 

So my file realigner.intervals was empty but it ran properly.

Did the next step involving IndelRealigner

INFO 12:20:36,935 ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining INFO 12:20:36,989 ReadShardBalancer$1 - Loading BAM index data INFO 12:20:36,990 ReadShardBalancer$1 - Done loading BAM index data INFO 12:21:06,948 ProgressMeter - chrIX:294800 1.59e+06 30.0 s 18.0 s 26.2% 114.0 s 84.0 s INFO 12:21:36,960 ProgressMeter - chrVIII:157854 3.32e+06 60.0 s 18.0 s 45.4% 2.2 m 72.0 s INFO 12:22:06,971 ProgressMeter - chrXII:463169 5.05e+06 90.0 s 17.0 s 64.1% 2.3 m 50.0 s INFO 12:22:36,983 ProgressMeter - chrXV:454498 6.76e+06 120.0 s 17.0 s 87.0% 2.3 m 17.0 s INFO 12:23:06,994 ProgressMeter - chrXVI:948058 7.47e+06 2.5 m 20.0 s 100.0% 2.5 m 0.0 s INFO 12:23:13,071 ProgressMeter - done 8.64e+06 2.6 m 18.0 s 100.0% 2.6 m 0.0 s INFO 12:23:13,071 ProgressMeter - Total runtime 156.14 secs, 2.60 min, 0.04 hours INFO 12:23:13,160 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 8637506 total reads (0.00%) INFO 12:23:13,160 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter INFO 12:23:13,928 GATKRunReport - Uploaded run statistics report to AWS S3 

So I guess since it ran properly there was no indel realignment.

I ran BQSR step but it requires a vcf file for snp to run. I am working on yeast and it does not have that.

My question is "Is there a dbsnp vcf file for S.cerevisiae??" and if no then is there any other way to run BQSR without it??

Hope to hear from you soon

Regards
Varun

Hi Varun,

That realignment step seems to have run awfully fast, I'd like to check your command line if you don't mind posting it. Is the genome you're working with very small?

For the BQSR step, I don't know if there is a dbsnp file available for S. cerevisiae; we only deal with humans. I recommend you search online to make sure. If there is not, then you can generate a set yourself; have a look at our documentation for details on how to do this.

• NYMember

Hi Geraldine

Command line

java -jar /apps1/gatk/GenomeAnalysisTK-2.6-5-gba531bd/GenomeAnalysisTK.jar -nt 4 -T RealignerTargetCreator -R /cork/vgupta12/S.cerevisiae/GENOMES/sacCer3.fa -I output.bam -o realigner.intervals

and then

java -jar /apps1/gatk/GenomeAnalysisTK-2.6-5-gba531bd/GenomeAnalysisTK.jar -nt 4 -T IndelRealigner -R /cork/vgupta12/S.cerevisiae/GENOMES/sacCer3.fa -I output.bam -targetIntervals realigner.intervals -o realigned.bam

Yeah yeast genome is very very small 12M in size.

Well that all looks good! Sorry to doubt you, but we don't see those kinds of speeds on our usual human data, so I wanted to make sure.

• NYMember

Hi Geraldine

I started using bwa as my aligner and then want to do GATK for my INDELS and SNP's.

When I map the raw reads with bwa-0.7.5a using bwa mem algorithm, i get bam file with reads which does not have read sequence, instead have a *.
Ofcourse this will create problem downstream, so I want to remove them. But first question , why would bwa report such reads in the bam file.
Secondly I used this command to PRINT READS without * in the seq column but it is giving me error.

It gave me this error

##### ERROR ------------------------------------------------------------------------------------------

can you see what is happening

Regards
Varun

Hi Varun,

I can't answer your first question because bwa is not our tool. If you have any problems with it you should ask the authors of bwa for support.

To your second question, this is a minor bug in GATK that affects how read filters are handled. The MalformedRead filter is applied by default, so when you specify it at the command line, it ends up listed twice in the internal arguments list. This causes the error you see. Just remove the filter from your command line and it should work fine. The filter will be applied automatically.

• NYMember

Hi Geraldine
But I have to remove those reads which have * in the read seq column.
So what should be my command line??
I did this and it gave me an error
java -jar /apps1/gatk/GenomeAnalysisTK-2.6-5-gba531bd/GenomeAnalysisTK.jar -T PrintReads -R /cork/GENOMES/sacCer3.fa -I final_sorted.bam -o output.bam

INFO 11:33:54,623 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:33:54,627 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.6-5-gba531bd, Compiled 2013/07/18 18:05:31
INFO 11:33:54,627 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 11:33:54,635 HelpFormatter - Program Args: -T PrintReads -R /cork/vgupta12/S.cerevisiae/GENOMES/sacCer3.fa -I final_sorted.bam -o output.bam
INFO 11:33:54,635 HelpFormatter - Date/Time: 2014/04/08 11:33:54
INFO 11:33:54,635 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:33:54,635 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:33:55,495 GenomeAnalysisEngine - Strictness is SILENT
INFO 11:33:55,642 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 11:33:55,657 SAMDataSource$SAMReaders - Initializing SAMRecords in serial INFO 11:33:55,685 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.03
INFO 11:33:55,814 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 11:33:55,823 GenomeAnalysisEngine - Done preparing for traversal
INFO 11:33:55,823 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 11:33:55,829 ReadShardBalancer$1 - Loading BAM index data INFO 11:33:55,831 ReadShardBalancer$1 - Done loading BAM index data
INFO 11:33:56,924 GATKRunReport - Uploaded run statistics report to AWS S3

##### ERROR ------------------------------------------------------------------------------------------

Regards
Varun

• NYMember

Thanks Geraldine

That worked like a charm

Also now I would want to run Unified genotyper for getting snps and indels(Already got the realign intervals for indels).

I am using this command
java -jar /apps1/gatk/GenomeAnalysisTK-2.6-5-gba531bd/GenomeAnalysisTK.jar -T UnifiedGenotyper -R /cork/vgupta12/S.cerevisiae/GENOMES/sacCer3.fa -glm BOTH -I realigned.bam -dcov 200 -ploidy 1 -o snpcall.vcf`

I am using ploidy 1 since my genome which happens to be yeast is haploid. Can you tell me the use of dcov here??
What is the default value and what does it mean.

Thanks for all the help

Regards

Varun