To whom it may concern,
While running the BaseRecalibrator, an error occurred (see attached log file). Could you please give me some help? (The input file was validated and no errors were reported.)
Sorry, it's actually this (forgot to add -c to uniq).
samtools view in.bam | cut -f 5 | sort -n | uniq -c
Is this a small dataset you're working with? Can you tell me what kind of data it is?
No, the dataset is not that small; it's an exome (20 single files of 4000000 records each). The original data are FASTQ, so I wrote the files into SAM, merged the SAM files, reordered them ... The input for the BaseRecalibrator was the BAM (realigned and validated).
Hmm, the recalibrator doesn't seem to see your data, so maybe the reads are getting filtered out for some reason. What do the mapping qualities look like?
I need some advice, because I am new to the GATK ... How do I check the mapping qualities? I did not do this ;-(
The simplest way is to load your BAM file into a genome viewer such as IGV and mouse over the reads; that will pop up a summary of quality statistics for each read. For mapping qualities that are very bad or unusable, I think IGV displays the reads in a different color, so it is obvious when there is a problem.
Another way: if you are somewhat familiar with Unix and have samtools installed, you can do:
samtools view in.bam | cut -f 5 | sort -n | uniq
That will give you a count of how many reads you have at each mapping quality score in that BAM file (it will take a little while, but it shouldn't be prohibitive).
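If you would rather do the tally in a script, here is a minimal Python sketch of the same idea. It assumes you feed it the plain-text output of `samtools view`, where the mapping quality (MAPQ) is the fifth tab-separated column. The function names `count_mapq` and `main` are my own and not part of samtools or the GATK.

```python
import sys
from collections import Counter


def count_mapq(sam_lines):
    """Count reads per MAPQ value (column 5 of each SAM alignment line)."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid alignment line has at least 11 columns; skip anything
        # too short to contain a MAPQ field.
        if len(fields) < 5:
            continue
        counts[int(fields[4])] += 1
    return counts


def main():
    # Print one "count MAPQ" line per distinct value, sorted by MAPQ,
    # similar in spirit to the output of `sort -n | uniq -c`.
    for mapq, n in sorted(count_mapq(sys.stdin).items()):
        print(f"{n:>10} {mapq}")
```

To use it, add a call to `main()` at the bottom of the file and pipe the reads in, e.g. `samtools view in.bam | python mapq_counts.py` (the script name is just a placeholder). In the SAM format a MAPQ of 255 means the mapping quality is not available, so it is worth checking how many reads land on 0 or 255.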
That is a better systematic approach, good point.
I'll try this ... thanx Kurt and Geraldine!