It looks like you're new here. If you want to get involved, click one of these buttons!
Hi. I am getting VERY odd results with some Streptococcus equi sequence. The BAM files from BWA align well in IGV, but when I run them through your pipeline there are many local errors where it seems that a single indel has been incorrectly multiplied up - somehow. You need to see the IGV screenshot.!
The bottom is a BAM file from BWA and the top is the final one from the GATK pipeline.


Answers
Hi Mike,
Can you tell me what version you are using and what is your command-line?
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •The Genome Analysis Toolkit (GATK) v2.2-8-gec077cd, Compiled 2012/11/14 16:10:59
Command line as in attached log file.
Thanks for helping on this....
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hey Mike, I'm not seeing an attached file. It might have been a format that the server rejects. If you can just paste your command line that would be fine.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi Geraldine. Sorry about that. Here it is pasted.
COMMAND LOG for NGS pipeline version 18
Analysis name: M2c
PREFERENCES:
Reference sequence: /home/genetics/strep_equi/strep_equi.fasta Name of this analysis: M2c E-mail address: mike.boursnell@aht.org.uk Data type: PE
Memory setting: -Xmx4g
UnifiedGenotyper constants:
Make parallel VCF: parallel
DATA FILES :
Number of samples to analyse: 1
Bam file 1 M2_Appleby_f2b.bam
-- File: 1/1 Step: 1/11
java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/picard/AddOrReplaceReadGroups.jar I=M2_Appleby_f2b.bam O=M2c_M2b_Appleby_rg.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true RGID=M2b_Appleby RGLB=s_equi RGPL=illumina RGPU=M2b_Appleby RGSM=M2b_Appleby
Output file: M2c_M2b_Appleby_rg.bam Size: 151098168 Time for stage: 0 days, 0 hours, 0 minutes and 40 seconds
-- File: 1/1 Step: 2/11
java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/picard/MarkDuplicates.jar I=M2c_M2b_Appleby_rg.bam O=M2c_M2b_Appleby.dedup.bam VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true M=M2c_M2b_Appleby.metrics
Output file: M2c_M2b_Appleby.dedup.bam Size: 151179932 Time for stage: 0 days, 0 hours, 0 minutes and 43 seconds
-- File: 1/1 Step: 3/11
java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /home/genetics/strep_equi/strep_equi.fasta -I M2c_M2b_Appleby.dedup.bam -o M2c_M2b_Appleby.dedup.bam.intervals -mismatch 0.0 -S LENIENT
Output file: M2c_M2b_Appleby.dedup.bam.intervals Size: 48172 Time for stage: 0 days, 0 hours, 0 minutes and 45 seconds
-- File: 1/1 Step: 4/11
java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -T IndelRealigner -R /home/genetics/strep_equi/strep_equi.fasta -I M2c_M2b_Appleby.dedup.bam -targetIntervals M2c_M2b_Appleby.dedup.bam.intervals -o M2c_M2b_Appleby.clean.dedup.bam -compress 0 -model USE_READS -S LENIENT
Output file: M2c_M2b_Appleby.clean.dedup.bam Size: 506780456 Time for stage: 0 days, 0 hours, 0 minutes and 48 seconds
-- File: 1/1 Step: 5/11
java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/genetics/strep_equi/strep_equi.fasta -I M2c_M2b_Appleby.clean.dedup.bam -knownSites /home/genetics/strep_equi/strep_equi_dummy_DBSNP.vcf -o M2c_M2b_Appleby.recal.grp
Output file: M2c_M2b_Appleby.recal.grp Size: 1225814 Time for stage: 0 days, 0 hours, 12 minutes and 28 seconds
-- File: 1/1 Step: 6/11
java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -T PrintReads -I M2c_M2b_Appleby.clean.dedup.bam -R /home/genetics/strep_equi/strep_equi.fasta -BQSR M2c_M2b_Appleby.recal.grp -o M2c_M2b_Appleby.clean.dedup.recal.bam -S LENIENT
Output file: M2c_M2b_Appleby.clean.dedup.recal.bam Size: 244997963 Time for stage: 0 days, 0 hours, 3 minutes and 22 seconds
-- File: 1/1 Step: 7/11
java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/picard/ValidateSamFile.jar INPUT=M2c_M2b_Appleby.clean.dedup.recal.bam OUTPUT=M2c_M2b_Appleby_validate.out VALIDATION_STRINGENCY=SILENT MODE=SUMMARY MAX_OUTPUT=100 MAX_OPEN_TEMP_FILES=7900
Output file: M2c_M2b_Appleby_validate.out Size: 16 Time for stage: 0 days, 0 hours, 0 minutes and 13 seconds
-- File: 1/1 Step: 8/11
/opt/samtools/samtools flagstat M2c_M2b_Appleby.clean.dedup.recal.bam > M2c_M2b_Appleby_flagstat.out
Output file: M2c_M2b_Appleby_flagstat.out Size: 391 Time for stage: 0 days, 0 hours, 0 minutes and 4 seconds
-- File: 1/1 Step: GC_BIAS/11
java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/picard/CollectGcBiasMetrics.jar REFERENCE_SEQUENCE=/home/genetics/strep_equi/strep_equi.fasta INPUT=M2c_M2b_Appleby_final.bam O=out1.junk CHART_OUTPUT=M2b_Appleby_gc_bias.pdf VALIDATION_STRINGENCY=LENIENT
Output file: M2b_Appleby_gc_bias.pdf Size: 19783 Time for stage: 0 days, 0 hours, 0 minutes and 11 seconds
-- File: 1/1 Step: SIZE_METRICS/11
java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/picard/CollectInsertSizeMetrics.jar INPUT=M2c_M2b_Appleby_final.bam O=out2.junk HISTOGRAM_FILE=M2b_Appleby_insert_size.pdf VALIDATION_STRINGENCY=LENIENT
Output file: M2b_Appleby_insert_size.pdf Size: 32109
-- File: 1/1 Step: PARALLEL SNPS VCF/11
java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -R /home/genetics/strep_equi/strep_equi.fasta -T UnifiedGenotyper -glm SNP -I results_M2c_1/M2c_M2b_Appleby_final.bam -stand_emit_conf 30 -stand_call_conf 30 -o SNPS_M2c.vcf -S LENIENT
Output file: SNPS_M2c.vcf Size: 6984668 Time for stage: 0 days, 0 hours, 3 minutes and 10 seconds
-- File: 1/1 Step: PARALLEL INDELS VCF/11
java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -R /home/genetics/strep_equi/strep_equi.fasta -T UnifiedGenotyper -glm INDEL -I results_M2c_1/M2c_M2b_Appleby_final.bam -stand_emit_conf 30 -stand_call_conf 30 --max_alternate_alleles 6 -o INDELS_M2c.vcf -S LENIENT
Output file: INDELS_M2c.vcf Size: 9657 Time for stage: 0 days, 0 hours, 2 minutes and 9 seconds
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi Mike,
I am happy to take a look if you want to upload the necessary files: BAM snippet plus the references files.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Thanks. What is a BAM snippet?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •:) Just a smaller piece of your bam so that you don't need to upload (and we don't need to process) a huge file. You can use PrintReads with the -L argument to specify +/- 200bp around the region of interest.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •So the region of interest in this case is the place where we get the weird alignments? Do I upload it to an FTP site? If so where?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Yes. http://gatkforums.broadinstitute.org/discussion/1215/how-can-i-access-the-gsa-public-ftp-server
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi Eric. I've uploaded the BAM files to the FTP server in a directory called mboursnell. I made smaller BAM files using PrintReads, but then the problem went away, so I had to upload the whole file. (The files starting T8 are the ones to use)
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi Mike,
It seems that you've taken down the original IGV screenshot, so I don't know where your region of interest is. Could you please let me know?
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I think that must have been a forum bug; I restored it by editing the post to include the image itself (which was still attached, if hidden).
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Okay, I've figured out the problem: your reference file has an extra line feed character (ASCII 10) in it on line 29360. This corresponds to the base immediately after position chr1:1,761,540. The Picard library we use for reading reference fasta files is treating this as a valid reference character so that all bases following it are staggered by one position. Did you generate this reference yourself?
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Brilliant! What a detective. I'm not sure where the sequence came from. I'll ask our Bacteriology folks.....
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I cured this by writing a PERL script to re-write the reference file a character at a time, and insert new line-feeds. But I never managed to see the extra line-feed that you spotted, even though it must have been there. How did you reveal it?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •