The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Powered by Vanilla. Made with Bootstrap.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.
Register now for the upcoming GATK Best Practices workshop, Feb 20-22 in Leuven, Belgium. Open to all comers! More info and signup at http://bit.ly/2i4mGxz

Very odd results - maybe due to IndelRealigner

mike_boursnellmike_boursnell Member Posts: 94
edited December 2012 in Ask the GATK team

Hi. I am getting VERY odd results with some Streptococcus equi sequence. The BAM files from BWA align well in IGV, but when I run them through your pipeline there are many local errors where it seems that a single indel has been incorrectly multiplied up - somehow. You need to see the IGV screenshot.!

The bottom is a BAM file from BWA and the top is the final one from the GATK pipeline.

image

Post edited by Geraldine_VdAuwera on
bmp
bmp
IGV picture 1.bmp
4M

Best Answer

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Administrator, Dev Posts: 10,970 admin

    Hi Mike,

    Can you tell me what version you are using and what is your command-line?

    Geraldine Van der Auwera, PhD

  • mike_boursnellmike_boursnell Member Posts: 94

    The Genome Analysis Toolkit (GATK) v2.2-8-gec077cd, Compiled 2012/11/14 16:10:59

    Command line as in attached log file.

    Thanks for helping on this....

  • Geraldine_VdAuweraGeraldine_VdAuwera Administrator, Dev Posts: 10,970 admin

    Hey Mike, I'm not seeing an attached file. It might have been a format that the server rejects. If you can just paste your command line that would be fine.

    Geraldine Van der Auwera, PhD

  • mike_boursnellmike_boursnell Member Posts: 94

    Hi Geraldine. Sorry about that. Here it is pasted.

    COMMAND LOG for NGS pipeline version 18

    Analysis name: M2c

    PREFERENCES:

    Reference sequence: /home/genetics/strep_equi/strep_equi.fasta
    Name of this analysis: M2c
    E-mail address: mike.boursnell@aht.org.uk
    Data type: PE
    Memory setting: -Xmx4g

    UnifiedGenotyper constants:

    -stand_emit_conf:           30
    -stand_call_conf:           30
    

    Make parallel VCF: parallel

    DATA FILES :

    Number of samples to analyse: 1

    Bam file 1 M2_Appleby_f2b.bam

    --
    File: 1/1 Step: 1/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/picard/AddOrReplaceReadGroups.jar I=M2_Appleby_f2b.bam O=M2c_M2b_Appleby_rg.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true RGID=M2b_Appleby RGLB=s_equi RGPL=illumina RGPU=M2b_Appleby RGSM=M2b_Appleby

    Output file: M2c_M2b_Appleby_rg.bam Size: 151098168
    Time for stage: 0 days, 0 hours, 0 minutes and 40 seconds

    --
    File: 1/1 Step: 2/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/picard/MarkDuplicates.jar I=M2c_M2b_Appleby_rg.bam O=M2c_M2b_Appleby.dedup.bam VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true M=M2c_M2b_Appleby.metrics

    Output file: M2c_M2b_Appleby.dedup.bam Size: 151179932
    Time for stage: 0 days, 0 hours, 0 minutes and 43 seconds

    --
    File: 1/1 Step: 3/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /home/genetics/strep_equi/strep_equi.fasta -I M2c_M2b_Appleby.dedup.bam -o M2c_M2b_Appleby.dedup.bam.intervals -mismatch 0.0 -S LENIENT

    Output file: M2c_M2b_Appleby.dedup.bam.intervals Size: 48172
    Time for stage: 0 days, 0 hours, 0 minutes and 45 seconds

    --
    File: 1/1 Step: 4/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -T IndelRealigner -R /home/genetics/strep_equi/strep_equi.fasta -I M2c_M2b_Appleby.dedup.bam -targetIntervals M2c_M2b_Appleby.dedup.bam.intervals -o M2c_M2b_Appleby.clean.dedup.bam -compress 0 -model USE_READS -S LENIENT

    Output file: M2c_M2b_Appleby.clean.dedup.bam Size: 506780456
    Time for stage: 0 days, 0 hours, 0 minutes and 48 seconds

    --
    File: 1/1 Step: 5/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/genetics/strep_equi/strep_equi.fasta -I M2c_M2b_Appleby.clean.dedup.bam -knownSites /home/genetics/strep_equi/strep_equi_dummy_DBSNP.vcf -o M2c_M2b_Appleby.recal.grp

    Output file: M2c_M2b_Appleby.recal.grp Size: 1225814
    Time for stage: 0 days, 0 hours, 12 minutes and 28 seconds

    --
    File: 1/1 Step: 6/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -T PrintReads -I M2c_M2b_Appleby.clean.dedup.bam -R /home/genetics/strep_equi/strep_equi.fasta -BQSR M2c_M2b_Appleby.recal.grp -o M2c_M2b_Appleby.clean.dedup.recal.bam -S LENIENT

    Output file: M2c_M2b_Appleby.clean.dedup.recal.bam Size: 244997963
    Time for stage: 0 days, 0 hours, 3 minutes and 22 seconds

    --
    File: 1/1 Step: 7/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/picard/ValidateSamFile.jar INPUT=M2c_M2b_Appleby.clean.dedup.recal.bam OUTPUT=M2c_M2b_Appleby_validate.out VALIDATION_STRINGENCY=SILENT MODE=SUMMARY MAX_OUTPUT=100 MAX_OPEN_TEMP_FILES=7900

    Output file: M2c_M2b_Appleby_validate.out Size: 16
    Time for stage: 0 days, 0 hours, 0 minutes and 13 seconds

    --
    File: 1/1 Step: 8/11
    /opt/samtools/samtools flagstat M2c_M2b_Appleby.clean.dedup.recal.bam > M2c_M2b_Appleby_flagstat.out

    Output file: M2c_M2b_Appleby_flagstat.out Size: 391
    Time for stage: 0 days, 0 hours, 0 minutes and 4 seconds

    --
    File: 1/1 Step: GC_BIAS/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/picard/CollectGcBiasMetrics.jar REFERENCE_SEQUENCE=/home/genetics/strep_equi/strep_equi.fasta INPUT=M2c_M2b_Appleby_final.bam O=out1.junk CHART_OUTPUT=M2b_Appleby_gc_bias.pdf VALIDATION_STRINGENCY=LENIENT

    Output file: M2b_Appleby_gc_bias.pdf Size: 19783
    Time for stage: 0 days, 0 hours, 0 minutes and 11 seconds

    --
    File: 1/1 Step: SIZE_METRICS/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/picard/CollectInsertSizeMetrics.jar INPUT=M2c_M2b_Appleby_final.bam O=out2.junk HISTOGRAM_FILE=M2b_Appleby_insert_size.pdf VALIDATION_STRINGENCY=LENIENT

    Output file: M2b_Appleby_insert_size.pdf Size: 32109

    --
    File: 1/1 Step: PARALLEL SNPS VCF/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -R /home/genetics/strep_equi/strep_equi.fasta -T UnifiedGenotyper -glm SNP -I results_M2c_1/M2c_M2b_Appleby_final.bam -stand_emit_conf 30 -stand_call_conf 30 -o SNPS_M2c.vcf -S LENIENT

    Output file: SNPS_M2c.vcf Size: 6984668
    Time for stage: 0 days, 0 hours, 3 minutes and 10 seconds

    --
    File: 1/1 Step: PARALLEL INDELS VCF/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -R /home/genetics/strep_equi/strep_equi.fasta -T UnifiedGenotyper -glm INDEL -I results_M2c_1/M2c_M2b_Appleby_final.bam -stand_emit_conf 30 -stand_call_conf 30 --max_alternate_alleles 6 -o INDELS_M2c.vcf -S LENIENT

    Output file: INDELS_M2c.vcf Size: 9657
    Time for stage: 0 days, 0 hours, 2 minutes and 9 seconds

  • ebanksebanks Broad InstituteMember, Administrator, Broadie, Moderator, Dev Posts: 698 admin

    Hi Mike,

    I am happy to take a look if you want to upload the necessary files: BAM snippet plus the references files.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • mike_boursnellmike_boursnell Member Posts: 94

    Thanks. What is a BAM snippet?

  • ebanksebanks Broad InstituteMember, Administrator, Broadie, Moderator, Dev Posts: 698 admin

    :)
    Just a smaller piece of your bam so that you don't need to upload (and we don't need to process) a huge file. You can use PrintReads with the -L argument to specify +/- 200bp around the region of interest.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • mike_boursnellmike_boursnell Member Posts: 94

    So the region of interest in this case is the place where we get the weird alignments? Do I upload it to an FTP site? If so where?

  • ebanksebanks Broad InstituteMember, Administrator, Broadie, Moderator, Dev Posts: 698 admin

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • mike_boursnellmike_boursnell Member Posts: 94

    Hi Eric. I've uploaded the BAM files to the FTP server in a directory called mboursnell. I made smaller BAM files using PrintReads, but then the problem went away, so I had to upload the whole file. (The files starting T8 are the ones to use)

  • ebanksebanks Broad InstituteMember, Administrator, Broadie, Moderator, Dev Posts: 698 admin

    Hi Mike,

    It seems that you've taken down the original IGV screenshot, so I don't know where your region of interest is. Could you please let me know?

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • Geraldine_VdAuweraGeraldine_VdAuwera Administrator, Dev Posts: 10,970 admin

    I think that must have been a forum bug; I restored it by editing the post to include the image itself (which was still attached, if hidden).

    Geraldine Van der Auwera, PhD

  • ebanksebanks Broad InstituteMember, Administrator, Broadie, Moderator, Dev Posts: 698 admin

    Okay, I've figured out the problem: your reference file has an extra line feed character (ASCII 10) in it on line 29360. This corresponds to the base immediately after position chr1:1,761,540. The Picard library we use for reading reference fasta files is treating this as a valid reference character so that all bases following it are staggered by one position.
    Did you generate this reference yourself?

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • mike_boursnellmike_boursnell Member Posts: 94

    Brilliant! What a detective. I'm not sure where the sequence came from. I'll ask our Bacteriology folks.....

  • mike_boursnellmike_boursnell Member Posts: 94

    I cured this by writing a PERL script to re-write the reference file a character at a time, and insert new line-feeds. But I never managed to see the extra line-feed that you spotted, even though it must have been there. How did you reveal it?

Sign In or Register to comment.