The current GATK version is 3.8-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!


You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Got a problem?


1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?


Then follow instructions in Article#1894.

Formatting tip!


Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Download the latest Picard release at https://github.com/broadinstitute/picard/releases.
GATK version 4.beta.3 (i.e. the third beta release) is out. See the GATK4 beta page for download and details.

Very odd results - maybe due to IndelRealigner

edited December 2012 in Ask the GATK team

Hi. I am getting VERY odd results with some Streptococcus equi sequence. The BAM files from BWA align well in IGV, but when I run them through your pipeline there are many local errors where it seems that a single indel has been incorrectly multiplied up - somehow. You need to see the IGV screenshot.!

The bottom is a BAM file from BWA and the top is the final one from the GATK pipeline.

image

Post edited by Geraldine_VdAuwera on

Best Answer

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi Mike,

    Can you tell me what version you are using and what is your command-line?

  • The Genome Analysis Toolkit (GATK) v2.2-8-gec077cd, Compiled 2012/11/14 16:10:59

    Command line as in attached log file.

    Thanks for helping on this....

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hey Mike, I'm not seeing an attached file. It might have been a format that the server rejects. If you can just paste your command line that would be fine.

  • Hi Geraldine. Sorry about that. Here it is pasted.

    COMMAND LOG for NGS pipeline version 18

    Analysis name: M2c

    PREFERENCES:

    Reference sequence: /home/genetics/strep_equi/strep_equi.fasta
    Name of this analysis: M2c
    E-mail address: mike.boursnell@aht.org.uk
    Data type: PE
    Memory setting: -Xmx4g

    UnifiedGenotyper constants:

    -stand_emit_conf:           30
    -stand_call_conf:           30
    

    Make parallel VCF: parallel

    DATA FILES :

    Number of samples to analyse: 1

    Bam file 1 M2_Appleby_f2b.bam

    --
    File: 1/1 Step: 1/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/picard/AddOrReplaceReadGroups.jar I=M2_Appleby_f2b.bam O=M2c_M2b_Appleby_rg.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true RGID=M2b_Appleby RGLB=s_equi RGPL=illumina RGPU=M2b_Appleby RGSM=M2b_Appleby

    Output file: M2c_M2b_Appleby_rg.bam Size: 151098168
    Time for stage: 0 days, 0 hours, 0 minutes and 40 seconds

    --
    File: 1/1 Step: 2/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/picard/MarkDuplicates.jar I=M2c_M2b_Appleby_rg.bam O=M2c_M2b_Appleby.dedup.bam VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true M=M2c_M2b_Appleby.metrics

    Output file: M2c_M2b_Appleby.dedup.bam Size: 151179932
    Time for stage: 0 days, 0 hours, 0 minutes and 43 seconds

    --
    File: 1/1 Step: 3/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /home/genetics/strep_equi/strep_equi.fasta -I M2c_M2b_Appleby.dedup.bam -o M2c_M2b_Appleby.dedup.bam.intervals -mismatch 0.0 -S LENIENT

    Output file: M2c_M2b_Appleby.dedup.bam.intervals Size: 48172
    Time for stage: 0 days, 0 hours, 0 minutes and 45 seconds

    --
    File: 1/1 Step: 4/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -T IndelRealigner -R /home/genetics/strep_equi/strep_equi.fasta -I M2c_M2b_Appleby.dedup.bam -targetIntervals M2c_M2b_Appleby.dedup.bam.intervals -o M2c_M2b_Appleby.clean.dedup.bam -compress 0 -model USE_READS -S LENIENT

    Output file: M2c_M2b_Appleby.clean.dedup.bam Size: 506780456
    Time for stage: 0 days, 0 hours, 0 minutes and 48 seconds

    --
    File: 1/1 Step: 5/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/genetics/strep_equi/strep_equi.fasta -I M2c_M2b_Appleby.clean.dedup.bam -knownSites /home/genetics/strep_equi/strep_equi_dummy_DBSNP.vcf -o M2c_M2b_Appleby.recal.grp

    Output file: M2c_M2b_Appleby.recal.grp Size: 1225814
    Time for stage: 0 days, 0 hours, 12 minutes and 28 seconds

    --
    File: 1/1 Step: 6/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -T PrintReads -I M2c_M2b_Appleby.clean.dedup.bam -R /home/genetics/strep_equi/strep_equi.fasta -BQSR M2c_M2b_Appleby.recal.grp -o M2c_M2b_Appleby.clean.dedup.recal.bam -S LENIENT

    Output file: M2c_M2b_Appleby.clean.dedup.recal.bam Size: 244997963
    Time for stage: 0 days, 0 hours, 3 minutes and 22 seconds

    --
    File: 1/1 Step: 7/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/picard/ValidateSamFile.jar INPUT=M2c_M2b_Appleby.clean.dedup.recal.bam OUTPUT=M2c_M2b_Appleby_validate.out VALIDATION_STRINGENCY=SILENT MODE=SUMMARY MAX_OUTPUT=100 MAX_OPEN_TEMP_FILES=7900

    Output file: M2c_M2b_Appleby_validate.out Size: 16
    Time for stage: 0 days, 0 hours, 0 minutes and 13 seconds

    --
    File: 1/1 Step: 8/11
    /opt/samtools/samtools flagstat M2c_M2b_Appleby.clean.dedup.recal.bam > M2c_M2b_Appleby_flagstat.out

    Output file: M2c_M2b_Appleby_flagstat.out Size: 391
    Time for stage: 0 days, 0 hours, 0 minutes and 4 seconds

    --
    File: 1/1 Step: GC_BIAS/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/picard/CollectGcBiasMetrics.jar REFERENCE_SEQUENCE=/home/genetics/strep_equi/strep_equi.fasta INPUT=M2c_M2b_Appleby_final.bam O=out1.junk CHART_OUTPUT=M2b_Appleby_gc_bias.pdf VALIDATION_STRINGENCY=LENIENT

    Output file: M2b_Appleby_gc_bias.pdf Size: 19783
    Time for stage: 0 days, 0 hours, 0 minutes and 11 seconds

    --
    File: 1/1 Step: SIZE_METRICS/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/picard/CollectInsertSizeMetrics.jar INPUT=M2c_M2b_Appleby_final.bam O=out2.junk HISTOGRAM_FILE=M2b_Appleby_insert_size.pdf VALIDATION_STRINGENCY=LENIENT

    Output file: M2b_Appleby_insert_size.pdf Size: 32109

    --
    File: 1/1 Step: PARALLEL SNPS VCF/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -R /home/genetics/strep_equi/strep_equi.fasta -T UnifiedGenotyper -glm SNP -I results_M2c_1/M2c_M2b_Appleby_final.bam -stand_emit_conf 30 -stand_call_conf 30 -o SNPS_M2c.vcf -S LENIENT

    Output file: SNPS_M2c.vcf Size: 6984668
    Time for stage: 0 days, 0 hours, 3 minutes and 10 seconds

    --
    File: 1/1 Step: PARALLEL INDELS VCF/11
    java -Xmx4g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -R /home/genetics/strep_equi/strep_equi.fasta -T UnifiedGenotyper -glm INDEL -I results_M2c_1/M2c_M2b_Appleby_final.bam -stand_emit_conf 30 -stand_call_conf 30 --max_alternate_alleles 6 -o INDELS_M2c.vcf -S LENIENT

    Output file: INDELS_M2c.vcf Size: 9657
    Time for stage: 0 days, 0 hours, 2 minutes and 9 seconds

  • ebanksebanks Broad InstituteMember, Broadie, Dev

    Hi Mike,

    I am happy to take a look if you want to upload the necessary files: BAM snippet plus the references files.

  • Thanks. What is a BAM snippet?

  • ebanksebanks Broad InstituteMember, Broadie, Dev

    :)
    Just a smaller piece of your bam so that you don't need to upload (and we don't need to process) a huge file. You can use PrintReads with the -L argument to specify +/- 200bp around the region of interest.

  • So the region of interest in this case is the place where we get the weird alignments? Do I upload it to an FTP site? If so where?

  • Hi Eric. I've uploaded the BAM files to the FTP server in a directory called mboursnell. I made smaller BAM files using PrintReads, but then the problem went away, so I had to upload the whole file. (The files starting T8 are the ones to use)

  • ebanksebanks Broad InstituteMember, Broadie, Dev

    Hi Mike,

    It seems that you've taken down the original IGV screenshot, so I don't know where your region of interest is. Could you please let me know?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    I think that must have been a forum bug; I restored it by editing the post to include the image itself (which was still attached, if hidden).

  • ebanksebanks Broad InstituteMember, Broadie, Dev

    Okay, I've figured out the problem: your reference file has an extra line feed character (ASCII 10) in it on line 29360. This corresponds to the base immediately after position chr1:1,761,540. The Picard library we use for reading reference fasta files is treating this as a valid reference character so that all bases following it are staggered by one position.
    Did you generate this reference yourself?

  • Brilliant! What a detective. I'm not sure where the sequence came from. I'll ask our Bacteriology folks.....

  • I cured this by writing a PERL script to re-write the reference file a character at a time, and insert new line-feeds. But I never managed to see the extra line-feed that you spotted, even though it must have been there. How did you reveal it?

Sign In or Register to comment.