Download the latest Picard release at https://github.com/broadinstitute/picard/releases.
GATK version 4.beta.5 is out. See the GATK4 beta page for download and details.

IndelRealigner leading to downstream errors

So Geraldine tracked down a recent java exception down to a problem
with my data, however I'm finding that evidence for this problem only
occurs after the IndelRealigner step. The errors that we're seeing
are (as Geraldine pointed out):

Mate alignment does not match alignment start of mate 

and

Mate negative strand flag does not match read negative strand flag of mate

The command I'm using is:

java -Xmx100g -jar \
/wga/dev/neilw/noarch/pkgs/GenomeAnalysisTK-2.4-9-g532efad/GenomeAnalysisTK.jar \
-T RealignerTargetCreator -nt 45  \
-I /wga/scr4/NA12878_calls/H01UJADXX/realign-bwa-mem/reverted.12.aligned.wholegenome.merged.sorted.marked.bam \
-R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
-o reverted.12.aligned.wholegenome.merged.sorted.marked.indel_cleaned_local.intervals \
--known /wga/scr4/bigrefs/human19/GATK/Mills_and_1000G_gold_standard.indels.b37.sites.vcf \
--known /wga/scr4/bigrefs/human19/GATK/1000G_phase1.indels.b37.vcf

Followed by:

java -Xmx20g -jar \
/wga/dev/neilw/noarch/pkgs/GenomeAnalysisTK-2.4-9-g532efad/GenomeAnalysisTK.jar \
-T IndelRealigner -U -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
-o reverted.12.aligned.wholegenome.merged.sorted.marked.indel_cleaned_local.1.bam \
-I /wga/scr4/NA12878_calls/H01UJADXX/realign-bwa-mem/reverted.12.aligned.wholegenome.merged.sorted.marked.bam \
-L 1  \
-targetIntervals reverted.12.aligned.wholegenome.merged.sorted.marked.indel_cleaned_local.intervals \
-compress 1 -model USE_SW -maxInMemory 100000000 -LOD 0.4 -known \
/wga/scr4/bigrefs/human19/GATK/Mills_and_1000G_gold_standard.indels.b37.sites.vcf \
-known /wga/scr4/bigrefs/human19/GATK/1000G_phase1.indels.b37.vcf

Note the "-L" in the IndelRealigner, but not in the RTC command. I
run the IndelRealigner separately for each chromosome, but the errors
are present in both the merged and unmerged files.

Best Answer

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Neil, when you say the error only occurs after the IR step, do you mean that if you validate the files beforehand, they pass, then fail validation afterwards?

  • neilwneilw Member

    @Geraldine_VdAuwera said:
    Neil, when you say the error only occurs after the IR step, do you mean that if you validate the files beforehand, they pass, then fail validation afterwards?

    Yes. Before IR, I see "No errors found" and after IR, a large proportion of the reads have errors. This was with 2.4-9-g532efad.

    Neil

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hmm. I see you're running IR in unsafe mode; does the issue also occur when you run with -U?

  • neilwneilw Member

    I'm glad that you noticed that, because the -U was inadvertent, but I'm afraid that the problem persists.

    I'll send you the pathname for a small test-set.

    Neil

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    H Neil,

    I'm trying to run tests on your file but my jobs keep hanging, not sure why. Maybe a transient issue with the server. In the meantime, can you please try your IR command again but without the "non-vanilla" parameters, i.e. without -compress 1 -model USE_SW -maxInMemory 100000000 -LOD 0.4? I'd like to rule out an effect of those parameters since they're not standard and therefore less systematically tested.

  • neilwneilw Member

    Same issue with:

    -T IndelRealigner -R /seq/refer
    ences/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta -o reverted.12.al
    igned.wholegenome.merged.sorted.marked.indel_cleaned_local.1.bam -I /wga/scr4/NA
    12878_calls/H01UJADXX/realign-bwa-mem/reverted.12.aligned.wholegenome.merged.sor
    ted.marked.bam -L 1:1-1000000 -targetIntervals reverted.12.aligned.wholegenome.m
    erged.sorted.marked.indel_cleaned_local.intervals -known /wga/scr4/bigrefs/human
    19/GATK/Mills_and_1000G_gold_standard.indels.b37.sites.vcf -known /wga/scr4/bigr
    efs/human19/GATK/1000G_phase1.indels.b37.vcf

  • neilwneilw Member

    Okay, so just to report back here and recap:

    1. BQSR was dumping out because of a Java array index out of bounds exception
    2. The data didn't pass picard ValidateSamFile prior to BQSR, but that's a red herring.
    3. The data did pass picard ValidateSamFile prior to IndelRealigner, but that's also a red herring.

    The root cause of the issue is that bwa mem, in its default configuration, will split a read into multiple primary alignments, and many of the Picard tools have not yet been updated to handle this. Therefore, these files will not validate. Futhermore, using picard MergeBamAlignment will make a hash of things (not surprising), and this was the root causes of the problem. It's better for now just to build back in readgroup identifiers into the bam and forgo the rest fo the metadata. bwa mem now allows for specification of the readgroup (@RG) line and will add it back in for you.

    Thanks for your help.

    -Neil

Sign In or Register to comment.