It looks like you're new here. If you want to get involved, click one of these buttons!
For a complete, detailed argument reference, refer to the GATK document page here.
For a complete, detailed argument reference, refer to the GATK document page here.
While we advocate for using the Indel Realigner over an aggregated bam using the full Smith-Waterman alignment algorithm, it will work for just a single lane of sequencing data when run in -knownsOnly mode. Novel sites obviously won't be cleaned up, but the majority of a single individual's short indels will already have been seen in dbSNP and/or 1000 Genomes. One would employ the known-only/lane-level realignment strategy in a large-scale project (e.g. 1000 Genomes) where computation time is severely constrained and limited. We modify the example arguments from above to reflect the command-lines necessary for known-only/lane-level cleaning.
The RealignerTargetCreator step would need to be done just once for a single set of indels; so as long as the set of known indels doesn't change, the output.intervals file from below would never need to be recalculated.
java -Xmx1g -jar /path/to/GenomeAnalysisTK.jar \ -T RealignerTargetCreator \ -R /path/to/reference.fasta \ -o /path/to/output.intervals \ -known /path/to/indel_calls.vcf
The IndelRealigner step needs to be run on every bam file.
java -Xmx4g -Djava.io.tmpdir=/path/to/tmpdir \ -jar /path/to/GenomeAnalysisTK.jar \ -I <lane-level.bam> \ -R <ref.fasta> \ -T IndelRealigner \ -targetIntervals <intervalListFromStep1Above.intervals> \ -o <realignedBam.bam> \ -known /path/to/indel_calls.vcf --consensusDeterminationModel KNOWNS_ONLY \ -LOD 0.4
Comments
Hi
Is there any way to run the IndelRealigner on BAM/SAM files where the multiple possible alignments are reported for a read, rather than the Top Random alignment that BWA uses?
As the aligner we are using generates BAMs which report multiple possible alignments for reads, that have no unique best alignment. ie if a Read has three equal likely alignments (a repeat for example) the aligner will report all three and flag them as non unique.
However if we try to feed these BAMs into GATK IndelRealigner we get an error due to these duplicate reads.
ERROR MESSAGE: Error caching SAM record HWI-ST196R_0530:2:2204:3691:99597#0/1, which is usually caused by malformed SAM/BAM files in which multiple identical copies of a read are present.
Is there any way around this, an on the fly filtering option we can add to GATK to discard these reads, or get it to work properly with them?
Thanks
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi there, I'm pretty sure we already have a read filter that will work for you: try the NotPrimaryAlignmentFilter ('-rf NotPrimaryAlignment'). See the Read Filters section for more details. And if this doesn't work for you, it should be extremely easy for you to create a filter modeled after this one that will work. Good luck!
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •That seems to have worked, thanks! Might be worth extending your error message to mention it as an option.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hello. I am trying to run the Indel Realigner but the message I get is:
The command line I am using is:
There are no errors from the previous step of RealignerTargetCreator. What am I doing wrong?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Sorry. Above command has -R reference.fasta, not -T as shown.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •The error message says it all: your interval list is malformed. Look closely at the Genome Loc string at the end of the error:
chr1:58879-59946:95-193and ask yourself, what is this supposed to represent, and what's wrong about it?Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I am sorry but I don't know what's wrong other than it's obviously not a valid genomic coordinate in the usual sense. Since this was output by RealignerTargetCreator I assumed it would be correctly interpreted by IndelRealigner. I have a contig in my reference that has coordinates chr1:58879-59946 but I do not know what the 95-193 refers to.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Alright, well the
95-193part shouldn't be there at all, it's not part of the normal makeup of a genome loc, so it seems the interval list was not created correctly by the RTC. Please realize that a more accurate description of your problem would have been "I just ran RealignerTargetCreator and it produced an interval list that looks like it's incorrectly formatted". Technically this is a problem that should be posted under RealignerTargetCreator, not IndelRealigner...Can you check (by using
-XLto exclude the problem location) whether this only occurs a single time in your intervals file, or whether there are other instances?Another helpful test would be to run your original RTC command again to see if it produces the same malformed locus a second time. This could have been due to a one-time hiccup, so let's check for that before going too deep.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I agree with Geraldine that it looks like a one time blip with your file system.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •The whole interval list file looks the same. Some contigs have multiple entries, e.g:
I did rerun the RTC command and the output was the same.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •This is weird -- can you please post your full command line for the RealignerTargetCreator step?
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I have attached the output of RealignerTargetCreator.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •What happens if you run that command without the
--validation_strictnessflag?Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Also, can you tell me what reference you are using? Is it one of the standard ones from our bundle or is it something else? It looks like the locations are formatted in a way that's not compatible with our tools. If you upload your reference .idx file we may be able to verify if that's where the problem is.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Here is a problem related to realigning around know indel sites that I cannot solve:
After realigning around known indel sites (using the 1000G phase1 vcf file file from your bundle, as well as different copy of the same 1000G phase1 known indels vcf file), I notice some strange results when looking at plots of empirical and observed base-level phred scores pre- and post-indel-realignment. The attached PDF file shows side-by-side empirical v. reported phred score plots generated from both a raw bam file and a bam file after local indel realignment. These bam files are generated from the same sample, but you can see that after indel realignment there is a substantial deflation of empirical phred scores compared to pre-realignment results.
My question is: why do we see such results? The trend appears similar (pre- and post-realignment), but the empirical scores appear capped at ~25. I'm still attempting to better understand what might be happening here, but something about the way I am running indel realignment appears to also be impacting downstream base quality re-calibration results as well (similar, strange empirical vs. reported plot results post-recalibration). If I exclude the indel realignment step, the base quality re-calibration appears to work successfully (a much tighter adherence to the expected empirical v. reported line post-recalibration).
The code I used to carry out the two-stage indel realignment step is as follows:
/usr/bin/java -Xmx2g -jar /net/amd/hovelson/bin/GenomeAnalysisTK-1.4-13-gef50e77/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /net/amd/hovelson/umake/ref/hs37d5.fa.gz -I /net/amd/hovelson/GSK/studydata/NEAVAbams/bam/KP_083_TAGCTT_L008.dedup.bam -o /net/amd/hovelson/GSK/studydata/NEAVAbams/bam/KP_083_TAGCTT_L008.bam_indels.intervals -L /net/amd/hovelson/GSK/studydata/bed/SureSelect_All_Exon_50mb.hg19.sites.XY.unique_exons.bed -known /net/fantasia/home/gjun/data/indel/1000G_phase1.indels.b37.vcf/usr/bin/java -Xmx4g -jar /net/amd/hovelson/bin/GenomeAnalysisTK-1.4-13-gef50e77/GenomeAnalysisTK.jar -T IndelRealigner -R /net/amd/hovelson/umake/ref/hs37d5.fa.gz -targetIntervals /net/amd/hovelson/GSK/studydata/NEAVAbams/bam/KP_083_TAGCTT_L008.bam_indels.intervals -L /net/amd/hovelson/GSK/studydata/bed/SureSelect_All_Exon_50mb.hg19.sites.XY.unique_exons.bed -I /net/amd/hovelson/GSK/studydata/NEAVAbams/bam/KP_083_TAGCTT_L008.dedup.bam -o /net/amd/hovelson/GSK/studydata/NEAVAbams/bam/KP_083_TAGCTT_L008.indel_realign.bamsamtools index /net/amd/hovelson/GSK/studydata/NEAVAbams/bam/KP_083_TAGCTT_L008.indel_realign.bamI've spent a lot of time trying to troubleshoot this to no avail. I hope I'm just missing something simple, but am very eager to see whether you folks have any thoughts about what might be going wrong!
Best, Dan
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi Dan,
That isn't the behavior I would have expected. A few thoughts: 1. What do the AnalyzeCovariates (since you are using an older version of the code) plots look like? Can you confirm that they show the same behavior? 2. What if you were to remove the -L argument in the IndelRealigner command? Does the plot for the realigned bam look better?
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Thanks very much for your reply, Eric.
I assume by AnalyzeCovariates plots you mean plots after base quality recalibration (is this correct?). If I run recalibration after local indel realignment, I get the very odd plot you see on the lefthand-side of the attached PDF doc. If I run recalibration without any local indel realignment, I get the more sensible plot you see on the right side of the attached doc.
I will re-run local realignment without supplying the target region file and let you know what happens shortly.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •No, this isn't right. I don't want you to generate these plots with your own tools, but rather I want to see what they look like with our plotting tools. Perhaps you should upgrade to the latest release of the GATK and use BaseRecalibrator which is documented (and supported).
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I'll work to generate these plots with your tools. I am a little unclear, though: how will proceeding with base quality recalibration help us sort out the unexpected results from the local indel realignment?
Thanks, Dan
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •For what it's worth, our tools do still show the same deflation in empirical phred score values for bam files created from local realignment when removing the -L target file option.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I just want to confirm that the problem is not with your plotting tools.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Ok, thanks Eric! I've updated to the latest release of GATK, and am currently re-running indel realignment and base quality recalibration on this sample. I'll generate the QQ plots using your tools, and let you know what I find.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •You shouldn't need to rerun the re-alignment - just the recalibration.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •So (using the latest GATK release) the target interval creation step (for local realignment) ran smoothly. However, when running IndelRealigner I receive a stack trace error that is as follows:
The code used to generate this is as follows (I tried with both a -L genomic interval file and without):
/usr/bin/java -Xmx4g -jar /net/amd/hovelson/bin/GenomeAnalysisTKLite-2.1-13-g0f021e6/GenomeAnalysisTKLite.jar -T IndelRealigner -R /net/amd/hovelson/umake/ref/hs37d5.fa.gz -targetIntervals /net/amd/hovelson/GSK/studydata/NEAVAbams/bam/KP_083_TAGCTT_L008.bam_indels.intervals -I /net/amd/hovelson/GSK/studydata/NEAVAbams/bam/KP_083_TAGCTT_L008.dedup.bam -o /net/amd/hovelson/GSK/studydata/NEAVAbams/bam/KP_083_TAGCTT_L008.indel_realign.bamAny thoughts?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I noticed you are using a BED interval file. Have you checked the coordinates/ indexing? BED files are indexed differently, this may just be an "off by one" error.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I had wondered the same, except I've tested the local realignment without using the BED interval file and still receive the same error. The BED file I have been using has start position as 0-based, end position as 1-based.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •It looks like your BAM file is malformed. Can you please run it through Picard's ValidateSAMFile?
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •ValidateSAMFile Results:
I have >50 million reads in the BAM file, and ~98,000 records are flagged with an error essentially the same as below:
ERROR: Record 1234, Read name <READ_NAME>, Mate negative strand flag does not match read negative strand flag of mateThere's also ~50 errors that include one of the two following errors:
ERROR: .....MAPQ should be 0 for unmapped readorERROR: .....Mate unmapped flag does not match read unmapped flag of mateIs this type of mismatched read-pair information a likely culprit for causing problems at the indel realignment stage?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •And before I forget - THANK YOU very much for your help/suggestions in trying to sort all this out. I sincerely appreciate it!
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •In playing around a little bit with some of these command line arguments, I realized that when running BaseRecalibrator (for example) with an unzipped reference FASTA file, I was able to get it to run just fine. If I run with a gzipped reference file - I get the same sorts of stack error problems described above for the IndelRealigner.
Could a gzipped reference file be the problem here?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Yes, that looks like the culprit.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Okay - it's still a bit strange, though, because I can run the first step of local indel realignment without issue. It only presents problems for the 2nd step of indel realignment (or BaseRecalibrator).
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •No, actually we are moving away even allowing zipped reference files at all, because the code we use to read it doesn't work so properly...
-- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Okay - thanks very much for the help/feedback with this issue. It's greatly appreciated!
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •The IndelRealigner works fine with one know indel target intervals. However, when I try to use two known indel target intervals with two separate -targetintervals arguments and -two separate -known arguments, I get the error of too many values for the -targetintervals argument. Should I run the IndelRealigner for each known target intervals separately or there is a way to combine them in a single run.Thanks,
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •You can give it a file containing a list of target intervals as explained here:
http://www.broadinstitute.org/gatk/guide/article?id=1204
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Just combine (e.g. with 'cat') the files into a single file first.
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I tried to use the MergeIntervalLists walker before, but it seems this walker is not included in the public release of the GATK. If I use cat for combining the intervals, then I will have many overlapping intervals in the final list. Is that ok for indelrealigner walker? Do I need to combine the vcf files as well or I can use them with two separate -known arguments?Thanks,
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Actually, let's step back for a second. Why do you have 2 input target interval lists? That's not right. You should only have one input list, otherwise you have done something wrong...
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Good question. Thank you for noting that. I do not know why I made a separate interval list for each vcf file. I think that I should use the two vcf files together with RealignerTargetCreator walker and make a combined interval list for both vcf files, shouldn't I?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Yes!
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I'm working on a pipeline for a very high coverage targeted resequencing project using GATK 2.2. We are interested only in variants within the targeted region. I tested two pipelines one with
-L targeted_regions.bedfor IndelRealigner, RealignerTargetCreator, BaseRecalibrator, PrintReads and one without-Lfor those steps. Out of about 430 variants, there are 3 variants that are discordant between the two variant sets (all three with >30 QUAL and >100 DP).Which set should I trust? is
-Lnecessary for such targeted resequencing projects? or I should avoid it for certain commands.- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi there,
There is no strong argument for either side (and what you're seeing is a very minor difference) but I can tell you that we don't use intervals until the PrintReads step (after Base Recalibration).
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi,
I just recently started using GATK 2. I tried but I couldn't quite figure out where to find answers for my questions from the documentations, so I'll just aks straight up.
I'm working with 2 different Gasterosteus aculeatus lineages. Even though there is a genome, I haven't found a GVF file for the species. The sequences are from illumina. I've following throught the pipeline for local realignment with creating targets and then realigning followed with baserecalibration. I ran into few questions:
1) When I ran RealignerTargetCreator with a VCF of my own for these samples, I endedn up having in addition to my previously found indels (from samtools) alot of other targets as well. Some of them did seem like candidates even without an indel, but some of them were fine to my eye. What could be causing this? This is what I ran java -jar ./GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R gasAcu_combinedbac_inv7.fa -I readgroup_FF1_inv7c.sorted.bam -L Capture_Target_Regions.intervals -o intervals2_FF1_inv7c -known FF1_inv7c_var.vcf -fixMisencodedQuals
2) When I ran IndelRealigner using the targets created in the previous, I couldn't any more add my vcf as the --knownAlleles. It ran fine without it though, but when I added it in as an argument it errored with this:
ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 2.3-9-ge5ebf34):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Argument with name 'knownAlleles' isn't defined.
ERROR ------------------------------------------------------------------------------------------
The code I tried running was java -jar ./GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T IndelRealigner -R gasAcu_combinedbac_inv7.fa -I readgroup_FF1_inv7c.sorted.bam -targetIntervals intervals2_FF1_inv7c.intervals -L Capture_Target_Regions.intervals -knownAlleles FF1_inv7c_var.vcf -o realigned_FF1_inv7c.bam -fixMisencodedQuals
3) I'm know at the stage of BaseRecalibrator, but my problem is that I'm not sure what to add as the -knownSites based on previous questions and the fact that I know there are samples for which I have the case of SNPs being Ref/Alt and Atl1/Alt2. I wasn't sure after reading the documentation, should I use -knownSites to mask potential SNPs from recalibration and if yes, then can it be my own VCF based on unrealigned and uncalibrated samtools results or should it be something else?
I appreciate any suggestions here.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •1) Don't worry about it. There are various things the RTC looks for when determining what regions may need to be realigned. Even if a region is flagged that doesn't really need to be, it won't hurt or mess up anything.
2) The argument name is slightly different for IR than for RTC:
--knownAlleles / -known. Always worth looking up the exact arg names in the tech docs if you're getting an error.3) You should probably filter your knowns VCF to keep only the high quality calls, otherwise you risk over-masking. This is covered in the Best Practices documentation.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Thanks for your reply Geraldine, I'll check my argument names and filter my VCF before running the BaseRecalibrator.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I had a conceptual question regarding "strand discordant loci." In the local realignment video tutorial, there was a slide on this concept whereby a unique albeit artefactual SNP cluster seems to be biased towards the ends of either forward or reverse strands due to the presence of an indel. It is not quite clear to me why exactly there is this strand bias. Let's say we have two identical reads (same sequence) which map to the edge of an indel, but one read is read from the forward strand (sequencing towards the indel let's say) and then another read is read from the reverse strand (sequencing away from the indel let's say). Are these two reads not equally likely to map to the same position on the indel? Does it have something to do with the fact that bases at the end of the reads tend to have lesser qualities (and thus a greater chance of mapping to an indelic region with artefactual mismatches?)
Regardless of this strand discordance, I understand that a high SNP density is also a huge flag for "suspicious" regions requiring realignment but I'd like to understand where this strand bias comes from.
Thanks,
MC
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi there,
This is an interesting question, but it's more so a question on how aligners work rather than how to use the GATK, so I'm afraid we can't spend time discussing it. I think you would do better to start that discussion somewhere like SeqAnswers.
That said if someone in the community would like to discuss this with you they should feel free to do so here; we're quite happy to host the discussion even if we don't have time to participate.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi there, Is it possible to perform RealignerTargetCreator and IndelRealigner with multiple samples at once in the same BAM file or does it need to be done one by one? In case it is possible to perform these steps with multiple samples at once, is there an optimal number of samples? Thanks very much in advance.
S.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi @bioSG,
You can realign multiple samples at the same time, yes. They can be in the same file or in separate files that you pass in together, as you prefer.
There is no hard limit to the number of samples you can realign together, but it is computationally expensive, so the number you can realistically process depends on the capabilities of your computer/platform.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi,
In case I run the realigner with multiple samples at once, will it try to find a consistent realignment taking all the samples at a certain site into account or will it treat the samples independently?
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Yes, it will apply the same consensus to all the samples.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Going back to my question on the realigner with multiple samples at once, is there the risk of not performing the realignment in regions that would have been realigned doing a one by one sample realignment? It looks like realigning with multiple samples at once could decrease the percentages of mismatches at a locus having high entropy, and thus, further tuning of the entropy threshold would be needed.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •That's right, which is why if you want to be really thorough you can first realign per-sample, then do a second round of realignment on multiple samples. That way you're sure that individual samples are realigned internally even if the region doesn't get triggered when running multi-samples. But that takes a whole lot of time & compute, so it's not very popular.
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Is there anyway to reorder the contigs for the known indels the Broad provides in their hg19 bundle? The hg19 reference genome I have been using starts chr1, chr2, chr3...chr7, chrX, chr8. However, the known indel contigs (1000G and Mills set) start chrM, chr1, chr2...chr7, chr8, chr9. I am trying to run RealignerTargetCreator (and later IndelRealigner) and I get an error stating that GATK doesn't function with different contig orders, it is "unsafe."
I would much rather reorder the contigs for the known indel VCF once than having to reorder all my BAM files everytime. Or is there another known indel VCF with the chr1, chr2, chr3...chr7, chrX, chr8 order I can download?
Thanks!
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •I'm afraid our hg19 resource files all start with chrM too. Maybe someone else has a differently ordered vcf they can share with you; otherwise we have heard of a script tool called VCFsorter that will allow you to reorder the contigs in your VCF according to a dictionary, using Picard: https://code.google.com/p/vcfsorter/
Just keep in mind that this is not one of our tools so we can't guarantee results nor provide support for that particular tool. Good luck!
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •As long as the -known [file] is in VCF format, will the RealignerTargetCreator function properly, regardless of chromosomal order? I want to make sure this will not hinder GATKs functionally.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •No, the order of the contigs has to be the same for all the files involved, otherwise GATK will freak out. This is an important requirement of how the GATK engine retrieves data from the files.
Generally speaking, you should always use data resources that match the reference. From when you start designing your experiment, you should plan which reference you will use for alignment to go with which data resources you will use. Choosing afterwards leads to this kind of problem of having to reorder things around, it's messy. Sorry to be the bearer of bad news...
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi, I used the RealignerTargetCreator to create a list file (sample.intervals) with intervals to be used with the IndelRealigner. But when I call the command I get a error saying that my argument 'targetIntervals' has to many values. There is only one argument given to this parameter:
java -jar GenomeAnalysisTK.jar -T IndelRealigner -I ... -o ... -targetIntervals sample.intervalsDo you know some possible work around?
Thanks.
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi @mjaeger,
Can you please post the full command that you ran?
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Yes of cause.
java -jar GenomeAnalysisTK.jar -T IndelRealigner -R UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa -I results/exom/HOCsampleA.bam -o results/exom/HOCsampleA_realigned.bam -targetIntervals sample.intervalsHere is the head of the intervals file:
chrM:302-303 chrM:516-517 chr1:132032 chr1:448309 chr1:523157-523158 chr1:533066 chr1:535082 ...The file contains about 62.000 positions/lines.
Thanks. Marten
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi Marten,
That's very odd. You could try running this command again with a copy of the targets file containing only a few lines, and see if that works properly. If it does, it might mean that something is wrong in the original targets file. If it does not, then the problem is elsewhere. Can you also make sure that you are running the latest version of the GATK?
Geraldine Van der Auwera, PhD
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •Hi Geraldine,
The latest version of GATK (2.5-2-gf57256b) was downloaded the same day I first run the realigner. I tried to run the command with a copy with only the first 1.000 lines of the intervals file but got the same error message. So maybe I will rerun the analysis on an other machine.
Marten
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •can you copy & paste the GATK error message? There shouldn't be a limit on the number of intervals.
Mauricio Carneiro, PhD http://www.broadinstitute.org/~carneiro/
- Spam
- Abuse
- Troll
0 • Off Topic Disagree Agree Like WTF •