The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Did you remember to?

1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?

Then follow instructions in Article#1894.

Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Picard 2.9.4 is now available. Download and read release notes here.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

FastaAlternateReferenceMaker realigns intervals in which order?

I am using FastaAlternateReferenceMaker and have a set of intervals ordered first by chromosome and then by their start positions. I have tried ordering chromosomes alphabetically(chr1, chr10, chr11,..) as well as numerically (chr1, chr2, chr3...) but the output fasta sequence returned is not in the same order as listed in interval file. I find that even the names target_1, target_2 etc are also not used as fasta headers in the output file. I am stuck with mapping the input intervals with the output fasta sequences.
Thanks in advance for all the help,


  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi Ramya,

    The output sequence will always be returned in the same order, with the same contig names, as the master reference your data is aligned against. If you want that order to be different, you will need to reorder/rename contigs in your master reference first. Or modify your new reference after it is generated, of course.

  • RP1RP1 Member

    Thanks Geraldine. You are right that the sequence is always returned in the same order as the master reference. I figured that my problem was with overlapping intervals in the bed file. Overlapping intervals are NOT allowed and FastaAlternateReferenceMaker combines overlapping intervals into one while returning the output.I also observe something very peculiar. If there is a deletion overlapping with the boundaries of an interval, then it affects the sequence of the next interval.

    To clarify my point I am attaching some sample files in zipped version: (1) sample reference fasta file (sample_reference.fasta) and its indexes (sample_reference.fasta.fai, sample_reference.dict) (2) sample variant file (sample_variant.bcf) (3) two sample interval files (sample_interval.bed, sample_interval1.bed) and their corresponding output files (sample_interval.fasta, sample_interval1.fasta). The command I used is : java -Xmx2g -jar GenomeAnalysisTK-2.4-9-g532efad/GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker -l ERROR -R sample_reference.fasta -L sample_interval.bed -o sample_interval.fasta --variant sample_variant.bcf

    Kindly pay attention to the second interval in sample_interval.bed file whose output depends on its preceding interval's boundary overlapping with a deletion. In sample_interval1.bed file, if I give only single interval, then the output sequence is correct.


  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Well, overlapping intervals are allowed in the sense that they won't cause an error -- it's just that the engine will combine them before proceeding with the analysis. This is documented in the FAQ about intervals.

    Can you explain in a little more detail what is the issue you observe with the deletion affecting the next interval? I need you to tell me explicitly rather than show me the files at this stage, because unfortunately I don't have the time to go through multiple files trying to guess what I should be seeing.

  • RP1RP1 Member

    Dear Geraldine,

    There is an INDEL at position 110349 at which AAGGG in the reference is called as A in the alternate reference (file: sample_variant.bcf). When I give the intervals (file: sample_interval.bed) as

    Sbay_1 110346 110350 . target_1
    Sbay_1 110466 110470 . target_2

    The output fasta sequences are TAA and T respectively (file: sample_interval.fasta).

    Whereas if I give just the second interval in a different file (file: sample_interval1.bed)

    Sbay_1 110466 110470 . target_2

    The output fasta sequence is TCAT (file: sample_interval1.fasta).

    This happens because in the first case the previous interval (target_1) spans an INDEL. However if the end of the previous interval (i.e. target_1) is >=110353 (i.e 110349 + length(AAGGG) - 1), the effect on subsequent interval's output vanishes.

    Thus, in case an interval partially spans an INDEL, the next interval's output gets affected.Correct me if I am wrong.


  • Hello,

    I would just add that I am having a similar problem which could be solved if FastaAlternateReferenceMaker was given the name (target1, target2) as header of the fasta results file.
    Giving number as fasta header make it harder to find which sequences are overlapping (and which one are not processed ).
    By having the name of the interval in the output file it will make it easier to analyse.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi Martin, that would be problematic for large sets of targets. It would be better to run some kind of validation on your targets first to check that there won't be any adverse interactions. Unfortunately we don't have the resources right now to devote to improving this tool; it is provided as-is.

Sign In or Register to comment.