Unknown reference error using FastaAlternateReferenceMaker

MartinB (Posts: 8, Member)
edited September 2013 in Ask the team

Hello

I tried to run FastaAlternateReferenceMaker and I get the following error:

WARNING 2013-09-18 16:28:28 IntervalList    Ignoring interval for unknown reference: Chr1:3580210-3580286

I get this warning for every interval I submitted. I have already searched the web without finding an answer. My chromosome names use the 'Chr' format in all of my files, and my interval file is tab-delimited.

My interval file looks like this:

@HD     VN:1.4  SO:unsorted
@SQ     SN:Chr1 LN:158337067    UR:file:chromosome_3.1.fasta    M5:0631b350aa263a0f714de8ba9d609eb0
@SQ     SN:Chr2 LN:137060424    UR:file:Chromosome_3.1.fasta    M5:15898469d6142f8bb74f769bfe9b155f
@SQ     SN:Chr3 LN:121430405    UR:file:Chromosome_3.1.fasta    M5:c515c4da7c2cd2d24c9487db8f733cfd
...
Chr1    3580210 3580286 +       ID=MI0011294_1;accession_number=MI0011294
Chr1    3580220 3580240 +       ID=MIMAT0011792_1;accession_number=MIMAT0011792
Chr1    3607747 3607842 -       ID=MI0014499_1;accession_number=MI0014499
Chr1    3607802 3607822 -       ID=MIMAT0017395_1;accession_number=MIMAT0017395
Chr1    10227277        10227339        -       ID=MI0009752_1;accession_number=MI0009752
Chr1    10227315        10227337        -       ID=MIMAT0009241_1;accession_number=MIMAT0009241
Chr1    19881347        19881431        -       ID=MI0005457_1;accession_number=MI0005457
Chr1    19881398        19881419        -       ID=MIMAT0003539_1;accession_number=MIMAT0003539
Chr1    19930459        19930542        -       ID=MI0005454_1;accession_number=MI0005454
Chr1    19930511        19930532        -       ID=MIMAT0004332_1;accession_number=MIMAT0004332
...

The header of my interval file is a copy of Chromosome_3.1.dict. I do not know what is misformatted or why I get this error.

Thanks

Martin

Post edited by Geraldine_VdAuwera on

Answers

  • MartinB (Posts: 8, Member)

    Here is, more clearly, what the interval file looks like:

    @HD VN:1.4 SO:unsorted
    @SQ SN:Chr1 LN:158337067 UR:file:chromosome_3.1.fasta M5:0631b350aa263a0f714de8ba9d609eb0
    @SQ SN:Chr2 LN:137060424 UR:file:Chromosome_3.1.fasta M5:15898469d6142f8bb74f769bfe9b155f
    @SQ SN:Chr3 LN:121430405 UR:file:Chromosome_3.1.fasta M5:c515c4da7c2cd2d24c9487db8f733cfd
    ...
    Chr1 3580210 3580286 + ID=MI0011294_1;accession_number=MI0011294
    Chr1 3580220 3580240 + ID=MIMAT0011792_1;accession_number=MIMAT0011792
    Chr1 3607747 3607842 - ID=MI0014499_1;accession_number=MI0014499
    Chr1 3607802 3607822 - ID=MIMAT0017395_1;accession_number=MIMAT0017395
    Chr1 10227277 10227339 - ID=MI0009752_1;accession_number=MI0009752
    Chr1 10227315 10227337 - ID=MIMAT0009241_1;accession_number=MIMAT0009241
    Chr1 19881347 19881431 - ID=MI0005457_1;accession_number=MI0005457
    Chr1 19881398 19881419 - ID=MIMAT0003539_1;accession_number=MIMAT0003539
    Chr1 19930459 19930542 - ID=MI0005454_1;accession_number=MI0005454
    Chr1 19930511 19930532 - ID=MIMAT0004332_1;accession_number=MIMAT0004332
    ...

    Martin

  • Geraldine_VdAuwera (Posts: 5,230; Administrator, GSA Member)

    Hi Martin,

    Can you please post the command line you're using?

    And have you tried passing just a single interval from the command line (e.g. -L Chr1:3580210-3580286) to see if that works properly?

    Geraldine Van der Auwera, PhD

  • MartinB (Posts: 8, Member)

    Hello,

    I just tried passing the interval directly on the command line, and it worked perfectly.

    The command line I used is:

    java -Xmx2g -jar ~/Documents/Programms/GenomeAnalysisTK-2.1-8-g5efb575/GenomeAnalysisTK.jar -R Chromosome_3.1.fasta -T FastaAlternateReferenceMaker -o chr1_test_variant.fasta -L chr1_variant.intervals --variant indels_v1.Chr1.vcf

    Thanks, Martin

  • Geraldine_VdAuwera (Posts: 5,230; Administrator, GSA Member)

    I see. Then try deleting the header from your intervals file in case that's what's messing up the parsing.

    Geraldine Van der Auwera, PhD

  • MartinB (Posts: 8, Member)

    Without the header I get this error:

    Badly formed genome loc: Contig Chr1 3580210 3580286 + ID=MI0011294' does not match any contig in the GATK sequence dictionary derived from the reference

  • Geraldine_VdAuwera (Posts: 5,230; Administrator, GSA Member)

    Eh, right, without the header it doesn't know how to parse the rest. Not sure why it's not recognizing the Picard format in the first place when it has the header. On the off chance, can you try running with your original intervals file (with header), but with the extension renamed to .list?

    Geraldine Van der Auwera, PhD

  • MartinB (Posts: 8, Member)

    I think I had already tried that. I just tried it again and I get the same error as before with the header.

    I built the interval file myself from a GFF file; could that be causing the problem?

  • MartinB (Posts: 8, Member)

    Hello,

    The last format you suggested in the inter

  • MartinB (Posts: 8, Member)

    Hello again,

    Just to say that I found the problem: my original tab-separated interval file is parsed correctly when it has the .bed extension.

    Thanks again for all the help!

    Martin
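
Note for readers hitting the same warning: in this thread, both the single interval Chr1:3580210-3580286 passed to -L and the file with the .bed extension were parsed correctly, so one workaround is to rewrite the tab-delimited rows in one of those two forms. The Python sketch below is an illustration and was not posted in the thread. The function name rows_to_intervals is hypothetical. It assumes the rows are 1-based and inclusive (as in GFF and Picard-style interval lists) and that a .bed file is read with the standard BED convention (0-based start, exclusive end), so the start is shifted by one for the BED output. If that assumption holds, simply renaming a 1-based file to .bed would move every interval start by one base, which is worth checking in the output.

```python
def rows_to_intervals(lines):
    """Convert tab-delimited 'contig start end ...' rows (assumed 1-based,
    inclusive) into contig:start-end strings and BED rows.

    Header lines starting with '@' and blank lines are skipped.
    """
    region_strings = []  # e.g. "Chr1:3580210-3580286", the form that worked with -L
    bed_rows = []        # BED convention: 0-based start, exclusive end
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 tab-separated fields: {line!r}")
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        region_strings.append(f"{contig}:{start}-{end}")
        # Shift only the start; the 1-based inclusive end equals the
        # 0-based exclusive end.
        bed_rows.append(f"{contig}\t{start - 1}\t{end}")
    return region_strings, bed_rows


# Two rows taken from the interval file shown in the question.
rows = [
    "@HD\tVN:1.4\tSO:unsorted",
    "Chr1\t3580210\t3580286\t+\tID=MI0011294_1;accession_number=MI0011294",
    "Chr1\t3607747\t3607842\t-\tID=MI0014499_1;accession_number=MI0014499",
]
region_strings, bed_rows = rows_to_intervals(rows)
for region in region_strings:
    print(region)
```

Writing region_strings one per line to a text file, or bed_rows to a file ending in .bed, gives two candidate inputs for -L; which extensions a given GATK version accepts should be checked against its documentation.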
