Can you consider updating /bundle/2.8/b37/human_g1k_v37_decoy.fasta.fai.gz?

mmterpstra · Netherlands · Member
edited October 2015 in Ask the GATK team

Hi GATK team,

I found that using human_g1k_v37_decoy.fasta.fai with samtools 1.2 gives a floating point exception. Here are the first lines of human_g1k_v37.fasta.fai and human_g1k_v37_decoy.fasta.fai for comparison:

==> /apps/data//ftp.broadinstitute.org/bundle/2.8/b37//human_g1k_v37.fasta.fai <==
1   249250621   52  60  61
2   243199373   253404903   60  61
3   198022430   500657651   60  61
4   191154276   701980507   60  61
5   180915260   896320740   60  61
6   171115067   1080251307  60  61
7   159138663   1254218344  60  61
8   146364022   1416009371  60  61
9   141213431   1564812846  60  61
10  135534747   1708379889  60  61

==> /apps/data//ftp.broadinstitute.org/bundle/2.8/b37//human_g1k_v37_decoy.fasta.fai <==
1 dna:chromosome chromosome:GRCh37:1:1:249250621:1  249250621   52  60  61
2 dna:chromosome chromosome:GRCh37:2:1:243199373:1  243199373   253404903   60  61
3 dna:chromosome chromosome:GRCh37:3:1:198022430:1  198022430   500657651   60  61
4 dna:chromosome chromosome:GRCh37:4:1:191154276:1  191154276   701980507   60  61
5 dna:chromosome chromosome:GRCh37:5:1:180915260:1  180915260   896320740   60  61
6 dna:chromosome chromosome:GRCh37:6:1:171115067:1  171115067   1080251307  60  61
7 dna:chromosome chromosome:GRCh37:7:1:159138663:1  159138663   1254218344  60  61
8 dna:chromosome chromosome:GRCh37:8:1:146364022:1  146364022   1416009371  60  61
9 dna:chromosome chromosome:GRCh37:9:1:141213431:1  141213431   1564812846  60  61
10 dna:chromosome chromosome:GRCh37:10:1:135534747:1    135534747   1708379889  60  61
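
One way to see the difference is to check each index line against the layout samtools faidx writes: five tab-separated columns (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH), where NAME is the FASTA header cut at the first whitespace. The Python sketch below uses a hypothetical helper, fai_line_problems. It assumes the decoy index separates the long header from the numbers with a tab (the forum rendering hides tabs). It only checks the format and makes no claim about how samtools 1.2 parses the file internally.

```python
import re
from typing import List


def fai_line_problems(line: str) -> List[str]:
    """Return human-readable problems with one .fai line (empty list if it looks fine).

    Assumed layout (as written by `samtools faidx`):
    NAME<TAB>LENGTH<TAB>OFFSET<TAB>LINEBASES<TAB>LINEWIDTH,
    with NAME taken from the FASTA header up to the first whitespace.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        return [f"expected 5 tab-separated fields, got {len(fields)}"]

    problems = []
    name, *numbers = fields  # numbers = [LENGTH, OFFSET, LINEBASES, LINEWIDTH]
    if re.search(r"\s", name):
        problems.append(f"sequence name contains whitespace: {name!r}")
    if not all(n.isdigit() for n in numbers):
        problems.append("LENGTH/OFFSET/LINEBASES/LINEWIDTH must be non-negative integers")
    elif int(numbers[2]) == 0:
        problems.append("LINEBASES is 0")
    return problems


# Hypothetical example lines modelled on the two listings above.
good = "1\t249250621\t52\t60\t61"
bad = "1 dna:chromosome chromosome:GRCh37:1:1:249250621:1\t249250621\t52\t60\t61"
print(fai_line_problems(good))  # → []
print(fai_line_problems(bad))
```

Running it over every line of each .fai file flags only the decoy index, whose first column carries the whole header.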

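I concluded that samtools cannot parse the sequence names in the decoy index, which contain the full FASTA header line (including spaces) instead of just the name. Regenerating the index with samtools faidx human_g1k_v37_decoy.fasta fixes the problem. Please consider updating the bundle to avoid confusion.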
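
For reference, this is roughly what regenerating the index computes. The following is a minimal Python sketch with a hypothetical build_fai helper, not samtools' own code. It assumes Unix line endings, a non-empty name on every header line, and that all sequence lines of a record except the last have the same length (the layout a .fai index requires).

```python
from typing import List, Tuple

FaiRecord = Tuple[str, int, int, int, int]  # (name, length, offset, linebases, linewidth)


def build_fai(fasta_bytes: bytes) -> List[FaiRecord]:
    """Sketch of a FASTA index builder in the spirit of `samtools faidx`."""
    records: List[FaiRecord] = []
    name = None
    length = offset = linebases = linewidth = 0
    pos = 0  # byte offset of the current line in the file

    for raw in fasta_bytes.splitlines(keepends=True):
        line = raw.rstrip(b"\r\n")
        if line.startswith(b">"):
            if name is not None:
                records.append((name, length, offset, linebases, linewidth))
            # Keep only the header text up to the first whitespace, so a header like
            # ">1 dna:chromosome chromosome:GRCh37:..." is indexed as "1".
            name = line[1:].decode().split()[0]
            length = linebases = linewidth = 0
            offset = pos + len(raw)  # first base starts right after the header line
        elif name is not None:
            if linebases == 0:
                linebases = len(line)  # bases per line, newline excluded
                linewidth = len(raw)   # bytes per line, newline included
            length += len(line)
        pos += len(raw)

    if name is not None:
        records.append((name, length, offset, linebases, linewidth))
    return records


# Hypothetical toy FASTA using Ensembl-style headers like the decoy file's.
example = b">1 dna:chromosome chromosome:GRCh37:1:1:10:1\nACGTA\nCGTAC\n>2 dna:chromosome\nAC\n"
for rec in build_fai(example):
    print("\t".join(map(str, rec)))
```

Each printed line has the same five-column layout as the human_g1k_v37.fasta.fai listing, with only the short name in the first column.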

So, could you update human_g1k_v37_decoy.fasta.fai.gz and human_g1k_v37_decoy.fasta.fai.gz.md5?



Answers

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    I'm sorry, I don't think I get what you're saying. What's the command you ran and the error you encountered, exactly?

  • buddej · St. Louis · Member

    I encountered the same issue (1 year later) with bcftools when trying to normalize a .vcf to the reference sequence, using the human_g1k_v37_decoy.fasta assembly from the Broad 2.8 bundle.

    @mmterpstra is correct: reindexing the .fasta with samtools faidx produces a .fai file that bcftools can read. I'm not sure whether anything in this .fai is out of spec, but it certainly looks nonstandard compared with what samtools faidx generates.

    BTW, the human_g1k_v37.fasta.fai file in the same Broad 2.8 bundle appears to be fine and does not have the extra "dna:chromosome..." text.

    GitHub issue #1410 (filed by Sheila; closed by sooheelee)

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @buddej @mmterpstra
    Hi,

    So, am I correct in understanding that human_g1k_v37_decoy.fasta.fai.gz and human_g1k_v37_decoy.fasta.fai.gz.md5 need to be replaced?

    Thanks,
    Sheila

  • buddej · St. Louis · Member

    @Sheila Yes. I believe that if you regenerate them on your end with samtools and replace the files in the bundle, that will fix this issue.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    We're going to update the bundle in the near future to deal with this.