FastaAlternateReferenceMaker headers aren't the same as the reference headers

MaayaanK (Vancouver, Member)

I'm running FastaAlternateReferenceMaker in the following way:

    java-1.7.0-u13/bin/java -jar GenomeAnalysisTK.jar -R GRCh37-lite.fa -T FastaAlternateReferenceMaker -o alt_ref1.fa --variant /RNA-seq_simulation/Illumina_body_map/HCT20170/ERS025093_5_lanes_dupsFlagged.sorted.vcf

I was expecting the FASTA headers to be the same as those in the reference FASTA (i.e. the chromosome names plus some extra info):

    1 CM000663.1 Homo sapiens chromosome 1, GRCh37 primary reference assembly
    2 CM000664.1 Homo sapiens chromosome 2, GRCh37 primary reference assembly
    3 CM000665.1 Homo sapiens chromosome 3, GRCh37 primary reference assembly
    ...
    22 CM000684.1 Homo sapiens chromosome 22, GRCh37 primary reference assembly
    X CM000685.1 Homo sapiens chromosome X, GRCh37 primary reference assembly
    Y CM000686.1 Homo sapiens chromosome Y, GRCh37 primary reference assembly, with PAR regions masked with Ns (bases 10001..2649520 & 59034050..59363566)
    MT J01415.2 Homo sapiens mitochondrion, complete genome
    GL000207.1 Homo sapiens chromosome 18 unlocalized genomic contig, GRCh37 reference primary assembly

but instead they are simply numbered from 1 to 84. Is there a way to keep the original chromosome names in the new reference file?

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    No, unfortunately FastaAlternateReferenceMaker cannot do this. It is a fairly crude tool. If someone wanted to improve it by adding better handling of contig names, we'd be happy to look at a patch.
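
Because the tool itself can't keep the names, one possible workaround is to copy the headers back from the reference as a post-processing step. The Python sketch below is an assumption-laden illustration, not part of GATK. It assumes that the alternate FASTA contains exactly one record per reference contig, in the same order as the reference. The 1-to-84 numbering suggests this, but you should verify it for your GATK version. The script name `rename_fasta_headers.py` and its function names are hypothetical. The script replaces the i-th header of the alternate FASTA with the i-th full header line of the reference and reports an error if the record counts differ.

```python
import sys
from typing import Iterable, Iterator, List


def read_headers(lines: Iterable[str]) -> List[str]:
    """Return every FASTA header line (without the leading '>') in file order."""
    return [line[1:].rstrip("\r\n") for line in lines if line.startswith(">")]


def rename_records(ref_headers: List[str], alt_lines: Iterable[str]) -> Iterator[str]:
    """Yield the alternate FASTA's lines, replacing the i-th header with the
    i-th reference header. ASSUMPTION: both files list contigs in the same
    order, with one alternate record per reference contig."""
    index = 0
    for line in alt_lines:
        if line.startswith(">"):
            if index >= len(ref_headers):
                raise ValueError("alternate FASTA has more records than the reference")
            yield ">" + ref_headers[index] + "\n"
            index += 1
        else:
            # Sequence lines are passed through unchanged.
            yield line
    if index != len(ref_headers):
        raise ValueError(
            f"record count mismatch: {index} alternate vs {len(ref_headers)} reference"
        )


def main(ref_path: str, alt_path: str, out_path: str) -> None:
    with open(ref_path) as ref:
        ref_headers = read_headers(ref)
    # The output is streamed, so on a count mismatch the file may be partly written.
    with open(alt_path) as alt, open(out_path, "w") as out:
        out.writelines(rename_records(ref_headers, alt))


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(sys.argv[1], sys.argv[2], sys.argv[3])
    else:
        print("usage: rename_fasta_headers.py REFERENCE.fa ALT.fa OUT.fa", file=sys.stderr)
```

For example, you could run `python rename_fasta_headers.py GRCh37-lite.fa alt_ref1.fa alt_ref1.renamed.fa`. Sequence lengths may differ from the reference if indels were applied. For that reason, the reference's `.fai` index and `.dict` sequence dictionary should not be reused, and you would need to regenerate them for the renamed file before using it with GATK.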
