FastaAlternateReferenceMaker headers aren't the same as reference

MaayaanK (Vancouver, Member)

I'm running FastaAlternateReferenceMaker in the following way:

java-1.7.0-u13/bin/java -jar GenomeAnalysisTK.jar -R GRCh37-lite.fa -T FastaAlternateReferenceMaker -o alt_ref1.fa --variant /RNA-seq_simulation/Illumina_body_map/HCT20170/ERS025093_5_lanes_dupsFlagged.sorted.vcf
I was expecting the FASTA headers to be the same as the ones in the reference FASTA (i.e. the chromosome names and some extra info):

    1 CM000663.1 Homo sapiens chromosome 1, GRCh37 primary reference assembly
    2 CM000664.1 Homo sapiens chromosome 2, GRCh37 primary reference assembly
    3 CM000665.1 Homo sapiens chromosome 3, GRCh37 primary reference assembly
    ...
    22 CM000684.1 Homo sapiens chromosome 22, GRCh37 primary reference assembly
    X CM000685.1 Homo sapiens chromosome X, GRCh37 primary reference assembly
    Y CM000686.1 Homo sapiens chromosome Y, GRCh37 primary reference assembly, with PAR regions masked with Ns (bases 10001..2649520 & 59034050..59363566)
    MT J01415.2 Homo sapiens mitochondrion, complete genome
    GL000207.1 Homo sapiens chromosome 18 unlocalized genomic contig, GRCh37 reference primary assembly

but instead they are simply numbered from 1 to 84. Is there a way to keep the original chromosome names in the new reference file?

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    No, unfortunately FastaAlternateReferenceMaker cannot do this. It is a fairly crude tool. If someone wanted to improve it by adding better handling of contig names, we'd be happy to look at a patch.
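
Since the tool itself cannot keep the names, one possible workaround is to restore them afterwards. The sketch below is not part of GATK. It assumes that the numbered contigs in the output (1 to 84) appear in the same order as the contigs in the reference, which the thread does not confirm, so check contig counts and lengths before relying on it. The function names `read_headers`, `restore_headers`, and `rewrite_files` are made up for this illustration.

```python
def read_headers(fasta_lines):
    """Return the header lines (without the leading '>') in file order."""
    return [line[1:].rstrip("\n") for line in fasta_lines if line.startswith(">")]


def restore_headers(reference_lines, alt_lines):
    """Yield the alternate FASTA with its i-th header replaced by the
    i-th header of the reference.

    ASSUMPTION: both files list their contigs in the same order.
    """
    ref_headers = read_headers(reference_lines)
    index = 0
    for line in alt_lines:
        if line.startswith(">"):
            if index >= len(ref_headers):
                raise ValueError("alternate FASTA has more contigs than the reference")
            yield ">" + ref_headers[index] + "\n"
            index += 1
        else:
            # Sequence lines are passed through unchanged.
            yield line
    if index != len(ref_headers):
        raise ValueError("alternate FASTA has fewer contigs than the reference")


def rewrite_files(reference_path, alt_path, out_path):
    """File-based wrapper around restore_headers (paths are supplied by the caller)."""
    with open(reference_path) as ref, open(alt_path) as alt, open(out_path, "w") as out:
        out.writelines(restore_headers(ref, alt))
```

With the file names from the question, the call would look like `rewrite_files("GRCh37-lite.fa", "alt_ref1.fa", "alt_ref1.renamed.fa")`, where the output name is arbitrary. The contig-count check only catches gross mismatches; it cannot detect contigs written in a different order.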
