FastaAlternateReferenceMaker headers aren't the same as reference
I'm running FastaAlternateReferenceMaker in the following way:
java-1.7.0-u13/bin/java -jar GenomeAnalysisTK.jar -R GRCh37-lite.fa -T FastaAlternateReferenceMaker -o alt_ref1.fa --variant /RNA-seq_simulation/Illumina_body_map/HCT20170/ERS025093_5_lanes_dupsFlagged.sorted.vcf
I was expecting the FASTA headers to be the same as the ones in the reference FASTA (i.e. the chromosome names plus some extra info):
1 CM000663.1 Homo sapiens chromosome 1, GRCh37 primary reference assembly
2 CM000664.1 Homo sapiens chromosome 2, GRCh37 primary reference assembly
3 CM000665.1 Homo sapiens chromosome 3, GRCh37 primary reference assembly
...
22 CM000684.1 Homo sapiens chromosome 22, GRCh37 primary reference assembly
X CM000685.1 Homo sapiens chromosome X, GRCh37 primary reference assembly
Y CM000686.1 Homo sapiens chromosome Y, GRCh37 primary reference assembly, with PAR regions masked with Ns (bases 10001..2649520 & 59034050..59363566)
MT J01415.2 Homo sapiens mitochondrion, complete genome
GL000207.1 Homo sapiens chromosome 18 unlocalized genomic contig, GRCh37 reference primary assembly
but instead they are simply numbered from 1 to 84. Is there a way to keep the original chromosome names in the new reference file?
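If the tool itself offers no way to keep the names, one possible workaround is to copy the headers back from the reference after the fact. The Python sketch below is not a GATK feature. It assumes that the output FASTA contains exactly one record per reference contig, in the same order as the reference (so that record 1 corresponds to the first reference header, and so on). That assumption may not hold if the run was restricted to intervals, so the script refuses to continue when the record counts differ. The function names and the output filename are hypothetical.

```python
def reference_headers(ref_lines):
    """Collect header text (without the leading '>') from FASTA lines."""
    return [line[1:].rstrip("\n") for line in ref_lines if line.startswith(">")]


def restore_headers(alt_lines, headers):
    """Replace the i-th header in alt_lines with the i-th reference header.

    Assumption: records appear in the same order as in the reference.
    Raises ValueError if the number of records does not match.
    """
    out = []
    seen = 0
    for line in alt_lines:
        if line.startswith(">"):
            if seen >= len(headers):
                raise ValueError("alternate FASTA has more records than the reference")
            out.append(">" + headers[seen] + "\n")
            seen += 1
        else:
            # Sequence lines are passed through unchanged.
            out.append(line)
    if seen != len(headers):
        raise ValueError("alternate FASTA has fewer records than the reference")
    return out


def rewrite_fasta(ref_path, alt_path, out_path):
    """File-based wrapper: read both FASTA files, write the relabelled copy."""
    with open(ref_path) as ref:
        headers = reference_headers(ref)
    with open(alt_path) as alt:
        relabelled = restore_headers(alt, headers)
    with open(out_path, "w") as out:
        out.writelines(relabelled)
```

With the files from the command above, this would be called as `rewrite_fasta("GRCh37-lite.fa", "alt_ref1.fa", "alt_ref1.renamed.fa")`. It reads the whole alternate FASTA into memory, which keeps the sketch short but may be heavy for a full human genome. A streaming version would write each line as it is read.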