Keep "species" info from BAM to VCF
I am using HaplotypeCaller (GATK v3.5) with an input BAM file which has a header line like this (just a fake example):
@SQ SN:chr1 LN:100000 SP:Arabis thal AS:2 M5:8668a646eada2f4 UR:file:refgenome_Atha_v2.fa
But the output VCF only has a subset of this information:
Is there a way to obtain something like this instead? (i.e. also indicate species, assembly and MD5 sum)
The information in the BAM file originally comes from a "dict" file generated by Picard CreateSequenceDictionary. So I tried passing this "dict" file together with the VCF file to Picard UpdateVcfSequenceDictionary, but the result contained neither the species nor the MD5 sum:
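To make the request concrete, here is a minimal Python sketch of the mapping I have in mind. It is not a GATK or Picard feature, just a hypothetical helper (`sq_to_contig` is my own name) that turns one `@SQ` line into a `##contig` line. It assumes the `@SQ` fields are tab-separated, as in the SAM specification, and that the VCF keys `assembly`, `md5`, `species` and `URL` are the appropriate counterparts of `AS`, `M5`, `SP` and `UR`; that key mapping is my assumption, not something confirmed by the tools.

```python
def sq_to_contig(sq_line):
    """Convert one SAM @SQ header line into a VCF ##contig header line.

    Assumes tab-separated TAG:VALUE fields; SN and LN must be present.
    """
    fields = {}
    # Skip the leading "@SQ" record type, then split each field on the
    # first colon only, so values such as "file:ref.fa" stay intact.
    for token in sq_line.rstrip("\n").split("\t")[1:]:
        tag, _, value = token.partition(":")
        fields[tag] = value

    parts = ["ID=" + fields["SN"], "length=" + fields["LN"]]
    if "AS" in fields:
        parts.append("assembly=" + fields["AS"])
    if "M5" in fields:
        parts.append("md5=" + fields["M5"])
    if "SP" in fields:
        # Species names may contain spaces, so quote the value.
        parts.append('species="' + fields["SP"] + '"')
    if "UR" in fields:
        parts.append("URL=" + fields["UR"])
    return "##contig=<" + ",".join(parts) + ">"


if __name__ == "__main__":
    example = ("@SQ\tSN:chr1\tLN:100000\tSP:Arabis thal\tAS:2"
               "\tM5:8668a646eada2f4\tUR:file:refgenome_Atha_v2.fa")
    print(sq_to_contig(example))
```

Post-processing the VCF header with something like this would be a workaround, but I would prefer a supported way to carry these fields over.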
Thank you in advance,