Keep "species" info from BAM to VCF

timflutre (Montpellier, France)

Hello,
I am running HaplotypeCaller (GATK v3.5) on a BAM file whose header contains a line like this (a made-up example):

@SQ SN:chr1 LN:100000 SP:Arabis thal AS:2 M5:8668a646eada2f4 UR:file:refgenome_Atha_v2.fa

However, the output VCF keeps only part of this information:

##contig=<ID=chr1,length=100000>
##reference=file:///home/me/tmp/refgenome_Atha_v2.fa

Is there a way to get something like the following instead, i.e. a line that also records the species, assembly and MD5 checksum?

##contig=<ID=chr1,length=100000,assembly=2,md5=8668a646eada2f4,species="Arabis thal">

The information in the BAM header originally comes from a "dict" file generated by Picard CreateSequenceDictionary. So I tried passing this "dict" file, together with the VCF file, to Picard UpdateVcfSequenceDictionary, but the result contained neither the species nor the MD5 sum:

##contig=<ID=chr1,length=100000,assembly=2>
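As a stopgap, the missing attributes could be copied into the VCF header by a small post-processing step. The Python sketch below is a hypothetical workaround, not a GATK or Picard feature: the helper names (`parse_sq`, `contig_line`, `rewrite_contigs`) are made up for illustration. It assumes the "dict" file has SAM-style, tab-delimited @SQ lines, and it only handles VCF header lines (the variant records would be copied unchanged). The tag-to-attribute mapping (AS to assembly, M5 to md5, SP to species) follows the desired line above.

```python
# Hypothetical workaround sketch (not part of GATK or Picard): copy the
# AS/M5/SP tags of @SQ lines from a sequence dictionary into ##contig lines.

# (SAM @SQ tag, VCF ##contig attribute), in the order they are emitted.
SQ_TO_CONTIG = [("SN", "ID"), ("LN", "length"), ("AS", "assembly"),
                ("M5", "md5"), ("SP", "species")]


def parse_sq(line):
    """Split one tab-delimited @SQ line into a {tag: value} dict."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@SQ":
        raise ValueError("not an @SQ line: %r" % line)
    tags = {}
    for field in fields[1:]:
        # Split on the first ':' only, so UR:file:... keeps its full value.
        tag, _, value = field.partition(":")
        tags[tag] = value
    return tags


def contig_line(tags):
    """Build a ##contig line from @SQ tags, skipping tags that are absent."""
    parts = []
    for sq_tag, attr in SQ_TO_CONTIG:
        if sq_tag not in tags:
            continue
        value = tags[sq_tag]
        # Quote values containing spaces, commas or quotes (e.g. the species).
        if any(c in value for c in ' ,"'):
            value = '"%s"' % value.replace('"', '\\"')
        parts.append("%s=%s" % (attr, value))
    return "##contig=<%s>" % ",".join(parts)


def rewrite_contigs(vcf_header_lines, dict_lines):
    """Replace ##contig lines whose ID matches an @SQ SN; keep other lines."""
    by_name = {}
    for line in dict_lines:
        if line.startswith("@SQ"):
            tags = parse_sq(line)
            by_name[tags["SN"]] = contig_line(tags)
    out = []
    for line in vcf_header_lines:
        line = line.rstrip("\n")
        if line.startswith("##contig=<ID="):
            contig_id = line[len("##contig=<ID="):].split(",")[0].rstrip(">")
            line = by_name.get(contig_id, line)
        out.append(line)
    return out


if __name__ == "__main__":
    dict_lines = [
        "@HD\tVN:1.5\n",
        "@SQ\tSN:chr1\tLN:100000\tSP:Arabis thal\tAS:2"
        "\tM5:8668a646eada2f4\tUR:file:refgenome_Atha_v2.fa\n",
    ]
    header = ["##fileformat=VCFv4.2\n", "##contig=<ID=chr1,length=100000>\n"]
    for line in rewrite_contigs(header, dict_lines):
        print(line)
```

With the made-up @SQ line from above, the second printed line is exactly the ##contig line asked for. Contigs that have no matching @SQ entry are left as they were, so a partial dictionary does not drop any header lines.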

Thank you in advance,
Tim
