Is there a way to remove the "|" from the SQ line in a .bam file?

grtavera (Case Western Reserve University; Member)

We have several unique H. pylori genomes. We aligned each of them to a reference genome that had "|" in the fasta (or .fa) file name. Now, when running through GATK, we cannot create our final .vcf file. Do we have to remove the "|" from the .fa file and rerun everything from the beginning, or is there another approach?



  • valentin (Cambridge, MA; Member, Dev)

    Could you post the exception you are getting and an extract of the header that is causing the issue? It sounds like a bug, or perhaps an obscure violation of the SAM format, but I guess it is the former.

  • valentin (Cambridge, MA; Member, Dev)

    It seems that a similar issue has been reported here. There you will find a workaround that you may be able to adapt to your situation.

  • valentin (Cambridge, MA; Member, Dev)

    Also, could you confirm which version of GATK you are using?

  • valentin (Cambridge, MA; Member, Dev)

    It seems that the link I posted above does not actually work, so here is the workaround from that thread:

    It seems that something similar has been reported here.

    That might be a bug in GATK, so thanks for reporting. I guess the workaround with samtools would be:

    samtools view -h input.bam | sed 's/SN:gi|[0-9]*|gb|\([^|]*\)|/SN:\1/' | samtools view -b - > output.bam

    The pipes are left unescaped because "|" is a literal character in sed's basic regular expressions, whereas GNU sed reads "\|" as alternation. You may need to add more 'sed' commands if there are SN values that follow a different pattern. You can check whether the
    'sed' is doing the right thing like so:

    samtools view -H input.bam | sed 's/SN:gi|[0-9]*|gb|\([^|]*\)|/SN:\1/'
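To see what the substitution does without touching a BAM, here is a minimal sketch that runs it on a made-up @SQ line. The accession, the sequence length, and the helper names `rename_sq` and `reheader_bam` are assumptions for illustration, not anything from the original thread. The pipes are written unescaped because `|` is an ordinary character in sed's basic regular expressions, while GNU sed reads `\|` as alternation. The expression is also limited to `@SQ` lines so read records pass through unchanged.

```shell
# Hypothetical @SQ header line; the accession and length are invented for illustration.
header=$(printf '@SQ\tSN:gi|12345|gb|AB000001.1|\tLN:1600000')

# Strip the "gi|<number>|gb|" prefix and the trailing pipe from SN values.
# Only lines starting with @SQ are edited; every other line is left as-is.
rename_sq() {
    sed '/^@SQ/ s/SN:gi|[0-9]*|gb|\([^|]*\)|/SN:\1/'
}

fixed=$(printf '%s\n' "$header" | rename_sq)
printf '%s\n' "$fixed"   # prints: @SQ<TAB>SN:AB000001.1<TAB>LN:1600000

# Sketch of applying the renamed header with "samtools reheader" (requires samtools;
# defined here but not called). $1 = input BAM, $2 = output BAM, both chosen by the caller.
reheader_bam() {
    samtools view -H "$1" | rename_sq > "$2.header.sam" &&
    samtools reheader "$2.header.sam" "$1" > "$2"
}
```

One thing to watch: an expression anchored on `SN:` only changes the header text, so in a SAM text stream the read lines would still carry the old name in their reference column. Swapping in the edited header with `samtools reheader`, as in the uncalled `reheader_bam` sketch, is one way to avoid that mismatch, since it replaces the header without rewriting the reads. The reference FASTA, its index, and its sequence dictionary would presumably need the same names for GATK to accept the result.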

  • Sheila (admin; Broad Institute; Member, Broadie, Moderator)


    Please do have a look at the threads Valentin pointed to above. Also, why do you say you cannot create the final VCF? Are you getting an error message? If so, please post it.

