picard CrosscheckReadGroupFingerprints

dporter8 (Carlsbad, CA; Member)


The Picard CrosscheckReadGroupFingerprints command asks for a Haplotype_map file. What format does the Haplotype_map file need to be in? If you have multiple BAM files with the same RG tag, would this command work? If not, what is the best way to edit 6000+ BAM files so that they have unique RG tag names?

Thank you.
Devin Porter

Best Answers


  • shlee (Cambridge; Member, Broadie)

    Hi @dporter8,

    I'll have to double-check what is absolutely required. What I can tell you now is that the file I have that works with the tool is a sites-only VCF file. It has a header and a body.

    If this does not work for you, please let me know. I will look into more detail what is required.
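
On the read-group part of the question, one possible approach (not confirmed anywhere in this thread) is to rewrite each BAM's read group with Picard AddOrReplaceReadGroups, using a name that is already unique per file. The sketch below only builds the command lines and does not run them. It assumes each BAM file is named after its cell barcode; the `picard.jar` path, the output naming, and the `RGLB`/`RGPL`/`RGPU` values are placeholders.

```python
from pathlib import Path


def read_group_command(bam_path, out_dir, picard_jar="picard.jar"):
    """Build (but do not run) an AddOrReplaceReadGroups command for one BAM.

    The read-group ID and sample name are taken from the file name, which is
    assumed to be unique per cell barcode (a hypothetical naming convention).
    """
    bam = Path(bam_path)
    tag = bam.stem  # e.g. "AAACCTGAGAAACCAT" for AAACCTGAGAAACCAT.bam
    out = Path(out_dir) / f"{tag}.rg.bam"
    return [
        "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
        f"I={bam}",
        f"O={out}",
        f"RGID={tag}",
        f"RGSM={tag}",
        "RGLB=lib1",      # placeholder library name
        "RGPL=ILLUMINA",  # placeholder platform
        f"RGPU={tag}",    # placeholder platform unit
    ]


def all_commands(bam_dir, out_dir):
    """One command per *.bam file found in bam_dir, in sorted order."""
    return [read_group_command(p, out_dir)
            for p in sorted(Path(bam_dir).glob("*.bam"))]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the placeholder values have been replaced with real ones.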

  • dporter8 (Carlsbad, CA; Member)

    Thanks for gathering this information for me. It is quite helpful. Specifically, I am trying to deconvolute 12 cell lines from one sequencing lane. I performed a single-cell RNA-seq experiment using the 10x Genomics Chromium system. I have SNP information on hundreds of DO mESCs, and from the 10x barcode system I can get the reads that come from individual cells. I filtered my BAM file down to the SNP regions of interest, then sorted the reads into 6000 individual barcode BAM files. I already have a reference file in VCF format containing eQTLs on each chromosome for 200 mouse ESCs. My plan was to correlate the SNPs from my individual barcoded VCF files with my reference file to find a matching cell line, but when I read the description of the Picard fingerprint tool, I thought it might be a better way of doing this.
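
The matching plan described above can be sketched roughly as follows. This is a minimal illustration, not the Picard fingerprinting method: it assumes the per-barcode calls and the reference genotypes have already been parsed into dictionaries keyed by site, and it scores simple genotype agreement. The function names and data layout are hypothetical.

```python
def match_fraction(cell_calls, line_genotypes):
    """Fraction of shared sites at which the cell and the line agree.

    Both arguments map a site key such as ("chr1", 12345) to a genotype
    string such as "0/0", "0/1" or "1/1". Sites missing from either side
    are skipped. Returns (fraction, number_of_shared_sites).
    """
    shared = [site for site in cell_calls if site in line_genotypes]
    if not shared:
        return 0.0, 0
    agree = sum(cell_calls[site] == line_genotypes[site] for site in shared)
    return agree / len(shared), len(shared)


def best_matching_line(cell_calls, reference, min_sites=1):
    """Return (line_name, fraction, n_sites) for the best line, or None.

    reference maps a cell-line name to its genotype dictionary. Lines that
    share fewer than min_sites sites with the cell are ignored.
    """
    best = None
    for name, genotypes in reference.items():
        fraction, n_sites = match_fraction(cell_calls, genotypes)
        if n_sites < min_sites:
            continue
        if best is None or fraction > best[1]:
            best = (name, fraction, n_sites)
    return best
```

Exact genotype agreement is a crude score for sparse single-cell data, where a heterozygous site may show only one allele, so a real analysis would likely need a more tolerant comparison and a sensible `min_sites` threshold.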

  • shlee (Cambridge; Member, Broadie)

    Hi @dporter8, I'll ask a developer to address your question. Notice that I've made additions to my description of the haplotype map file above.

  • dporter8 (Carlsbad, CA; Member)

    Thanks for confirming the utility of this tool. About creating the haplotype database: we have GigaMUGA SNP arrays on about 900 DO mESC lines. Do you know of any tools that could use this data to create the haplotype map database? Would DOQTL work for this?


  • yfarjoun (Broad Institute; Dev)

    I am not familiar with any tool that can generate a haplotype database. You should be able to LD-prune with plink and convert the resulting file into the Haplotype Database format using hand-written scripts.
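
A hand-written conversion script of the kind mentioned above might look like the sketch below. It assumes the LD-pruned SNPs have already been loaded as tuples, and it writes the sites-only VCF layout described earlier in the thread (a header and a body, no sample columns). Carrying the allele frequency as `AF` in the INFO column is an assumption; whether Picard needs additional header lines or fields is not confirmed here, so check the output against the official haplotype map description.

```python
def sites_only_vcf(sites, contigs):
    """Return sites-only VCF text: header plus body, no sample columns.

    sites   : iterable of (chrom, pos, snp_id, ref, alt, alt_allele_freq)
    contigs : dict mapping contig name -> length, in reference order
    """
    lines = ["##fileformat=VCFv4.2"]
    for name, length in contigs.items():
        lines.append(f"##contig=<ID={name},length={length}>")
    lines.append(
        '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">'
    )
    lines.append("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO")

    # Sort records by contig order, then by position within each contig.
    order = {name: index for index, name in enumerate(contigs)}
    for chrom, pos, snp_id, ref, alt, af in sorted(
        sites, key=lambda s: (order[s[0]], s[1])
    ):
        lines.append(f"{chrom}\t{pos}\t{snp_id}\t{ref}\t{alt}\t.\t.\tAF={af}")
    return "\n".join(lines) + "\n"
```

The returned text can be written to a `.vcf` file with an ordinary `open(path, "w")` call.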

  • yfarjoun (Broad Institute; Dev)

    Actually, I think I misunderstood your question. If the regions you are looking at do not contain variants, then the fingerprinting will not work. Are the sites in question variants, or just regions with differential coverage between the samples? Are we talking about one individual or many individuals?

  • shlee (Cambridge; Member, Broadie)

    I have now written up an official description of the haplotype map format. Thanks for bringing this need to our attention, @dporter8.
