
picard CrosscheckReadGroupFingerprints

dporter8 · Carlsbad, CA · Member


The Picard CrosscheckReadGroupFingerprints command asks for a Haplotype_map file. What format does this file need to be in? If you have multiple BAM files with the same RG tag, would this command work? If not, what is the best way to edit 6000+ BAM files so that each has a unique RG tag name?

Thank you.
Devin Porter
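
On the last question, one possible approach is Picard's AddOrReplaceReadGroups, which rewrites the read group of a BAM file. The sketch below only builds one command per BAM and does not run anything. It assumes each file is named `<barcode>.bam`, that `picard.jar` sits in the working directory, and that the barcode is acceptable as both the read-group ID and the sample name. The function names and the `lib1` / `ILLUMINA` defaults are placeholders, not values from this thread.

```python
from pathlib import Path


def read_group_command(bam_path, out_dir, library="lib1", platform="ILLUMINA"):
    """Build a Picard AddOrReplaceReadGroups command for one per-barcode BAM."""
    bam = Path(bam_path)
    barcode = bam.stem  # assumption: the file is named <barcode>.bam
    out_bam = Path(out_dir) / f"{barcode}.rg.bam"
    return [
        "java", "-jar", "picard.jar", "AddOrReplaceReadGroups",
        f"I={bam}",
        f"O={out_bam}",
        f"RGID={barcode}",   # unique read-group ID per cell barcode
        f"RGSM={barcode}",   # treat each barcode as its own sample
        f"RGLB={library}",   # placeholder library name
        f"RGPL={platform}",  # placeholder platform
        f"RGPU={barcode}",   # platform unit; the barcode is reused here
    ]


def build_commands(bam_paths, out_dir):
    """Return one command per BAM and fail if two files would share an ID."""
    seen = set()
    commands = []
    for bam_path in sorted(bam_paths):
        barcode = Path(bam_path).stem
        if barcode in seen:
            raise ValueError(f"duplicate read-group ID: {barcode}")
        seen.add(barcode)
        commands.append(read_group_command(bam_path, out_dir))
    return commands
```

Each returned list could be passed to `subprocess.run`, or joined with spaces and written to a shell script, once the naming assumptions have been checked against the real files.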


Best Answers


  • shlee · Cambridge · Member, Broadie ✭✭✭✭✭

    Hi @dporter8,

    I'll have to double-check what is absolutely required. What I can tell you now is that the file I have that works with the tool is a sites-only VCF file. It has a header and a body.

    If this does not work for you, please let me know. I will look in more detail into what is required.

  • dporter8 · Carlsbad, CA · Member

    Thanks for gathering this information for me. It is quite helpful. Specifically, I am trying to deconvolute 12 cell lines from one sequencing lane. I performed a single-cell RNA-seq experiment using the 10x Genomics Chromium system. I have SNP information on hundreds of DO mESCs, and from the 10x barcode system I can get the reads that come from individual cells. I filtered my BAM file down to the SNP regions of interest, then split the reads into 6000 individual per-barcode BAM files. I already have a reference VCF file containing eQTLs on each chromosome for 200 mouse ESCs. My plan was to correlate the SNPs from my individual barcoded VCF files with my reference file to find a matching cell line, but when I read the description of the Picard fingerprint tool, I thought it might be a better way of doing this.

  • shlee · Cambridge · Member, Broadie ✭✭✭✭✭

    Hi @dporter8, I'll ask a developer to address your question. Note that I've made additions to my description of the haplotype map file above.

  • dporter8 · Carlsbad, CA · Member


    Thanks for confirming the utility of this tool. About creating the haplotype database: we have GigaMUGA SNP arrays on about 900 DO mESC lines. Do you know of any tools that could use this data to create the haplotype map database? Would DOQTL work for this?


  • yfarjoun · Broad Institute · Dev ✭✭✭

    I am not familiar with any tool that can generate a haplotype database. You should be able to LD-prune with plink (https://www.cog-genomics.org/plink2/ld) and convert the resulting file into the Haplotype Database format using hand-written scripts.

  • yfarjoun · Broad Institute · Dev ✭✭✭

    Actually, I think I misunderstood your question. If the regions you are looking at do not have variants, then the fingerprinting will not work. Are the sites in question variants, or just regions with differential coverage between the samples? Are we talking about one individual or many individuals?

  • shlee · Cambridge · Member, Broadie ✭✭✭✭✭

    I now describe the haplotype map format officially at http://gatkforums.broadinstitute.org/dsde/discussion/9526. Thanks for bringing this need to our attention @dporter8.
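
The LD-pruning route suggested earlier in the thread could feed a small hand-written script. The sketch below assumes plink was run with `--indep-pairwise`, which writes the retained variant IDs one per line to a `.prune.in` file, and that the ID column of a sites-only VCF uses the same IDs. It only restricts the VCF to the pruned sites. It does not add anything else the haplotype map format may require, so the result should be checked against the format description linked above. The function names are illustrative.

```python
def load_pruned_ids(prune_in_lines):
    """Collect variant IDs from the lines of a plink .prune.in file."""
    return {line.strip() for line in prune_in_lines if line.strip()}


def filter_sites_vcf(vcf_lines, keep_ids):
    """Yield header lines unchanged and only records whose ID was retained."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF column 3 (index 2) is the variant ID
        if len(fields) > 2 and fields[2] in keep_ids:
            yield line
```

Both functions take any iterable of lines, so open file handles can be passed directly and the filtered lines written to a new file with `writelines`.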
