picard CrosscheckReadGroupFingerprints

dporter8 · Carlsbad, CA · Member

Hello,

The picard CrosscheckReadGroupFingerprints command asks for a Haplotype_map file. What format does this file need to be in? If multiple BAM files share the same RG tag, will the command still work? If not, what is the best way to edit 6000+ BAM files so that each has a unique RG tag?

Thank you.
Devin Porter

Issue · Github (by shlee): Issue Number 783 · State: closed · Closed By: sooheelee

Answers

  • shlee · Cambridge · Member, Broadie, Moderator admin

    Hi @dporter8,

    I'll have to double-check what is strictly required. For now, I can tell you that the file I have that works with the tool is a sites-only VCF file with a header and a body.

    If this does not work for you, please let me know and I will look in more detail into what is required.
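
On the second part of the question (giving 6000+ BAM files unique RG tags), one possible approach is to rewrite the read group of each file with Picard AddOrReplaceReadGroups. The sketch below only builds the command lines and does not run them. It assumes each BAM is named after its cell barcode and that the barcode is an acceptable read-group ID and sample name; the function name, the `picard.jar` location, the `.rg.bam` output suffix, and the `ILLUMINA` platform value are illustrative assumptions, not something stated in this thread.

```python
from pathlib import Path


def build_rg_commands(bam_paths, picard_jar="picard.jar", platform="ILLUMINA"):
    """Return one Picard AddOrReplaceReadGroups command (argv list) per BAM.

    Assumption: the file stem (e.g. the cell barcode) is unique across files
    and is reused as read-group ID, sample, library and platform unit.
    """
    commands = []
    seen = set()
    for bam in bam_paths:
        bam = Path(bam)
        tag = bam.stem  # "AAAC-1" for "AAAC-1.bam"
        if tag in seen:
            raise ValueError(f"duplicate read-group tag derived from file name: {tag}")
        seen.add(tag)
        out = bam.with_name(f"{tag}.rg.bam")  # hypothetical output naming
        commands.append([
            "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
            f"I={bam}",
            f"O={out}",
            f"RGID={tag}",
            f"RGSM={tag}",
            f"RGLB={tag}",
            f"RGPL={platform}",
            f"RGPU={tag}",
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` or written out as a line of a job script, which is usually more practical for thousands of files.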

  • dporter8 · Carlsbad, CA · Member

    Thanks for gathering this information for me. It is quite helpful. Specifically, I am trying to deconvolute 12 cell lines from one sequencing lane. I performed a single-cell RNA-seq experiment using the 10x Genomics Chromium system. I have SNP information on hundreds of DO mESCs, and from the 10x barcode system I can get the reads that come from individual cells. I filtered my BAM file down to the SNP regions of interest, then sorted the reads into 6000 individual barcode BAM files. I already have a reference file, in VCF format, containing eQTLs on each chromosome for 200 mouse ESCs. My plan was to correlate the SNPs from my individual barcoded VCF files with my reference file to find a matching cell line, but when I read the description of the picard fingerprint tool, I thought it might be a better way of doing this.
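
For comparison, the "correlate SNPs against the reference" plan described above could be sketched as a naive genotype-concordance search. This is only an illustration of that idea, not what the Picard fingerprinting tools do (they report LOD scores rather than raw concordance). The function name, the dictionary layout, and the genotype strings are hypothetical, and exact genotype matching is likely too strict for sparse single-cell RNA-seq data, where one allele is often not observed.

```python
def best_matching_line(cell_calls, reference_panel, min_sites=1):
    """Pick the reference cell line whose genotypes agree best with one cell.

    cell_calls:      {(chrom, pos): genotype} for a single barcode
    reference_panel: {line_name: {(chrom, pos): genotype}}
    Returns (line_name, concordance, n_shared_sites); line_name is None
    if no reference line shares at least `min_sites` sites with the cell.
    """
    best = (None, 0.0, 0)
    for name, ref_calls in reference_panel.items():
        shared = [site for site in cell_calls if site in ref_calls]
        if len(shared) < min_sites:
            continue
        matches = sum(cell_calls[site] == ref_calls[site] for site in shared)
        score = matches / len(shared)
        if best[0] is None or score > best[1]:
            best = (name, score, len(shared))
    return best
```

Raising `min_sites` guards against a line "winning" on only one or two shared sites, which is a real risk when each cell covers few SNPs.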

  • shlee · Cambridge · Member, Broadie, Moderator admin

    Hi @dporter8, I'll ask a developer to address your question. Notice that I've made additions to my description of the haplotype map file above.

  • dporter8 · Carlsbad, CA · Member

    yfarjoun,

    Thanks for confirming the utility of this tool. About creating the haplotype database: we have GigaMUGA SNP arrays on about 900 DO mESC lines. Do you know of any tools that could use this data to create the haplotype map database? Would DOQTL work for this?

    Thanks.
    Devin

  • yfarjoun · Broad Institute · Dev ✭✭✭

    I am not familiar with any tool that can generate a haplotype database. You should be able to LD-prune with plink (https://www.cog-genomics.org/plink2/ld) and convert the resulting file into the Haplotype Database format using hand-written scripts.
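
As a rough sketch of the first half of such a hand-written script: plink's `--indep-pairwise` (for example `--indep-pairwise 50 5 0.5`; the thresholds here are only illustrative) writes the retained variant IDs to a `.prune.in` file, one per line. The function below, whose name is hypothetical, keeps the rows of a plink `.bim` file whose IDs are in that list and returns chromosome, position, ID, and the two alleles. Writing these records out in the haplotype map format is deliberately left out, since the exact layout should be taken from the official format description rather than guessed here.

```python
def pruned_sites(bim_lines, prune_in_ids):
    """Select LD-pruned sites from a plink .bim file.

    bim_lines:    iterable of .bim rows (chrom, id, cM, bp position, allele1, allele2)
    prune_in_ids: iterable of variant IDs, e.g. the lines of plink.prune.in
    Returns a list of (chrom, pos, variant_id, allele1, allele2) tuples
    in the order they appear in the .bim file.
    """
    keep = {line.strip() for line in prune_in_ids if line.strip()}
    sites = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed rows
        chrom, var_id, _cm, pos, allele1, allele2 = fields
        if var_id in keep:
            sites.append((chrom, int(pos), var_id, allele1, allele2))
    return sites
```

With real files this could be called as `pruned_sites(open("data.bim"), open("plink.prune.in"))`, where both file names are placeholders.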

  • yfarjoun · Broad Institute · Dev ✭✭✭

    Actually, I think I misunderstood your question. If the regions you are looking at do not contain variants, then fingerprinting will not work. Are the sites in question variants, or just regions with differential coverage between the samples? Are we talking about one individual or many?

  • shlee · Cambridge · Member, Broadie, Moderator admin

    I now describe the haplotype map format officially at http://gatkforums.broadinstitute.org/dsde/discussion/9526. Thanks for bringing this need to our attention @dporter8.
