Incompatible chromosome naming in dbSNP and reference file for UnifiedGenotyper

max_shpak, Austin TX, Member

I'm running into a problem similar to one I had earlier with a mismatch between my reference and COSMIC files. I created my .bam files using human_g1k_v3.fasta, which names the chromosomes 1...22, X, Y, etc. When I run UnifiedGenotyper with this reference file, the .bam file I created, and dbsnp_137.b37.vcf, I get an error because the SNP coordinates in the dbSNP file are listed under chr1...chr22, chrX, chrY, etc. Other than writing a script to remove all occurrences of "chr", is there another way around this problem, i.e. a dbSNP file that uses the same chromosome names as my reference, without the "chr"?

I resolved the reference/COSMIC problem by finding a COSMIC file with consistent notation, but I can't find a similar fix for this one. I'd appreciate any suggestions.
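
One way to confirm this kind of mismatch before running the genotyper is to compare the contig names in the two files. The following is only a minimal sketch: it assumes a samtools-style `.fai` index exists for the reference (tab-separated, contig name in the first column), and the function names are hypothetical helpers, not part of GATK.

```python
import io


def fai_contigs(handle):
    """Collect contig names from a FASTA index (.fai): first tab-separated column."""
    return {line.split("\t")[0] for line in handle if line.strip()}


def vcf_contigs(handle):
    """Collect contig names used by the records of an uncompressed VCF."""
    names = set()
    for line in handle:
        # Skip meta-information lines (##...), the #CHROM header row, and blanks.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def contigs_missing_from_reference(fai_handle, vcf_handle):
    """Return the VCF contig names that do not appear in the reference index."""
    return sorted(vcf_contigs(vcf_handle) - fai_contigs(fai_handle))


if __name__ == "__main__":
    # Toy data for illustration only; real files would be opened with open().
    fai = io.StringIO("1\t1000\t3\t60\t61\n2\t1000\t1023\t60\t61\n")
    vcf = io.StringIO(
        "##fileformat=VCFv4.1\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "chr1\t100\t.\tA\tG\t.\t.\t.\n"
        "2\t200\t.\tT\tC\t.\t.\t.\n"
    )
    print(contigs_missing_from_reference(fai, vcf))
```

A non-empty result means the VCF uses contig names that the reference does not know about.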

Comments

  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie admin

    It sounds like your dbsnp file is the culprit. Our bundle provides resource files with consistently named contigs (either b37 or hg19). If you use those you should be fine.

  • max_shpak, Austin TX, Member

    Well, I removed "chr" from the contig labels using sed, and it seems to work now. In the future I should make sure that I'm using dbSNP files whose contig naming is consistent with the reference file.
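
The exact sed command is not given in the thread. As a rough illustration of the same idea, here is a minimal Python sketch that strips a leading "chr" from the contig labels of an uncompressed VCF. It assumes the contig name is the first column of each record and that `##contig` header lines start with `ID=`. The function names are hypothetical. Note that a `chrM` label would become `M`, which may still not match a reference that calls the mitochondrial contig `MT`.

```python
import re

# Matches the start of a contig header line such as "##contig=<ID=chr1,length=...>".
_CONTIG_HEADER = re.compile(r"^(##contig=<ID=)chr")


def strip_chr_prefix(line):
    """Drop a leading 'chr' from the contig name on a single VCF line."""
    if line.startswith("##"):
        # Only ##contig lines are rewritten; other meta lines pass through unchanged.
        return _CONTIG_HEADER.sub(r"\1", line)
    if line.startswith("#"):
        # The #CHROM column header row carries no contig name.
        return line
    # Data record: the contig name is the start of the line.
    return line[3:] if line.startswith("chr") else line


def convert(src, dst):
    """Copy a VCF from src to dst line by line, renaming contigs on the way."""
    for line in src:
        dst.write(strip_chr_prefix(line))
```

In practice `convert` would be called with two open text file handles, one for the input VCF and one for the renamed copy, and the result checked against the reference before use.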
