dbSNP138 vs dbSNP141 version in GATK Bundle

jacobhsu, Hong Kong, Member

Hi, I have a question. Is there a specific reason why the GATK bundle only includes dbSNP138, given that dbSNP141 was released a while ago? Are you planning to release a new bundle that includes it? Do you recommend using the latest version, dbSNP141, for base recalibration? Thanks.

Answers

  • vifehe, Spain, Member

    Hi,

    I have a question related to dbSNP141. I have been running the GATK pre-processing steps with reference files from the bundle (hg19 and dbSNP138). Now we are thinking of doing the calling with the new dbSNP141.

    My question is whether incompatibilities will appear, given the different chromosome order and notation (I've already added the chr prefix to dbSNP141, but haven't yet reordered the chromosomes). Any advice on best practices for arranging dbSNP141 so that it is compatible with recal*.bam files aligned to hg19 and recalibrated with dbSNP138?

    thanks
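
The renaming and reordering described in this post could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a GATK or Picard tool: `read_fai_order` and `convert_vcf` are hypothetical helper names, the input is assumed to use b37-style contig names (1, 2, ..., X, Y, MT), and the target order is assumed to come from the first column of the hg19 reference's .fai index. MT records are dropped rather than renamed, because the b37 and hg19 mitochondrial sequences differ.

```python
def read_fai_order(fai_lines):
    """Return contig names in the order they appear in a .fai index (first column)."""
    return [line.split("\t")[0] for line in fai_lines if line.strip()]


def convert_vcf(vcf_lines, contig_order):
    """Add the 'chr' prefix and sort records by reference contig order, then position.

    Assumption: input contigs are b37-style. Records on MT, or on any contig
    missing from contig_order, are dropped. Header lines are kept unchanged.
    """
    rank = {name: i for i, name in enumerate(contig_order)}
    header, records = [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header.append(line)
            continue
        chrom, pos, rest = line.split("\t", 2)
        if chrom == "MT":
            continue  # mitochondrial coordinates are not comparable between builds
        new_chrom = chrom if chrom.startswith("chr") else "chr" + chrom
        if new_chrom not in rank:
            continue  # contig not present in the target reference
        records.append((rank[new_chrom], int(pos), f"{new_chrom}\t{pos}\t{rest}"))
    # Python's sort is stable, so records at the same position keep their input order.
    records.sort(key=lambda r: (r[0], r[1]))
    return header + [r[2] for r in records]
```

Note that this sketch holds every record in memory, which may be impractical for the full dbSNP file, and it does not rewrite any ##contig header lines, so those would need separate attention.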

  • Sheila, Broad Institute, Member, Broadie, Moderator, admin

    @vifehe‌

    Hi,

    It sounds like you are trying to manually lift over the original dbSNP file to the hg19 build. This is fine for the canonical chromosome contigs (1 -> X/Y), but it is not OK for the mitochondrion (chrM/MT). If you don't care about mitochondrial DNA, then it's fine. In that case all you need to do is run ReorderSAM from Picard to fix the contig ordering.

    -Sheila

  • vifehe, Spain, Member

    Hi Sheila,

    Thanks for your comments and the suggestion to use ReorderSAM, but as far as I am aware ReorderSAM works on SAM files, and dbSNP141 is a VCF file.

    In the forums I've read about liftoverVCF.pl, available in the public source repository, which I ran as:

    ./liftoverVCF.pl -vcf 00-All.vcf -gatk /opt/exoma/bin/GenomeAnalysisTK.jar -chain b37tohg19.chain -newRef ucsc.hg19 -oldRef human_g1k_v37 -out dbsnp141_hg19.vcf -tmp /tmp -recordOriginalLocation 0

    but it gave the error:

    The provided VCF file is malformed at approximately line number 10161196: unparsable vcf record with allele B

    Any hints on what may be causing the error? I downloaded the file directly from the NCBI FTP server: ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b141_GRCh38/VCF/

    thanks again

    V

  • Sheila, Broad Institute, Member, Broadie, Moderator, admin

    @vifehe‌

    Hi,

    It is possible that errors were introduced into the file when you tried to fix it yourself.

    You can also check the line where the error occurs, as it may be an encoding issue (this has happened before).

    -Sheila
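
One way to follow the suggestion of checking the offending line is sketched below. Because the reported line number is only approximate, the sketch scans every record and reports REF or ALT alleles that contain characters other than A, C, G, T or N. `find_suspect_alleles` is a hypothetical helper, not part of GATK, and the rule for what counts as an acceptable allele is a simplifying assumption (symbolic alleles such as <DEL>, breakends, "." and "*" are skipped).

```python
VALID_BASES = set("ACGTNacgtn")


def find_suspect_alleles(vcf_lines):
    """Yield (line_number, allele) for REF/ALT alleles with non-ACGTN characters.

    Line numbers are 1-based and count header lines, so they can be compared
    with the line number given in the error message.
    """
    for number, line in enumerate(vcf_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            yield number, "<fewer than 5 columns>"
            continue
        for allele in [fields[3]] + fields[4].split(","):
            # Skip missing, spanning-deletion, and symbolic alleles.
            if allele in (".", "*") or (allele.startswith("<") and allele.endswith(">")):
                continue
            # Skip breakend notation, which legitimately contains brackets.
            if "[" in allele or "]" in allele:
                continue
            if not set(allele) <= VALID_BASES:
                yield number, allele
```

For a real file, the lines could be supplied by something like `open("00-All.vcf", encoding="utf-8", errors="replace")`, so that undecodable bytes show up as replacement characters in the reported allele instead of stopping the scan.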
