dbSNP138 vs dbSNP141 version in GATK Bundle

jacobhsu, Hong Kong (Member)

Hi, I have a question. Is there a specific reason why the GATK Bundle only includes dbSNP138, given that dbSNP141 has been out for a while? Are you planning to release a new bundle that includes it? Do you recommend using the latest version, dbSNP141, for base recalibration? Thanks.

Answers

  • vifehe, Spain (Member)

    Hi,

    I have a question related to dbSNP141. I have been running the GATK pre-processing steps with reference files from the bundle (hg19 and dbSNP138). Now we are thinking of doing the calling with the new dbSNP141.

    My question is whether incompatibilities will appear, given the different chromosome order and naming (I've already added the chr prefix to dbSNP141, but haven't yet reordered the chromosomes). Any advice on best practices for arranging dbSNP141 so that it is compatible with recal*.bam files aligned to hg19 and recalibrated with dbSNP138?

    Thanks

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @vifehe

    Hi,

    It sounds like you are trying to manually lift over the original dbSNP file to the hg19 build. This is fine for the canonical chromosome contigs (1 through X/Y), but it is not OK for the mitochondrion (chrM/MT). If you don't care about mitochondrial DNA, then it's fine. In that case all you need to do is run ReorderSAM from Picard to fix the contig ordering.

    -Sheila

  • vifehe, Spain (Member)

    Hi Sheila,

    Thanks for your comments and the suggestion to use ReorderSAM, but as far as I am aware ReorderSAM works on SAM files, and dbSNP141 is a VCF file.

    In the forums I've read about liftoverVCF.pl, available in the public source repository, which I ran as:

    ./liftoverVCF.pl -vcf 00-All.vcf -gatk /opt/exoma/bin/GenomeAnalysisTK.jar -chain b37tohg19.chain -newRef ucsc.hg19 -oldRef human_g1k_v37 -out dbsnp141_hg19.vcf -tmp /tmp -recordOriginalLocation 0

    but it gave the error:

    The provided VCF file is malformed at approximately line number 10161196: unparsable vcf record with allele B

    Any hints on what may be causing the error? I downloaded the file directly from the NCBI FTP server: ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b141_GRCh38/VCF/

    Thanks again

    V

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @vifehe

    Hi,

    It is possible that errors were introduced into the file when you tried to fix it yourself.

    You can also check the line where the error occurs, as it may be an encoding issue (this has happened before).

    -Sheila
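
For readers with the same contig naming and ordering question raised earlier in the thread, here is a minimal Python sketch of the manual approach being discussed: add the chr prefix, drop mitochondrial records (following the caveat above that chrM/MT should not be carried over this way), and sort records into the reference's contig order. The function names are hypothetical and not part of GATK or Picard. It assumes the VCF is already on the same coordinate system as the hg19 reference, since renaming contigs does not convert coordinates between genome builds, and it assumes the contig order is read from a samtools-style .fai index.

```python
def read_contig_order(fai_path):
    """Return contig names in the order they appear in a .fai index file."""
    with open(fai_path) as fh:
        return [line.split("\t", 1)[0] for line in fh if line.strip()]


def to_ucsc(name):
    """Add the 'chr' prefix to b37-style contig names (1 -> chr1, X -> chrX)."""
    return name if name.startswith("chr") else "chr" + name


def harmonize(vcf_lines, contig_order):
    """Rename contigs, drop mitochondrial records, and sort by reference order.

    Records on contigs absent from contig_order are dropped. '##contig' header
    lines are removed because they would still carry the old names.
    Everything is held in memory, so this only suits small files.
    """
    rank = {name: i for i, name in enumerate(contig_order)}
    header, records = [], []
    for line in vcf_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            if not line.startswith("##contig="):
                header.append(line)
            continue
        chrom, pos, rest = line.split("\t", 2)
        if chrom in ("MT", "M", "chrM", "chrMT"):
            continue  # mitochondrial records are skipped, not renamed
        chrom = to_ucsc(chrom)
        if chrom not in rank:
            continue
        records.append((rank[chrom], int(pos), "\t".join((chrom, pos, rest))))
    # Sort by contig rank, then position; ties keep their input order.
    records.sort(key=lambda rec: (rec[0], rec[1]))
    return header + [rec[2] for rec in records]
```

Because the sketch loads every record into memory, it is only practical for testing on a subset. For a full dbSNP file, a streaming or disk-backed tool such as Picard's SortVcf is likely a better fit.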

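Following the suggestion above to check the line where the error occurs, here is a minimal Python sketch that pulls out the lines around a reported line number and flags a few things that commonly upset VCF parsers: non-ASCII bytes, Windows line endings, too few columns, and alleles containing characters other than A, C, G, T or N. The function names are hypothetical, and the allele check is deliberately simple (it skips symbolic alleles such as `<DEL>` and does not handle breakend notation), so treat it as a first look and not a validator. The file is read as bytes so that encoding problems stay visible.

```python
import itertools

VALID_BASES = set("ACGTNacgtn")


def fetch_lines(path, center, context=1):
    """Return (line_number, raw_bytes) pairs around a 1-based line number."""
    start = max(1, center - context)
    stop = center + context
    with open(path, "rb") as fh:
        window = itertools.islice(fh, start - 1, stop)
        return list(enumerate(window, start))


def describe_problems(raw):
    """Return a list of human-readable problems found in one raw VCF line."""
    problems = []
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        problems.append("non-ASCII bytes")
        text = raw.decode("utf-8", errors="replace")
    if text.endswith("\r\n"):
        problems.append("Windows line ending")
    if text.startswith("#"):
        return problems  # header line, no allele columns to check
    fields = text.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        problems.append("fewer than 8 tab-separated columns")
        return problems
    ref, alt = fields[3], fields[4]
    for allele in [ref] + alt.split(","):
        if allele in (".", "*") or allele.startswith("<"):
            continue  # missing, spanning-deletion, or symbolic allele
        if not set(allele) <= VALID_BASES:
            problems.append(f"unexpected allele {allele!r}")
    return problems
```

A call such as `fetch_lines("00-All.vcf", 10161196, context=2)` returns a few lines on either side of the reported position, which helps because the error message gives only an approximate line number. Each returned line can then be passed to `describe_problems`.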