The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!


You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Did you remember to?


1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?


Then follow instructions in Article#1894.

Formatting tip!


Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Picard 2.9.0 is now available. Download and read release notes here.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

very different SNP calling by GATK and samtools

ymwymw Posts: 13
edited February 2013 in Ask the GATK team

Hi,
I am estimating SNP number for a genome of a speceis.

  1. After mapping the illumina reads to a reference seqence (of a related species), I used picard to remove duplicates and used samtools (mpileup) to convert the bam file to a fastq file.
  2. I also use GATK to do local reglimant and recalication from the mappnig bam file. Then I used samtools (mpileup) to convert the bam files output by GATK to a fastq file.

I found that the number of SNP in the fastq going through GATK is 10 times more than the first fastq.
Interestingly, if I use picard to do duplicats-removomg again to the GATK bam and used samtools to convert the bam to fastq file. The SNP jumps back to 10 time fewer.

What can be the reason that the SNP number can be 10 time different between the two methods?
Actually, I expect the GATK output file has fewer SNP given the effect of recaliraiton or relingement. But the result is opposite.

Chih-Ming

Post edited by Geraldine_VdAuwera on

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAPosts: 11,743 admin

    Hi Chih-Ming,

    This is not really surprising, as the GATK variant callers (UnifiedGenotyper and HaplotypeCaller) are intentionally designed to be very permissive in calling variants. The key step is Variant Quality Score Recalibration, which uses the annotation metrics to distinguish real variants from noise. The earlier processing steps such as base recalibration and alignment are used to improve the reliability of those annotations. So what you should compare is the number of variants after VQSR.

    Geraldine Van der Auwera, PhD

Sign In or Register to comment.