The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!


You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Did you remember to?


1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?


Then follow instructions in Article#1894.

Formatting tip!


Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Picard 2.9.4 is now available. Download and read release notes here.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

inconsistency between the read depth reported by HC and by samtools mpileup

wang1766wang1766 United StatesMember

Hi everyone,

I'm calling SNPs by using GATK HC. I'm also using samtools mpileup to give specific base type and read depth information for each called SNP position (using the same bam file, of course). However, I found that in about 10% of cases, in the positions where GATK called SNPs, samtools mpileup did not report any read coverages. That confused me, so I put the bam file into igv and tablet to visualize the reads. However, both of them confirmed that there were indeed no reads in those positions at all.

Here is the command I used for GATK:

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R index/genome_mu50.fasta -I wt_rg.bam -o wt_raw.vcf

for samtools mpileup:

samtools mpileup -f index/genome_mu50.fasta wt_rg.bam > wt.coverage

Please see the vcf table:

Please notice that a couple of SNPs were called between 1550190 and 1550201, and reasonable allele depths were reported to back up these results.

Please see the igv snip shot of this area:

There is not a single read in this area!

I also tried UnifiedGenotyper, and those SNPs were not called. Is this a bug in HC? or I did something wrong? Any comments will be appreciated. Thanks!

Tagged:

Answers

  • wang1766wang1766 United StatesMember

    Sorry the pictures didn't show up well, plz see the attachment.

    gatk_vcf.jpg
    1260 x 270 - 150K
    igv_noreads.jpg
    1408 x 726 - 184K
  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    What is happening here is most probably this. Before calling variants within a certain window (called an ActiveRegion), HaplotypeCaller does a de novo assembly step with all the reads, to correct for mapping errors. Sometimes this causes the read coverage to change because some reads are remapped in a different place (though still nearby). So it may be that after reassembly, HC finds reads in positions where there were none in the original bam file.

    Try using the -bamOut argument in this region (with -L to restrict the analysis to that region only); this will produce a bam file containing the assembled haplotypes. This should help clarify what is the basis for the HC's calls.

Sign In or Register to comment.