Missing data in HaplotypeCaller

I have an issue using HaplotypeCaller to call SNPs in RAD data.

I recently used HaplotypeCaller to call SNPs in 600 samples + 50 subspecies samples. That worked fine.
After expanding the dataset to 1200 samples + 120 subspecies samples, ~100 of the subspecies samples get 0 calls, just missing genotypes ("./.") at these loci. Note that some of these samples and sites were analyzed and called successfully in the first analysis. Any ideas why?
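
To put numbers on the missing data, one option is to count the "./." genotypes per sample in the output VCF. Below is a minimal sketch, assuming an uncompressed multi-sample VCF read as tab-separated text lines; the function name `count_missing_genotypes` and the sample names in the usage lines are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def count_missing_genotypes(vcf_lines: Iterable[str]) -> Counter:
    """Count, per sample, the sites whose genotype is missing ('./.' or '.')."""
    samples = []
    missing = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            for sample in samples:
                missing[sample] = 0
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # no genotype field at this site
        gt_index = fmt.index("GT")
        for sample, value in zip(samples, fields[9:]):
            parts = value.split(":")
            gt = parts[gt_index] if gt_index < len(parts) else "."
            alleles = gt.replace("|", "/").split("/")
            if all(allele == "." for allele in alleles):
                missing[sample] += 1
    return missing


# Hypothetical two-sample example.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA\tB",
    "1\t100\t.\tA\tG\t50\t.\t.\tGT:DP\t0/1:12\t./.",
    "1\t200\t.\tC\tT\t50\t.\t.\tGT:DP\t./.:0\t./.",
]
print(count_missing_genotypes(example))
```

Comparing these counts between the 600-sample and 1200-sample runs would show which samples lost calls.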

My setup:
~1200 BAM files, each processed with ReduceReads
Merged into 1 BAM file with proper RG (read group) headers.
Should I not have merged? Should I not have run ReduceReads?

Thanks!

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi there,

    Merging and reducing should not have any effect on the calling, so I wonder if there's something else going on. Try running HC on the unmerged files over one of the intervals where you're seeing missing calls. See if you get your calls then.

    Also, can you confirm that all of your samples were processed with the same version of GATK at every step?
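
The single-file, single-interval test suggested above could be scripted along these lines. This is only a sketch: it assumes the GATK 2.x/3.x-style command line (`-T`, `-R`, `-I`, `-L`, `-o`), and the jar path, file names, and interval are hypothetical placeholders, so check the arguments against the documentation for the version in use.

```python
from typing import List


def haplotypecaller_command(gatk_jar: str, reference: str, bam: str,
                            interval: str, out_vcf: str) -> List[str]:
    """Build (but do not run) a HaplotypeCaller call on one BAM over one interval."""
    return [
        "java", "-jar", gatk_jar,
        "-T", "HaplotypeCaller",
        "-R", reference,   # reference FASTA
        "-I", bam,         # a single unmerged BAM
        "-L", interval,    # restrict calling to the problem interval
        "-o", out_vcf,     # output VCF
    ]


# Hypothetical paths and interval, for illustration only.
cmd = haplotypecaller_command(
    "GenomeAnalysisTK.jar", "ref.fasta", "sample1.bam",
    "chr1:1000-2000", "sample1.vcf",
)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.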

  • digitonin (Los Angeles; Member)

    Yes, I can confirm that the same version 2.7-2 was used everywhere.

    I have been able to call SNPs in all samples using a pre-BQSR, non-reduced merged BAM. I am now trying this on a pre-BQSR, reduced merged BAM (since the earlier run was taking too long).

    I have a feeling it has something to do with the BQSR. I will try calling SNPs post-BQSR to confirm this and will repost when I know more.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    OK, keep me posted. If you find evidence of a bug we'll need a data snippet that reproduces the error with the latest version. Good luck!
